diff --git a/README.md b/README.md index af87e0b51..818e7334e 100644 --- a/README.md +++ b/README.md @@ -1,22 +1,42 @@ # Twitter's Recommendation Algorithm -Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the -Home Timeline. For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). The -diagram below illustrates how major services and jobs interconnect. +Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for serving feeds of Tweets and other content across all Twitter product surfaces (e.g. For You Timeline, Search, Explore). For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). -![](docs/system-diagram.png) +## Architecture -These are the main components of the Recommendation Algorithm included in this repository: + +Product surfaces at Twitter are built on a shared set of data, models, and software frameworks. The shared components included in this repository are listed below: + +| Type | Component | Description | +|------------|------------|------------| +| Data | [unified-user-actions](unified_user_actions/README.md) | Real-time stream of user actions on Twitter. | +| | [user-signal-service](user-signal-service/README.md) | Centralized platform to retrieve explicit (e.g. likes, replies) and implicit (e.g. profile visits, tweet clicks) user signals. | +| Model | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. | +| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. | +| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. | +| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict the likelihood of a Twitter User interacting with another User. | +| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. | +| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. | +| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). | +| | [topic-social-proof](topic-social-proof/README.md) | Identifies topics related to individual Tweets. | +| | [representation-scorer](representation-scorer/README.md) | Computes scores between pairs of entities (Users, Tweets, etc.) using embedding similarity. | +| Software framework | [navi](navi/README.md) | High-performance machine learning model serving, written in Rust. | +| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. | +| | [timelines-aggregation-framework](timelines/data_processing/ml_util/aggregation_framework/README.md) | Framework for generating aggregate features in batch or real time. | +| | [representation-manager](representation-manager/README.md) | Service to retrieve embeddings (i.e. SimClusters and TwHIN). 
| +| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. | + +The product surface currently included in this repository is the For You Timeline. + +### For You Timeline + +The diagram below illustrates how major services and jobs interconnect to construct a For You Timeline. + +![](docs/system-diagram.png) + +The core components of the For You Timeline included in this repository are listed below: | Type | Component | Description | |------------|------------|------------| -| Feature | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. | -| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. | -| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. | -| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict the likelihood of a Twitter User interacting with another User. | -| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. | -| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. | -| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). | | Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. | | | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. | | | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos). | @@ -26,11 +46,10 @@ These are the main components of the Recommendation Algorithm included in this r | Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md). | | | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. | | | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. | -| Software framework | [navi](navi/README.md) | High performance, machine learning model serving written in Rust. | -| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. | -| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. | -We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file. +## Build and test code + +We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file. 
We plan to add a more complete build and test system in the future. ## Contributing diff --git a/RETREIVAL_SIGNALS.md b/RETREIVAL_SIGNALS.md new file mode 100644 index 000000000..6f064bc46 --- /dev/null +++ b/RETREIVAL_SIGNALS.md @@ -0,0 +1,51 @@ +# Signals for Candidate Sources + +## Overview + +The candidate sourcing stage within the Twitter Recommendation algorithm narrows the pool of candidate items from approximately 1 billion down to just a few thousand. This process uses Twitter user behavior as the primary input to the algorithm. This document enumerates all the signals used during the candidate sourcing phase. + +| Signals | Description | +| :-------------------- | :-------------------------------------------------------------------- | +| Author Follow | The accounts which the user explicitly follows. | +| Author Unfollow | The accounts which the user recently unfollowed. | +| Author Mute | The accounts which the user has muted. | +| Author Block | The accounts which the user has blocked. | +| Tweet Favorite | The tweets for which the user clicked the like button. | +| Tweet Unfavorite | The tweets for which the user clicked the unlike button. | +| Retweet | The tweets which the user retweeted. | +| Quote Tweet | The tweets which the user retweeted with comments. | +| Tweet Reply | The tweets to which the user replied. | +| Tweet Share | The tweets for which the user clicked the share button. | +| Tweet Bookmark | The tweets for which the user clicked the bookmark button. | +| Tweet Click | The tweets which the user clicked through to view the tweet detail page. | +| Tweet Video Watch | The video tweets of which the user watched a certain number of seconds or percentage. | +| Tweet Don't like | The tweets for which the user clicked the "Not interested in this tweet" button. | +| Tweet Report | The tweets for which the user clicked the "Report Tweet" button. | +| Notification Open | The push notification tweets which the user opened. | +| Ntab click | The tweets which the user clicked on the Notifications page. | +| User AddressBook | The account identifiers of the authors in the user's address book. | + +## Usage Details + +Twitter uses these user signals as training labels and/or ML features in each candidate sourcing algorithm. The following table shows how they are used in each component. 
+ +| Signals | USS | SimClusters | TwHIN | UTEG | FRS | Light Ranking | +| :-------------------- | :----------------- | :----------------- | :----------------- | :----------------- | :----------------- | :----------------- | +| Author Follow | Features | Features / Labels | Features / Labels | Features | Features / Labels | N/A | +| Author Unfollow | Features | N/A | N/A | N/A | N/A | N/A | +| Author Mute | Features | N/A | N/A | N/A | Features | N/A | +| Author Block | Features | N/A | N/A | N/A | Features | N/A | +| Tweet Favorite | Features | Features | Features / Labels | Features | Features / Labels | Features / Labels | +| Tweet Unfavorite | Features | Features | N/A | N/A | N/A | N/A | +| Retweet | Features | N/A | Features / Labels | Features | Features / Labels | Features / Labels | +| Quote Tweet | Features | N/A | Features / Labels | Features | Features / Labels | Features / Labels | +| Tweet Reply | Features | N/A | Features | Features | Features / Labels | Features | +| Tweet Share | Features | N/A | N/A | N/A | Features | N/A | +| Tweet Bookmark | Features | N/A | N/A | N/A | N/A | N/A | +| Tweet Click | Features | N/A | N/A | N/A | Features | Labels | +| Tweet Video Watch | Features | Features | N/A | N/A | N/A | Labels | +| Tweet Don't like | Features | N/A | N/A | N/A | N/A | N/A | +| Tweet Report | Features | N/A | N/A | N/A | N/A | N/A | +| Notification Open | Features | Features | Features | N/A | Features | N/A | +| Ntab click | Features | Features | Features | N/A | Features | N/A | +| User AddressBook | N/A | N/A | N/A | N/A | Features | N/A | \ No newline at end of file diff --git a/navi/README.md b/navi/README.md index 9a4326d96..4e7d325f7 100644 --- a/navi/README.md +++ b/navi/README.md @@ -31,6 +31,11 @@ In navi/navi, you can run the following commands: - `scripts/run_onnx.sh` for [Onnx](https://onnx.ai/) Do note that you need to create a models directory and create some versions, preferably using epoch time, e.g., `1679693908377`. +so that the models directory structure looks like: + models/ + -web_click + - 1809000 + - 1809010 ## Build You can adapt the above scripts to build using Cargo. 
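As a rough sketch of the directory layout the navi README describes above: the model name `web_click`, the version numbers, and the `model.onnx` artifact name below are placeholder assumptions, not names required by navi.

```sh
# Create one model ("web_click") with two epoch-style version directories.
# Navi's model-sync loop periodically scans the model directory and serves
# the latest version it finds, so monotonically increasing names matter.
mkdir -p models/web_click/1809000
mkdir -p models/web_click/1809010

# Place the exported model artifacts inside a version directory, e.g. an
# ONNX file for scripts/run_onnx.sh (the artifact name here is hypothetical):
cp /path/to/exported/model.onnx models/web_click/1809010/
```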
diff --git a/navi/dr_transform/Cargo.toml b/navi/dr_transform/Cargo.toml index 47f097eb9..cff73375b 100644 --- a/navi/dr_transform/Cargo.toml +++ b/navi/dr_transform/Cargo.toml @@ -3,7 +3,6 @@ name = "dr_transform" version = "0.1.0" edition = "2021" -# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] serde = { version = "1.0", features = ["derive"] } serde_json = "1.0" @@ -12,7 +11,6 @@ bpr_thrift = { path = "../thrift_bpr_adapter/thrift/"} segdense = { path = "../segdense/"} thrift = "0.17.0" ndarray = "0.15" -ort = {git ="https://github.com/pykeio/ort.git", tag="v1.14.2"} base64 = "0.20.0" npyz = "0.7.2" log = "0.4.17" @@ -21,6 +19,11 @@ prometheus = "0.13.1" once_cell = "1.17.0" rand = "0.8.5" itertools = "0.10.5" +anyhow = "1.0.70" +[target.'cfg(not(target_os="linux"))'.dependencies] +ort = {git ="https://github.com/pykeio/ort.git", features=["profiling"], tag="v1.14.6"} +[target.'cfg(target_os="linux")'.dependencies] +ort = {git ="https://github.com/pykeio/ort.git", features=["profiling", "tensorrt", "cuda", "copy-dylibs"], tag="v1.14.6"} [dev-dependencies] criterion = "0.3.0" diff --git a/navi/dr_transform/src/all_config.rs b/navi/dr_transform/src/all_config.rs index 29451bfd4..d5c52c362 100644 --- a/navi/dr_transform/src/all_config.rs +++ b/navi/dr_transform/src/all_config.rs @@ -44,5 +44,6 @@ pub struct RenamedFeatures { } pub fn parse(json_str: &str) -> Result<AllConfig, serde_json::Error> { - serde_json::from_str(json_str) + let all_config: AllConfig = serde_json::from_str(json_str)?; + Ok(all_config) } diff --git a/navi/dr_transform/src/converter.rs b/navi/dr_transform/src/converter.rs index 578d766fd..3097aedc0 100644 --- a/navi/dr_transform/src/converter.rs +++ b/navi/dr_transform/src/converter.rs @@ -2,6 +2,9 @@ use std::collections::BTreeSet; use std::fmt::{self, Debug, Display}; use std::fs; +use crate::all_config; +use crate::all_config::AllConfig; +use anyhow::{bail, Context}; use bpr_thrift::data::DataRecord; use bpr_thrift::prediction_service::BatchPredictionRequest; use bpr_thrift::tensor::GeneralTensor; @@ -16,8 +19,6 @@ use segdense::util; use thrift::protocol::{TBinaryInputProtocol, TSerializable}; use thrift::transport::TBufferChannel; -use crate::{all_config, all_config::AllConfig}; - pub fn log_feature_match( dr: &DataRecord, seg_dense_config: &DensificationTransformSpec, @@ -28,20 +29,24 @@ for (feature_id, feature_value) in dr.continuous_features.as_ref().unwrap() { debug!( - "{dr_type} - Continuous Datarecord => Feature ID: {feature_id}, Feature value: {feature_value}" + "{} - Continuous Datarecord => Feature ID: {}, Feature value: {}", + dr_type, feature_id, feature_value ); for input_feature in &seg_dense_config.cont.input_features { if input_feature.feature_id == *feature_id { - debug!("Matching input feature: {input_feature:?}") + debug!("Matching input feature: {:?}", input_feature) } } } for feature_id in dr.binary_features.as_ref().unwrap() { - debug!("{dr_type} - Binary Datarecord => Feature ID: {feature_id}"); + debug!( + "{} - Binary Datarecord => Feature ID: {}", + dr_type, feature_id + ); for input_feature in &seg_dense_config.binary.input_features { if input_feature.feature_id == *feature_id { - debug!("Found input feature: {input_feature:?}") + debug!("Found input feature: {:?}", input_feature) } } } } @@ -90,18 +95,19 @@ impl BatchPredictionRequestToTorchTensorConverter { model_version: &str, reporting_feature_ids: Vec<(i64, &str)>, register_metric_fn: Option<impl Fn(&HistogramVec)>, - ) -> 
BatchPredictionRequestToTorchTensorConverter { - let all_config_path = format!("{model_dir}/{model_version}/all_config.json"); - let seg_dense_config_path = - format!("{model_dir}/{model_version}/segdense_transform_spec_home_recap_2022.json"); - let seg_dense_config = util::load_config(&seg_dense_config_path); + ) -> anyhow::Result<BatchPredictionRequestToTorchTensorConverter> { + let all_config_path = format!("{}/{}/all_config.json", model_dir, model_version); + let seg_dense_config_path = format!( + "{}/{}/segdense_transform_spec_home_recap_2022.json", + model_dir, model_version + ); + let seg_dense_config = util::load_config(&seg_dense_config_path)?; let all_config = all_config::parse( &fs::read_to_string(&all_config_path) - .unwrap_or_else(|error| panic!("error loading all_config.json - {error}")), - ) - .unwrap(); + .with_context(|| "error loading all_config.json - ")?, + )?; - let feature_mapper = util::load_from_parsed_config_ref(&seg_dense_config); + let feature_mapper = util::load_from_parsed_config(seg_dense_config.clone())?; let user_embedding_feature_id = Self::get_feature_id( &all_config @@ -131,11 +137,11 @@ impl BatchPredictionRequestToTorchTensorConverter { let (discrete_feature_metrics, continuous_feature_metrics) = METRICS.get_or_init(|| { let discrete = HistogramVec::new( HistogramOpts::new(":navi:feature_id:discrete", "Discrete Feature ID values") - .buckets(Vec::from([ - 0.0f64, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, + .buckets(Vec::from(&[ + 0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0, 300.0, 500.0, 1000.0, 10000.0, 100000.0, - ])), + ] as &'static [f64])), &["feature_id"], ) .expect("metric cannot be created"); @@ -144,18 +150,18 @@ impl BatchPredictionRequestToTorchTensorConverter { ":navi:feature_id:continuous", "continuous Feature ID values", ) - .buckets(Vec::from([ - 0.0f64, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, - 120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0, 300.0, - 500.0, 1000.0, 10000.0, 100000.0, - ])), + .buckets(Vec::from(&[ + 0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, + 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0, 300.0, 500.0, + 1000.0, 10000.0, 100000.0, + ] as &'static [f64])), &["feature_id"], ) .expect("metric cannot be created"); - if let Some(r) = register_metric_fn { + register_metric_fn.map(|r| { r(&discrete); r(&continuous); - } + }); (discrete, continuous) }); @@ -164,13 +170,16 @@ impl BatchPredictionRequestToTorchTensorConverter { for (feature_id, feature_type) in reporting_feature_ids.iter() { match *feature_type { - "discrete" => discrete_features_to_report.insert(*feature_id), - "continuous" => continuous_features_to_report.insert(*feature_id), - _ => panic!("Invalid feature type {feature_type} for reporting metrics!"), + "discrete" => discrete_features_to_report.insert(feature_id.clone()), + "continuous" => continuous_features_to_report.insert(feature_id.clone()), + _ => bail!( + "Invalid feature type {} for reporting metrics!", + feature_type + ), }; } - BatchPredictionRequestToTorchTensorConverter { + Ok(BatchPredictionRequestToTorchTensorConverter { all_config, seg_dense_config, all_config_path, @@ -183,7 +192,7 @@ impl BatchPredictionRequestToTorchTensorConverter { continuous_features_to_report, discrete_feature_metrics, continuous_feature_metrics, - } + }) } fn get_feature_id(feature_name: &str, seg_dense_config: &Root) -> i64 { @@ -218,43 
+227,45 @@ impl BatchPredictionRequestToTorchTensorConverter { let mut working_set = vec![0 as f32; total_size]; let mut bpr_start = 0; for (bpr, &bpr_end) in bprs.iter().zip(batch_size) { - if bpr.common_features.is_some() - && bpr.common_features.as_ref().unwrap().tensors.is_some() - && bpr - .common_features - .as_ref() - .unwrap() - .tensors - .as_ref() - .unwrap() - .contains_key(&feature_id) - { - let source_tensor = bpr - .common_features - .as_ref() - .unwrap() - .tensors - .as_ref() - .unwrap() - .get(&feature_id) - .unwrap(); - let tensor = match source_tensor { - GeneralTensor::FloatTensor(float_tensor) => - //Tensor::of_slice( + if bpr.common_features.is_some() { + if bpr.common_features.as_ref().unwrap().tensors.is_some() { + if bpr + .common_features + .as_ref() + .unwrap() + .tensors + .as_ref() + .unwrap() + .contains_key(&feature_id) { - float_tensor - .floats - .iter() - .map(|x| x.into_inner() as f32) - .collect::<Vec<f32>>() - } - _ => vec![0 as f32; cols], - }; + let source_tensor = bpr + .common_features + .as_ref() + .unwrap() + .tensors + .as_ref() + .unwrap() + .get(&feature_id) + .unwrap(); + let tensor = match source_tensor { + GeneralTensor::FloatTensor(float_tensor) => + //Tensor::of_slice( + { + float_tensor + .floats + .iter() + .map(|x| x.into_inner() as f32) + .collect::<Vec<f32>>() + } + _ => vec![0 as f32; cols], + }; - // since the tensor is found in common feature, add it in all batches - for row in bpr_start..bpr_end { - for col in 0..cols { - working_set[row * cols + col] = tensor[col]; + // since the tensor is found in common feature, add it in all batches + for row in bpr_start..bpr_end { + for col in 0..cols { + working_set[row * cols + col] = tensor[col]; + } + } } } } @@ -298,9 +309,9 @@ impl BatchPredictionRequestToTorchTensorConverter { // (INT64 --> INT64, DataRecord.discrete_feature) fn get_continuous(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor { // These need to be part of model schema - let rows = batch_ends[batch_ends.len() - 1]; - let cols = 5293; - let full_size = rows * cols; + let rows: usize = batch_ends[batch_ends.len() - 1]; + let cols: usize = 5293; + let full_size: usize = rows * cols; let default_val = f32::NAN; let mut tensor = vec![default_val; full_size]; @@ -325,15 +336,18 @@ impl BatchPredictionRequestToTorchTensorConverter { .unwrap(); for feature in common_features { - if let Some(f_info) = self.feature_mapper.get(feature.0) { - let idx = f_info.index_within_tensor as usize; - if idx < cols { - // Set value in each row - for r in bpr_start..bpr_end { - let flat_index = r * cols + idx; - tensor[flat_index] = feature.1.into_inner() as f32; + match self.feature_mapper.get(feature.0) { + Some(f_info) => { + let idx = f_info.index_within_tensor as usize; + if idx < cols { + // Set value in each row + for r in bpr_start..bpr_end { + let flat_index: usize = r * cols + idx; + tensor[flat_index] = feature.1.into_inner() as f32; + } } } + None => (), } if self.continuous_features_to_report.contains(feature.0) { self.continuous_feature_metrics .with_label_values(&[feature.0.to_string().as_str()]) @@ -349,24 +363,28 @@ impl BatchPredictionRequestToTorchTensorConverter { // Process the batch of datarecords for r in bpr_start..bpr_end { - let dr: &DataRecord = &bpr.individual_features_list[r - bpr_start]; + let dr: &DataRecord = + &bpr.individual_features_list[usize::try_from(r - bpr_start).unwrap()]; if dr.continuous_features.is_some() { for feature in dr.continuous_features.as_ref().unwrap() { - if let Some(f_info) = self.feature_mapper.get(feature.0) { - let idx = 
f_info.index_within_tensor as usize; - let flat_index = r * cols + idx; - if flat_index < tensor.len() && idx < cols { - tensor[flat_index] = feature.1.into_inner() as f32; + match self.feature_mapper.get(&feature.0) { + Some(f_info) => { + let idx = f_info.index_within_tensor as usize; + let flat_index: usize = r * cols + idx; + if flat_index < tensor.len() && idx < cols { + tensor[flat_index] = feature.1.into_inner() as f32; + } } + None => (), } if self.continuous_features_to_report.contains(feature.0) { self.continuous_feature_metrics .with_label_values(&[feature.0.to_string().as_str()]) - .observe(feature.1.into_inner()) + .observe(feature.1.into_inner() as f64) } else if self.discrete_features_to_report.contains(feature.0) { self.discrete_feature_metrics .with_label_values(&[feature.0.to_string().as_str()]) - .observe(feature.1.into_inner()) + .observe(feature.1.into_inner() as f64) } } } @@ -383,10 +401,10 @@ impl BatchPredictionRequestToTorchTensorConverter { fn get_binary(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor { // These need to be part of model schema - let rows = batch_ends[batch_ends.len() - 1]; - let cols = 149; - let full_size = rows * cols; - let default_val = 0; + let rows: usize = batch_ends[batch_ends.len() - 1]; + let cols: usize = 149; + let full_size: usize = rows * cols; + let default_val: i64 = 0; let mut v = vec![default_val; full_size]; @@ -410,15 +428,18 @@ impl BatchPredictionRequestToTorchTensorConverter { .unwrap(); for feature in common_features { - if let Some(f_info) = self.feature_mapper.get(feature) { - let idx = f_info.index_within_tensor as usize; - if idx < cols { - // Set value in each row - for r in bpr_start..bpr_end { - let flat_index = r * cols + idx; - v[flat_index] = 1; + match self.feature_mapper.get(feature) { + Some(f_info) => { + let idx = f_info.index_within_tensor as usize; + if idx < cols { + // Set value in each row + for r in bpr_start..bpr_end { + let flat_index: usize = r * cols + idx; + v[flat_index] = 1; + } } } + None => (), } } } @@ -428,10 +449,13 @@ impl BatchPredictionRequestToTorchTensorConverter { let dr: &DataRecord = &bpr.individual_features_list[r - bpr_start]; if dr.binary_features.is_some() { for feature in dr.binary_features.as_ref().unwrap() { - if let Some(f_info) = self.feature_mapper.get(feature) { - let idx = f_info.index_within_tensor as usize; - let flat_index = r * cols + idx; - v[flat_index] = 1; + match self.feature_mapper.get(&feature) { + Some(f_info) => { + let idx = f_info.index_within_tensor as usize; + let flat_index: usize = r * cols + idx; + v[flat_index] = 1; + } + None => (), } } } @@ -448,10 +472,10 @@ impl BatchPredictionRequestToTorchTensorConverter { #[allow(dead_code)] fn get_discrete(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor { // These need to be part of model schema - let rows = batch_ends[batch_ends.len() - 1]; - let cols = 320; - let full_size = rows * cols; - let default_val = 0; + let rows: usize = batch_ends[batch_ends.len() - 1]; + let cols: usize = 320; + let full_size: usize = rows * cols; + let default_val: i64 = 0; let mut v = vec![default_val; full_size]; @@ -475,15 +499,18 @@ impl BatchPredictionRequestToTorchTensorConverter { .unwrap(); for feature in common_features { - if let Some(f_info) = self.feature_mapper.get(feature.0) { - let idx = f_info.index_within_tensor as usize; - if idx < cols { - // Set value in each row - for r in bpr_start..bpr_end { - let flat_index = r * cols + idx; - v[flat_index] = 
*feature.1; + match self.feature_mapper.get(feature.0) { + Some(f_info) => { + let idx = f_info.index_within_tensor as usize; + if idx < cols { + // Set value in each row + for r in bpr_start..bpr_end { + let flat_index: usize = r * cols + idx; + v[flat_index] = *feature.1; + } + } + } + None => (), } if self.discrete_features_to_report.contains(feature.0) { self.discrete_feature_metrics @@ -495,15 +522,18 @@ // Process the batch of datarecords for r in bpr_start..bpr_end { - let dr: &DataRecord = &bpr.individual_features_list[r]; + let dr: &DataRecord = &bpr.individual_features_list[usize::try_from(r).unwrap()]; if dr.discrete_features.is_some() { for feature in dr.discrete_features.as_ref().unwrap() { - if let Some(f_info) = self.feature_mapper.get(feature.0) { - let idx = f_info.index_within_tensor as usize; - let flat_index = r * cols + idx; - if flat_index < v.len() && idx < cols { - v[flat_index] = *feature.1; + match self.feature_mapper.get(&feature.0) { + Some(f_info) => { + let idx = f_info.index_within_tensor as usize; + let flat_index: usize = r * cols + idx; + if flat_index < v.len() && idx < cols { + v[flat_index] = *feature.1; + } } + None => (), } if self.discrete_features_to_report.contains(feature.0) { self.discrete_feature_metrics @@ -569,7 +599,7 @@ impl Converter for BatchPredictionRequestToTorchTensorConverter { .map(|bpr| bpr.individual_features_list.len()) .scan(0usize, |acc, e| { //running total - *acc += e; + *acc = *acc + e; Some(*acc) }) .collect::<Vec<usize>>(); diff --git a/navi/dr_transform/src/lib.rs b/navi/dr_transform/src/lib.rs index 25b7cd2d3..ea3b25a55 100644 --- a/navi/dr_transform/src/lib.rs +++ b/navi/dr_transform/src/lib.rs @@ -3,3 +3,4 @@ pub mod converter; #[cfg(test)] mod test; pub mod util; +pub extern crate ort; diff --git a/navi/navi/Cargo.toml b/navi/navi/Cargo.toml index a942b1ae4..e355ea2a7 100644 --- a/navi/navi/Cargo.toml +++ b/navi/navi/Cargo.toml @@ -1,8 +1,7 @@ [package] name = "navi" -version = "2.0.42" +version = "2.0.45" edition = "2021" -# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [[bin]] name = "navi" path = "src/bin/navi.rs" required-features=["torch"] [[bin]] name = "navi_onnx" path = "src/bin/navi_onnx.rs" required-features=["onnx"] +[[bin]] +name = "navi_onnx_test" +path = "src/bin/bin_tests/navi_onnx_test.rs" +[[bin]] +name = "navi_torch_test" +path = "src/bin/bin_tests/navi_torch_test.rs" +required-features=["torch"] [features] default=[] navi_console=[] torch=["tch"] -onnx=["ort"] +onnx=[] tf=["tensorflow"] [dependencies] itertools = "0.10.5" @@ -47,6 +53,7 @@ parking_lot = "0.12.1" rand = "0.8.5" rand_pcg = "0.3.1" random = "0.12.2" +x509-parser = "0.15.0" sha256 = "1.0.3" tonic = { version = "0.6.2", features=['compression', 'tls'] } tokio = { version = "1.17.0", features = ["macros", "rt-multi-thread", "fs", "process"] } @@ -55,16 +62,12 @@ npyz = "0.7.3" base64 = "0.21.0" histogram = "0.6.9" tch = {version = "0.10.3", optional = true} -tensorflow = { version = "0.20.0", optional = true } +tensorflow = { version = "0.18.0", optional = true } once_cell = {version = "1.17.1"} ndarray = "0.15" serde = "1.0.154" serde_json = "1.0.94" dr_transform = { path = "../dr_transform"} -[target.'cfg(not(target_os="linux"))'.dependencies] -ort = {git ="https://github.com/pykeio/ort.git", features=["profiling"], optional = true, tag="v1.14.2"} -[target.'cfg(target_os="linux")'.dependencies] -ort = {git ="https://github.com/pykeio/ort.git", features=["profiling", 
"tensorrt", "cuda", "copy-dylibs"], optional = true, tag="v1.14.2"} [build-dependencies] tonic-build = {version = "0.6.2", features=['prost', "compression"] } [profile.release] @@ -74,3 +77,5 @@ ndarray-rand = "0.14.0" tokio-test = "*" assert_cmd = "2.0" criterion = "0.4.0" + + diff --git a/navi/navi/proto/tensorflow/core/framework/full_type.proto b/navi/navi/proto/tensorflow/core/framework/full_type.proto index e8175ed3d..ddf05ec8f 100644 --- a/navi/navi/proto/tensorflow/core/framework/full_type.proto +++ b/navi/navi/proto/tensorflow/core/framework/full_type.proto @@ -122,7 +122,7 @@ enum FullTypeId { // TFT_TENSOR[TFT_INT32, TFT_UNKNOWN] // is a Tensor of int32 element type and unknown shape. // - // TODO: Define TFT_SHAPE and add more examples. + // TODO(mdan): Define TFT_SHAPE and add more examples. TFT_TENSOR = 1000; // Array (or tensorflow::TensorList in the variant type registry). @@ -178,7 +178,7 @@ enum FullTypeId { // object (for now). // The bool element type. - // TODO + // TODO(mdan): Quantized types, legacy representations (e.g. ref) TFT_BOOL = 200; // Integer element types. TFT_UINT8 = 201; @@ -195,7 +195,7 @@ enum FullTypeId { TFT_DOUBLE = 211; TFT_BFLOAT16 = 215; // Complex element types. - // TODO: Represent as TFT_COMPLEX[TFT_DOUBLE] instead? + // TODO(mdan): Represent as TFT_COMPLEX[TFT_DOUBLE] instead? TFT_COMPLEX64 = 212; TFT_COMPLEX128 = 213; // The string element type. @@ -240,7 +240,7 @@ enum FullTypeId { // ownership is in the true sense: "the op argument representing the lock is // available". // Mutex locks are the dynamic counterpart of control dependencies. - // TODO: Properly document this thing. + // TODO(mdan): Properly document this thing. // // Parametrization: TFT_MUTEX_LOCK[]. TFT_MUTEX_LOCK = 10202; @@ -271,6 +271,6 @@ message FullTypeDef { oneof attr { string s = 3; int64 i = 4; - // TODO: list/tensor, map? Need to reconcile with TFT_RECORD, etc. + // TODO(mdan): list/tensor, map? Need to reconcile with TFT_RECORD, etc. } } diff --git a/navi/navi/proto/tensorflow/core/framework/function.proto b/navi/navi/proto/tensorflow/core/framework/function.proto index efa3c9aeb..6e59df718 100644 --- a/navi/navi/proto/tensorflow/core/framework/function.proto +++ b/navi/navi/proto/tensorflow/core/framework/function.proto @@ -23,7 +23,7 @@ message FunctionDefLibrary { // with a value. When a GraphDef has a call to a function, it must // have binding for every attr defined in the signature. // -// TODO: +// TODO(zhifengc): // * device spec, etc. message FunctionDef { // The definition of the function's name, arguments, return values, diff --git a/navi/navi/proto/tensorflow/core/framework/node_def.proto b/navi/navi/proto/tensorflow/core/framework/node_def.proto index 801759817..705e90aa3 100644 --- a/navi/navi/proto/tensorflow/core/framework/node_def.proto +++ b/navi/navi/proto/tensorflow/core/framework/node_def.proto @@ -61,7 +61,7 @@ message NodeDef { // one of the names from the corresponding OpDef's attr field). // The values must have a type matching the corresponding OpDef // attr's type field. - // TODO: Add some examples here showing best practices. + // TODO(josh11b): Add some examples here showing best practices. 
map<string, AttrValue> attr = 5; message ExperimentalDebugInfo { diff --git a/navi/navi/proto/tensorflow/core/framework/op_def.proto b/navi/navi/proto/tensorflow/core/framework/op_def.proto index a53fdf028..b71f5ce87 100644 --- a/navi/navi/proto/tensorflow/core/framework/op_def.proto +++ b/navi/navi/proto/tensorflow/core/framework/op_def.proto @@ -96,7 +96,7 @@ message OpDef { // Human-readable description. string description = 4; - // TODO: bool is_optional? + // TODO(josh11b): bool is_optional? // --- Constraints --- // These constraints are only in effect if specified. Default is no @@ -139,7 +139,7 @@ message OpDef { // taking input from multiple devices with a tree of aggregate ops // that aggregate locally within each device (and possibly within // groups of nearby devices) before communicating. - // TODO: Implement that optimization. + // TODO(josh11b): Implement that optimization. bool is_aggregate = 16; // for things like add // Other optimizations go here, like diff --git a/navi/navi/proto/tensorflow/core/framework/step_stats.proto b/navi/navi/proto/tensorflow/core/framework/step_stats.proto index 62238234d..762487f02 100644 --- a/navi/navi/proto/tensorflow/core/framework/step_stats.proto +++ b/navi/navi/proto/tensorflow/core/framework/step_stats.proto @@ -53,7 +53,7 @@ message MemoryStats { // Time/size stats recorded for a single execution of a graph node. message NodeExecStats { - // TODO: Use some more compact form of node identity than + // TODO(tucker): Use some more compact form of node identity than // the full string name. Either all processes should agree on a // global id (cost_id?) for each node, or we should use a hash of // the name. diff --git a/navi/navi/proto/tensorflow/core/framework/tensor.proto b/navi/navi/proto/tensorflow/core/framework/tensor.proto index 2d4b593be..eb057b127 100644 --- a/navi/navi/proto/tensorflow/core/framework/tensor.proto +++ b/navi/navi/proto/tensorflow/core/framework/tensor.proto @@ -16,7 +16,7 @@ option go_package = "github.com/tensorflow/tensorflow/tensorflow/go/core/framewo message TensorProto { DataType dtype = 1; - // Shape of the tensor. TODO: sort out the 0-rank issues. + // Shape of the tensor. TODO(touts): sort out the 0-rank issues. TensorShapeProto tensor_shape = 2; // Only one of the representations below is set, one of "tensor_contents" and diff --git a/navi/navi/proto/tensorflow/core/protobuf/config.proto b/navi/navi/proto/tensorflow/core/protobuf/config.proto index ff78e1f22..e454309fc 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/config.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/config.proto @@ -532,7 +532,7 @@ message ConfigProto { // We removed the flag client_handles_error_formatting. Marking the tag // number as reserved. - // TODO: Should we just remove this tag so that it can be + // TODO(shikharagarwal): Should we just remove this tag so that it can be // used in future for other purpose? reserved 2; @@ -576,7 +576,7 @@ message ConfigProto { // - If isolate_session_state is true, session states are isolated. // - If isolate_session_state is false, session states are shared. // - // TODO: Add a single API that consistently treats + // TODO(b/129330037): Add a single API that consistently treats // isolate_session_state and ClusterSpec propagation. bool share_session_state_in_clusterspec_propagation = 8; @@ -704,7 +704,7 @@ message ConfigProto { // Options for a single Run() call. 
message RunOptions { - // TODO Turn this into a TraceOptions proto which allows + // TODO(pbar) Turn this into a TraceOptions proto which allows // tracing to be controlled in a more orthogonal manner? enum TraceLevel { NO_TRACE = 0; @@ -781,7 +781,7 @@ message RunMetadata { repeated GraphDef partition_graphs = 3; message FunctionGraphs { - // TODO: Include some sort of function/cache-key identifier? + // TODO(nareshmodi): Include some sort of function/cache-key identifier? repeated GraphDef partition_graphs = 1; GraphDef pre_optimization_graph = 2; diff --git a/navi/navi/proto/tensorflow/core/protobuf/coordination_service.proto b/navi/navi/proto/tensorflow/core/protobuf/coordination_service.proto index e190bb028..730fb8c10 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/coordination_service.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/coordination_service.proto @@ -194,7 +194,7 @@ service CoordinationService { // Report error to the task. RPC sets the receiving instance of coordination // service agent to error state permanently. - // TODO: Consider splitting this into a different RPC service. + // TODO(b/195990880): Consider splitting this into a different RPC service. rpc ReportErrorToAgent(ReportErrorToAgentRequest) returns (ReportErrorToAgentResponse); diff --git a/navi/navi/proto/tensorflow/core/protobuf/debug.proto b/navi/navi/proto/tensorflow/core/protobuf/debug.proto index 1cc76f1ed..2fabd0319 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/debug.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/debug.proto @@ -46,7 +46,7 @@ message DebugTensorWatch { // are to be debugged, the callers of Session::Run() must use distinct // debug_urls to make sure that the streamed or dumped events do not overlap // among the invocations. - // TODO: More visible documentation of this in g3docs. + // TODO(cais): More visible documentation of this in g3docs. repeated string debug_urls = 4; // Do not error out if debug op creation fails (e.g., due to dtype diff --git a/navi/navi/proto/tensorflow/core/protobuf/debug_event.proto b/navi/navi/proto/tensorflow/core/protobuf/debug_event.proto index b68f45d4d..5530004d7 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/debug_event.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/debug_event.proto @@ -12,7 +12,7 @@ option java_package = "org.tensorflow.util"; option go_package = "github.com/tensorflow/tensorflow/tensorflow/go/core/protobuf/for_core_protos_go_proto"; // Available modes for extracting debugging information from a Tensor. -// TODO: Document the detailed column names and semantics in a separate +// TODO(cais): Document the detailed column names and semantics in a separate // markdown file once the implementation settles. enum TensorDebugMode { UNSPECIFIED = 0; @@ -223,7 +223,7 @@ message DebuggedDevice { // A debugger-generated ID for the device. Guaranteed to be unique within // the scope of the debugged TensorFlow program, including single-host and // multi-host settings. - // TODO: Test the uniqueness guarantee in multi-host settings. + // TODO(cais): Test the uniqueness guarantee in multi-host settings. int32 device_id = 2; } @@ -264,7 +264,7 @@ message Execution { // field with the DebuggedDevice messages. repeated int32 output_tensor_device_ids = 9; - // TODO support, add more fields + // TODO(cais): When backporting to V1 Session.run() support, add more fields // such as fetches and feeds. 
} diff --git a/navi/navi/proto/tensorflow/core/protobuf/distributed_runtime_payloads.proto b/navi/navi/proto/tensorflow/core/protobuf/distributed_runtime_payloads.proto index c19da9d82..ddb346afa 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/distributed_runtime_payloads.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/distributed_runtime_payloads.proto @@ -7,7 +7,7 @@ option go_package = "github.com/tensorflow/tensorflow/tensorflow/go/core/protobu // Used to serialize and transmit tensorflow::Status payloads through // grpc::Status `error_details` since grpc::Status lacks payload API. -// TODO: Use GRPC API once supported. +// TODO(b/204231601): Use GRPC API once supported. message GrpcPayloadContainer { map<string, bytes> payloads = 1; } diff --git a/navi/navi/proto/tensorflow/core/protobuf/eager_service.proto b/navi/navi/proto/tensorflow/core/protobuf/eager_service.proto index 9d658c7d9..204acf6b1 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/eager_service.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/eager_service.proto @@ -172,7 +172,7 @@ message WaitQueueDoneRequest { } message WaitQueueDoneResponse { - // TODO: Consider adding NodeExecStats here to be able to + // TODO(nareshmodi): Consider adding NodeExecStats here to be able to // propagate some stats. } diff --git a/navi/navi/proto/tensorflow/core/protobuf/master.proto b/navi/navi/proto/tensorflow/core/protobuf/master.proto index 60555cd58..e1732a932 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/master.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/master.proto @@ -94,7 +94,7 @@ message ExtendSessionRequest { } message ExtendSessionResponse { - // TODO: Return something about the operation? + // TODO(mrry): Return something about the operation? // The new version number for the extended graph, to be used in the next call // to ExtendSession. diff --git a/navi/navi/proto/tensorflow/core/protobuf/saved_object_graph.proto b/navi/navi/proto/tensorflow/core/protobuf/saved_object_graph.proto index 70b31f0e6..a59ad0ed2 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/saved_object_graph.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/saved_object_graph.proto @@ -176,7 +176,7 @@ message SavedBareConcreteFunction { // allows the ConcreteFunction to be called with nest structure inputs. This // field may not be populated. If this field is absent, the concrete function // can only be called with flat inputs. - // TODO: support calling saved ConcreteFunction with structured + // TODO(b/169361281): support calling saved ConcreteFunction with structured // inputs in C++ SavedModel API. FunctionSpec function_spec = 4; } diff --git a/navi/navi/proto/tensorflow/core/protobuf/tensor_bundle.proto b/navi/navi/proto/tensorflow/core/protobuf/tensor_bundle.proto index 4433afae2..999195cc9 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/tensor_bundle.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/tensor_bundle.proto @@ -17,7 +17,7 @@ option go_package = "github.com/tensorflow/tensorflow/tensorflow/go/core/protobu // Special header that is associated with a bundle. // -// TODO: maybe in the future, we can add information about +// TODO(zongheng,zhifengc): maybe in the future, we can add information about // which binary produced this checkpoint, timestamp, etc. Sometime, these can be 
And if needed, these can be used as defensive // information ensuring reader (binary version) of the checkpoint and the writer diff --git a/navi/navi/proto/tensorflow/core/protobuf/worker.proto b/navi/navi/proto/tensorflow/core/protobuf/worker.proto index 0df080c77..18d60b568 100644 --- a/navi/navi/proto/tensorflow/core/protobuf/worker.proto +++ b/navi/navi/proto/tensorflow/core/protobuf/worker.proto @@ -188,7 +188,7 @@ message DeregisterGraphRequest { } message DeregisterGraphResponse { - // TODO: Optionally add summary stats for the graph. + // TODO(mrry): Optionally add summary stats for the graph. } //////////////////////////////////////////////////////////////////////////////// @@ -294,7 +294,7 @@ message RunGraphResponse { // If the request asked for execution stats, the cost graph, or the partition // graphs, these are returned here. - // TODO: Package these in a RunMetadata instead. + // TODO(suharshs): Package these in a RunMetadata instead. StepStats step_stats = 2; CostGraphDef cost_graph = 3; repeated GraphDef partition_graph = 4; diff --git a/navi/navi/proto/tensorflow_serving/apis/logging.proto b/navi/navi/proto/tensorflow_serving/apis/logging.proto index 9d304f44d..6298bb4b2 100644 --- a/navi/navi/proto/tensorflow_serving/apis/logging.proto +++ b/navi/navi/proto/tensorflow_serving/apis/logging.proto @@ -13,5 +13,5 @@ message LogMetadata { SamplingConfig sampling_config = 2; // List of tags used to load the relevant MetaGraphDef from SavedModel. repeated string saved_model_tags = 3; - // TODO: Add more metadata as mentioned in the bug. + // TODO(b/33279154): Add more metadata as mentioned in the bug. } diff --git a/navi/navi/proto/tensorflow_serving/config/file_system_storage_path_source.proto b/navi/navi/proto/tensorflow_serving/config/file_system_storage_path_source.proto index 8d8541d4f..add7aa2a2 100644 --- a/navi/navi/proto/tensorflow_serving/config/file_system_storage_path_source.proto +++ b/navi/navi/proto/tensorflow_serving/config/file_system_storage_path_source.proto @@ -58,7 +58,7 @@ message FileSystemStoragePathSourceConfig { // A single servable name/base_path pair to monitor. // DEPRECATED: Use 'servables' instead. - // TODO: Stop using these fields, and ultimately remove them here. + // TODO(b/30898016): Stop using these fields, and ultimately remove them here. string servable_name = 1 [deprecated = true]; string base_path = 2 [deprecated = true]; @@ -76,7 +76,7 @@ message FileSystemStoragePathSourceConfig { // check for a version to appear later.) // DEPRECATED: Use 'servable_versions_always_present' instead, which includes // this behavior. - // TODO: Remove 2019-10-31 or later. + // TODO(b/30898016): Remove 2019-10-31 or later. bool fail_if_zero_versions_at_startup = 4 [deprecated = true]; // If true, the servable is always expected to exist on the underlying diff --git a/navi/navi/proto/tensorflow_serving/config/model_server_config.proto b/navi/navi/proto/tensorflow_serving/config/model_server_config.proto index 0f80aa1c7..cadc2b6e6 100644 --- a/navi/navi/proto/tensorflow_serving/config/model_server_config.proto +++ b/navi/navi/proto/tensorflow_serving/config/model_server_config.proto @@ -9,7 +9,7 @@ import "tensorflow_serving/config/logging_config.proto"; option cc_enable_arenas = true; // The type of model. -// TODO: DEPRECATED. +// TODO(b/31336131): DEPRECATED. enum ModelType { MODEL_TYPE_UNSPECIFIED = 0 [deprecated = true]; TENSORFLOW = 1 [deprecated = true]; @@ -31,7 +31,7 @@ message ModelConfig { string base_path = 2; // Type of model. 
- // TODO: DEPRECATED. Please use 'model_platform' instead. + // TODO(b/31336131): DEPRECATED. Please use 'model_platform' instead. ModelType model_type = 3 [deprecated = true]; // Type of model (e.g. "tensorflow"). diff --git a/navi/navi/scripts/run_onnx.sh b/navi/navi/scripts/run_onnx.sh index ae6ff10b6..cc8695f4a 100644 --- a/navi/navi/scripts/run_onnx.sh +++ b/navi/navi/scripts/run_onnx.sh @@ -1,10 +1,9 @@ #!/bin/sh #RUST_LOG=debug LD_LIBRARY_PATH=so/onnx/lib target/release/navi_onnx --port 30 --num-worker-threads 8 --intra-op-parallelism 8 --inter-op-parallelism 8 \ RUST_LOG=info LD_LIBRARY_PATH=so/onnx/lib cargo run --bin navi_onnx --features onnx -- \ - --port 30 --num-worker-threads 8 --intra-op-parallelism 8 --inter-op-parallelism 8 \ + --port 8030 --num-worker-threads 8 \ --model-check-interval-secs 30 \ - --model-dir models/int8 \ - --output caligrated_probabilities \ - --input "" \ --modelsync-cli "echo" \ - --onnx-ep-options use_arena=true + --onnx-ep-options use_arena=true \ + --model-dir models/prod_home --output caligrated_probabilities --input "" --intra-op-parallelism 8 --inter-op-parallelism 8 --max-batch-size 1 --batch-time-out-millis 1 \ + --model-dir models/prod_home1 --output caligrated_probabilities --input "" --intra-op-parallelism 8 --inter-op-parallelism 8 --max-batch-size 1 --batch-time-out-millis 1 \ diff --git a/navi/navi/src/bin/navi_onnx.rs b/navi/navi/src/bin/navi_onnx.rs index ac73a3d16..03b1ea2aa 100644 --- a/navi/navi/src/bin/navi_onnx.rs +++ b/navi/navi/src/bin/navi_onnx.rs @@ -1,11 +1,24 @@ use anyhow::Result; +use log::info; use navi::cli_args::{ARGS, MODEL_SPECS}; use navi::onnx_model::onnx::OnnxModel; use navi::{bootstrap, metrics}; fn main() -> Result<()> { env_logger::init(); - assert_eq!(MODEL_SPECS.len(), ARGS.inter_op_parallelism.len()); + info!("global: {:?}", ARGS.onnx_global_thread_pool_options); + let assert_session_params = if ARGS.onnx_global_thread_pool_options.is_empty() { + // std::env::set_var("OMP_NUM_THREADS", "1"); + info!("now we use per session thread pool"); + MODEL_SPECS.len() + } + else { + info!("now we use global thread pool"); + 0 + }; + assert_eq!(assert_session_params, ARGS.inter_op_parallelism.len()); + assert_eq!(assert_session_params, ARGS.intra_op_parallelism.len()); + metrics::register_custom_metrics(); bootstrap::bootstrap(OnnxModel::new) } diff --git a/navi/navi/src/bootstrap.rs b/navi/navi/src/bootstrap.rs index b9f3014c7..edc5fddf6 100644 --- a/navi/navi/src/bootstrap.rs +++ b/navi/navi/src/bootstrap.rs @@ -1,5 +1,6 @@ use anyhow::Result; use log::{info, warn}; +use x509_parser::{prelude::{parse_x509_pem}, parse_x509_certificate}; use std::collections::HashMap; use tokio::time::Instant; use tonic::{ @@ -27,6 +28,7 @@ use crate::cli_args::{ARGS, INPUTS, OUTPUTS}; use crate::metrics::{ NAVI_VERSION, NUM_PREDICTIONS, NUM_REQUESTS_FAILED, NUM_REQUESTS_FAILED_BY_MODEL, NUM_REQUESTS_RECEIVED, NUM_REQUESTS_RECEIVED_BY_MODEL, RESPONSE_TIME_COLLECTOR, + CERT_EXPIRY_EPOCH }; use crate::predict_service::{Model, PredictService}; use crate::tf_proto::tensorflow_serving::model_spec::VersionChoice::Version; @@ -207,6 +209,9 @@ impl PredictionService for PredictService { PredictResult::DropDueToOverload => Err(Status::resource_exhausted("")), PredictResult::ModelNotFound(idx) => { Err(Status::not_found(format!("model index {}", idx))) + }, + PredictResult::ModelNotReady(idx) => { + Err(Status::unavailable(format!("model index {}", idx))) } PredictResult::ModelVersionNotFound(idx, version) => Err( Status::not_found(format!("model 
index:{}, version {}", idx, version)), @@ -230,6 +235,12 @@ impl PredictionService for PredictService { } } +// Log the certificate expiry time and record it in the CERT_EXPIRY_EPOCH metric +fn report_expiry(expiry_time: i64) { + info!("Certificate expires at epoch: {:?}", expiry_time); + CERT_EXPIRY_EPOCH.set(expiry_time as i64); +} + pub fn bootstrap(model_factory: ModelFactory) -> Result<()> { info!("package: {}, version: {}, args: {:?}", NAME, VERSION, *ARGS); //we follow SemVer. So here we assume MAJOR.MINOR.PATCH @@ -246,6 +257,7 @@ pub fn bootstrap(model_factory: ModelFactory) -> Result<()> { ); } + tokio::runtime::Builder::new_multi_thread() .thread_name("async worker") .worker_threads(ARGS.num_worker_threads) @@ -263,6 +275,21 @@ pub fn bootstrap(model_factory: ModelFactory) -> Result<()> { let mut builder = if ARGS.ssl_dir.is_empty() { Server::builder() } else { + // Read the pem file as a string + let pem_str = std::fs::read_to_string(format!("{}/server.crt", ARGS.ssl_dir)).unwrap(); + let res = parse_x509_pem(&pem_str.as_bytes()); + match res { + Ok((rem, pem_2)) => { + assert!(rem.is_empty()); + assert_eq!(pem_2.label, String::from("CERTIFICATE")); + let res_x509 = parse_x509_certificate(&pem_2.contents); + info!("Certificate label: {}", pem_2.label); + assert!(res_x509.is_ok()); + report_expiry(res_x509.unwrap().1.validity().not_after.timestamp()); + }, + _ => panic!("PEM parsing failed: {:?}", res), + } + let key = tokio::fs::read(format!("{}/server.key", ARGS.ssl_dir)) .await .expect("can't find key file"); @@ -278,7 +305,7 @@ pub fn bootstrap(model_factory: ModelFactory) -> Result<()> { let identity = Identity::from_pem(pem.clone(), key); let client_ca_cert = Certificate::from_pem(pem.clone()); let tls = ServerTlsConfig::new() - .identity(identity) + .identity(identity) .client_ca_root(client_ca_cert); Server::builder() .tls_config(tls) diff --git a/navi/navi/src/cli_args.rs b/navi/navi/src/cli_args.rs index ec7e31f89..21375cea1 100644 --- a/navi/navi/src/cli_args.rs +++ b/navi/navi/src/cli_args.rs @@ -87,13 +87,11 @@ pub struct Args { pub intra_op_parallelism: Vec<String>, #[clap( long, - default_value = "14", help = "number of threads to parallelize computations of the graph" )] pub inter_op_parallelism: Vec<String>, #[clap( long, - default_value = "serving_default", help = "signature of a serving. only TF" )] pub serving_sig: Vec<String>, @@ -107,10 +105,12 @@ pub struct Args { help = "max warmup records to use. warmup only implemented for TF" )] pub max_warmup_records: usize, + #[clap(long, value_parser = Args::parse_key_val::<String, String>, value_delimiter=',')] + pub onnx_global_thread_pool_options: Vec<(String, String)>, #[clap( - long, - default_value = "true", - help = "when to use graph parallelization. only for ONNX" + long, + default_value = "true", + help = "when to use graph parallelization. only for ONNX" )] pub onnx_use_parallel_mode: String, // #[clap(long, default_value = "false")] diff --git a/navi/navi/src/lib.rs b/navi/navi/src/lib.rs index a13e0332d..b942a1a4d 100644 --- a/navi/navi/src/lib.rs +++ b/navi/navi/src/lib.rs @@ -144,6 +144,7 @@ pub enum PredictResult { Ok(Vec<TensorScores>, i64), DropDueToOverload, ModelNotFound(usize), + ModelNotReady(usize), ModelVersionNotFound(usize, i64), } diff --git a/navi/navi/src/metrics.rs b/navi/navi/src/metrics.rs index 7cc9e6fcf..373f84f0f 100644 --- a/navi/navi/src/metrics.rs +++ b/navi/navi/src/metrics.rs @@ -171,6 +171,9 @@ lazy_static! 
{ &["model_name"] ) .expect("metric can be created"); + pub static ref CERT_EXPIRY_EPOCH: IntGauge = + IntGauge::new(":navi:cert_expiry_epoch", "Timestamp when the current cert expires") + .expect("metric can be created"); } pub fn register_custom_metrics() { @@ -249,6 +252,10 @@ pub fn register_custom_metrics() { REGISTRY .register(Box::new(CONVERTER_TIME_COLLECTOR.clone())) .expect("collector can be registered"); + REGISTRY + .register(Box::new(CERT_EXPIRY_EPOCH.clone())) + .expect("collector can be registered"); + } pub fn register_dynamic_metrics(c: &HistogramVec) { diff --git a/navi/navi/src/onnx_model.rs b/navi/navi/src/onnx_model.rs index 991fab83a..18f116570 100644 --- a/navi/navi/src/onnx_model.rs +++ b/navi/navi/src/onnx_model.rs @@ -13,21 +13,22 @@ pub mod onnx { use dr_transform::converter::{BatchPredictionRequestToTorchTensorConverter, Converter}; use itertools::Itertools; use log::{debug, info}; - use ort::environment::Environment; - use ort::session::Session; - use ort::tensor::InputTensor; - use ort::{ExecutionProvider, GraphOptimizationLevel, SessionBuilder}; + use dr_transform::ort::environment::Environment; + use dr_transform::ort::session::Session; + use dr_transform::ort::tensor::InputTensor; + use dr_transform::ort::{ExecutionProvider, GraphOptimizationLevel, SessionBuilder}; + use dr_transform::ort::LoggingLevel; use serde_json::Value; use std::fmt::{Debug, Display}; use std::sync::Arc; use std::{fmt, fs}; use tokio::time::Instant; - lazy_static! { pub static ref ENVIRONMENT: Arc = Arc::new( Environment::builder() .with_name("onnx home") - .with_log_level(ort::LoggingLevel::Error) + .with_log_level(LoggingLevel::Error) + .with_global_thread_pool(ARGS.onnx_global_thread_pool_options.clone()) .build() .unwrap() ); @@ -101,23 +102,30 @@ pub mod onnx { let meta_info = format!("{}/{}/{}", ARGS.model_dir[idx], version, META_INFO); let mut builder = SessionBuilder::new(&ENVIRONMENT)? .with_optimization_level(GraphOptimizationLevel::Level3)? - .with_parallel_execution(ARGS.onnx_use_parallel_mode == "true")? - .with_inter_threads( - utils::get_config_or( - model_config, - "inter_op_parallelism", - &ARGS.inter_op_parallelism[idx], - ) - .parse()?, - )? - .with_intra_threads( - utils::get_config_or( - model_config, - "intra_op_parallelism", - &ARGS.intra_op_parallelism[idx], - ) - .parse()?, - )? + .with_parallel_execution(ARGS.onnx_use_parallel_mode == "true")?; + if ARGS.onnx_global_thread_pool_options.is_empty() { + builder = builder + .with_inter_threads( + utils::get_config_or( + model_config, + "inter_op_parallelism", + &ARGS.inter_op_parallelism[idx], + ) + .parse()?, + )? + .with_intra_threads( + utils::get_config_or( + model_config, + "intra_op_parallelism", + &ARGS.intra_op_parallelism[idx], + ) + .parse()?, + )?; + } + else { + builder = builder.with_disable_per_session_threads()?; + } + builder = builder .with_memory_pattern(ARGS.onnx_use_memory_pattern == "true")? 
.with_execution_providers(&OnnxModel::ep_choices())?; match &ARGS.profiling { @@ -181,7 +189,7 @@ pub mod onnx { &version, reporting_feature_ids, Some(metrics::register_dynamic_metrics), - )), + )?), }; onnx_model.warmup()?; Ok(onnx_model) diff --git a/navi/navi/src/predict_service.rs b/navi/navi/src/predict_service.rs index 25ba4b848..fc355d7ea 100644 --- a/navi/navi/src/predict_service.rs +++ b/navi/navi/src/predict_service.rs @@ -1,7 +1,7 @@ use anyhow::{anyhow, Result}; use arrayvec::ArrayVec; use itertools::Itertools; -use log::{error, info, warn}; +use log::{error, info}; use std::fmt::{Debug, Display}; use std::string::String; use std::sync::Arc; @@ -24,7 +24,7 @@ use serde_json::{self, Value}; pub trait Model: Send + Sync + Display + Debug + 'static { fn warmup(&self) -> Result<()>; - //TODO: refactor this to return Vec<Vec<TensorScores>>, i.e. + //TODO: refactor this to return vec<vec<TensorScores>>, i.e. //we have the underlying runtime impl to split the response to each client. //It will eliminate some inefficient memory copy in onnx_model.rs as well as simplify code fn do_predict( @@ -179,17 +179,17 @@ impl PredictService { //initialize the latest version array let mut cur_versions = vec!["".to_owned(); MODEL_SPECS.len()]; loop { - let config = utils::read_config(&meta_file).unwrap_or_else(|e| { - warn!("config file {} not found due to: {}", meta_file, e); - Value::Null - }); info!("***polling for models***"); //nice deliminter - info!("config:{}", config); if let Some(ref cli) = ARGS.modelsync_cli { if let Err(e) = call_external_modelsync(cli, &cur_versions).await { error!("model sync cli running error:{}", e) } } + let config = utils::read_config(&meta_file).unwrap_or_else(|e| { + info!("config file {} not found due to: {}", meta_file, e); + Value::Null + }); + info!("config:{}", config); for (idx, cur_version) in cur_versions.iter_mut().enumerate() { let model_dir = &ARGS.model_dir[idx]; PredictService::scan_load_latest_model_from_model_dir( @@ -222,33 +222,39 @@ impl PredictService { .map(|b| b.parse().unwrap()) .collect::<Vec<u64>>(); let no_msg_wait_millis = *batch_time_out_millis.iter().min().unwrap(); - let mut all_model_predictors = - ArrayVec::<ArrayVec<BatchPredictor<T>, MAX_VERSIONS_PER_MODEL>, MAX_NUM_MODELS>::new(); + let mut all_model_predictors: ArrayVec::<ArrayVec<BatchPredictor<T>, MAX_VERSIONS_PER_MODEL>, MAX_NUM_MODELS> = + (0 ..MAX_NUM_MODELS).map( |_| ArrayVec::<BatchPredictor<T>, MAX_VERSIONS_PER_MODEL>::new()).collect(); loop { let msg = rx.try_recv(); let no_more_msg = match msg { Ok(PredictMessage::Predict(model_spec_at, version, val, resp, ts)) => { if let Some(model_predictors) = all_model_predictors.get_mut(model_spec_at) { - match version { - None => model_predictors[0].push(val, resp, ts), - Some(the_version) => match model_predictors - .iter_mut() - .find(|x| x.model.version() == the_version) - { - None => resp - .send(PredictResult::ModelVersionNotFound( - model_spec_at, - the_version, - )) - .unwrap_or_else(|e| { - error!("cannot send back version error: {:?}", e) - }), - Some(predictor) => predictor.push(val, resp, ts), - }, + if model_predictors.is_empty() { + resp.send(PredictResult::ModelNotReady(model_spec_at)) + .unwrap_or_else(|e| error!("cannot send back model not ready error: {:?}", e)); + } + else { + match version { + None => model_predictors[0].push(val, resp, ts), + Some(the_version) => match model_predictors + .iter_mut() + .find(|x| x.model.version() == the_version) + { + None => resp + .send(PredictResult::ModelVersionNotFound( + model_spec_at, + the_version, + )) + .unwrap_or_else(|e| { + error!("cannot send back version error: {:?}", e) + 
}), + Some(predictor) => predictor.push(val, resp, ts), + }, + } } } else { resp.send(PredictResult::ModelNotFound(model_spec_at)) - .unwrap_or_else(|e| error!("cannot send back model error: {:?}", e)) + .unwrap_or_else(|e| error!("cannot send back model not found error: {:?}", e)) } MPSC_CHANNEL_SIZE.dec(); false @@ -266,27 +272,23 @@ impl PredictService { queue_reset_ts: Instant::now(), queue_earliest_rq_ts: Instant::now(), }; - if idx < all_model_predictors.len() { - metrics::NEW_MODEL_SNAPSHOT - .with_label_values(&[&MODEL_SPECS[idx]]) - .inc(); + assert!(idx < all_model_predictors.len()); + metrics::NEW_MODEL_SNAPSHOT + .with_label_values(&[&MODEL_SPECS[idx]]) + .inc(); - info!("now we serve updated model: {}", predictor.model); - //we can do this since the vector is small - let predictors = &mut all_model_predictors[idx]; - if predictors.len() == ARGS.versions_per_model { - predictors.remove(predictors.len() - 1); - } - predictors.insert(0, predictor); - } else { - info!("now we serve new model: {:}", predictor.model); - let mut predictors = - ArrayVec::<BatchPredictor<T>, MAX_VERSIONS_PER_MODEL>::new(); - predictors.push(predictor); - all_model_predictors.push(predictors); - //check the invariant that we always push the last model to the end - assert_eq!(all_model_predictors.len(), idx + 1) + //we can do this since the vector is small + let predictors = &mut all_model_predictors[idx]; + if predictors.len() == 0 { + info!("now we serve new model: {}", predictor.model); } + else { + info!("now we serve updated model: {}", predictor.model); + } + if predictors.len() == ARGS.versions_per_model { + predictors.remove(predictors.len() - 1); + } + predictors.insert(0, predictor); false } Err(TryRecvError::Empty) => true, diff --git a/navi/segdense/Cargo.toml b/navi/segdense/Cargo.toml index 4adbf2bc1..1c8abc58c 100644 --- a/navi/segdense/Cargo.toml +++ b/navi/segdense/Cargo.toml @@ -3,9 +3,9 @@ name = "segdense" version = "0.1.0" edition = "2021" -# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html [dependencies] +env_logger = "0.10.0" serde = { version = "1.0.104", features = ["derive"] } serde_json = "1.0.48" log = "0.4.17" diff --git a/navi/segdense/src/error.rs b/navi/segdense/src/error.rs index dcf5122b8..bb2a9245f 100644 --- a/navi/segdense/src/error.rs +++ b/navi/segdense/src/error.rs @@ -5,13 +5,13 @@ use std::fmt::Display; */ #[derive(Debug)] pub enum SegDenseError { - IoError(std::io::Error), - Json(serde_json::Error), - JsonMissingRoot, - JsonMissingObject, - JsonMissingArray, - JsonArraySize, - JsonMissingInputFeature, + IoError(std::io::Error), + Json(serde_json::Error), + JsonMissingRoot, + JsonMissingObject, + JsonMissingArray, + JsonArraySize, + JsonMissingInputFeature, } impl Display for SegDenseError { @@ -25,19 +25,18 @@ impl Display for SegDenseError { SegDenseError::JsonArraySize => write!(f, "SegDense JSON: Array size not as expected!"), SegDenseError::JsonMissingInputFeature => write!(f, "SegDense JSON: Missing input feature!"), } - } } impl std::error::Error for SegDenseError {} impl From<std::io::Error> for SegDenseError { - fn from(err: std::io::Error) -> Self { - SegDenseError::IoError(err) - } + fn from(err: std::io::Error) -> Self { + SegDenseError::IoError(err) + } } impl From<serde_json::Error> for SegDenseError { - fn from(err: serde_json::Error) -> Self { - SegDenseError::Json(err) - } + fn from(err: serde_json::Error) -> Self { + SegDenseError::Json(err) + } } diff --git a/navi/segdense/src/lib.rs b/navi/segdense/src/lib.rs index 476411702..f9930da64 100644 ---
a/navi/segdense/src/lib.rs +++ b/navi/segdense/src/lib.rs @@ -1,4 +1,4 @@ pub mod error; -pub mod segdense_transform_spec_home_recap_2022; pub mod mapper; -pub mod util; \ No newline at end of file +pub mod segdense_transform_spec_home_recap_2022; +pub mod util; diff --git a/navi/segdense/src/main.rs b/navi/segdense/src/main.rs index 1515df101..d8f7f8bc4 100644 --- a/navi/segdense/src/main.rs +++ b/navi/segdense/src/main.rs @@ -5,19 +5,18 @@ use segdense::error::SegDenseError; use segdense::util; fn main() -> Result<(), SegDenseError> { - env_logger::init(); - let args: Vec<String> = env::args().collect(); - - let schema_file_name: &str = if args.len() == 1 { - "json/compact.json" - } else { - &args[1] - }; + env_logger::init(); + let args: Vec<String> = env::args().collect(); - let json_str = fs::read_to_string(schema_file_name)?; + let schema_file_name: &str = if args.len() == 1 { + "json/compact.json" + } else { + &args[1] + }; - util::safe_load_config(&json_str)?; + let json_str = fs::read_to_string(schema_file_name)?; - Ok(()) + util::safe_load_config(&json_str)?; + + Ok(()) } - diff --git a/navi/segdense/src/mapper.rs b/navi/segdense/src/mapper.rs index f640f2aeb..f5a1d6532 100644 --- a/navi/segdense/src/mapper.rs +++ b/navi/segdense/src/mapper.rs @@ -19,13 +19,13 @@ pub struct FeatureMapper { impl FeatureMapper { pub fn new() -> FeatureMapper { FeatureMapper { - map: HashMap::new() + map: HashMap::new(), } } } pub trait MapWriter { - fn set(&mut self, feature_id: i64, info: FeatureInfo); + fn set(&mut self, feature_id: i64, info: FeatureInfo); } pub trait MapReader { diff --git a/navi/segdense/src/segdense_transform_spec_home_recap_2022.rs b/navi/segdense/src/segdense_transform_spec_home_recap_2022.rs index a3b3513f8..ff6d3ae17 100644 --- a/navi/segdense/src/segdense_transform_spec_home_recap_2022.rs +++ b/navi/segdense/src/segdense_transform_spec_home_recap_2022.rs @@ -164,7 +164,6 @@ pub struct ComplexFeatureTypeTransformSpec { pub tensor_shape: Vec, } - #[derive(Default, Debug, Clone, PartialEq, Serialize, Deserialize)] #[serde(rename_all = "camelCase")] pub struct InputFeatureMapRecord { diff --git a/navi/segdense/src/util.rs b/navi/segdense/src/util.rs index 406a7a281..c3bd656e5 100644 --- a/navi/segdense/src/util.rs +++ b/navi/segdense/src/util.rs @@ -1,10 +1,10 @@ +use log::debug; use std::fs; -use log::{debug}; -use serde_json::{Value, Map}; +use serde_json::{Map, Value}; use crate::error::SegDenseError; -use crate::mapper::{FeatureMapper, FeatureInfo, MapWriter}; +use crate::mapper::{FeatureInfo, FeatureMapper, MapWriter}; use crate::segdense_transform_spec_home_recap_2022::{self as seg_dense, InputFeature}; pub fn load_config(file_name: &str) -> seg_dense::Root { @@ -42,15 +42,8 @@ pub fn safe_load_config(json_str: &str) -> Result<FeatureMapper, SegDenseError> { load_from_parsed_config(root) } -pub fn load_from_parsed_config_ref(root: &seg_dense::Root) -> FeatureMapper { - load_from_parsed_config(root.clone()).unwrap_or_else( - |error| panic!("Error loading all_config.json - {}", error)) -} - // Perf note : make 'root' un-owned -pub fn load_from_parsed_config(root: seg_dense::Root) -> - Result<FeatureMapper, SegDenseError> { - + pub fn load_from_parsed_config(root: seg_dense::Root) -> Result<FeatureMapper, SegDenseError> { let v = root.input_features_map; // Do error check @@ -84,7 +77,7 @@ pub fn load_from_parsed_config(root: seg_dense::Root) -> Some(info) => { debug!("{:?}", info); fm.set(feature_id, info) - }, + } None => (), } } @@ -92,19 +85,22 @@ pub fn load_from_parsed_config(root: seg_dense::Root) -> Ok(fm) } #[allow(dead_code)] -fn
add_feature_info_to_mapper(feature_mapper: &mut FeatureMapper, input_features: &Vec<InputFeature>) { +fn add_feature_info_to_mapper( + feature_mapper: &mut FeatureMapper, + input_features: &Vec<InputFeature>, +) { for input_feature in input_features.iter() { - let feature_id = input_feature.feature_id; - let feature_info = to_feature_info(input_feature); - - match feature_info { - Some(info) => { - debug!("{:?}", info); - feature_mapper.set(feature_id, info) - }, - None => (), + let feature_id = input_feature.feature_id; + let feature_info = to_feature_info(input_feature); + + match feature_info { + Some(info) => { + debug!("{:?}", info); + feature_mapper.set(feature_id, info) } + None => (), } + } } pub fn to_feature_info(input_feature: &seg_dense::InputFeature) -> Option<FeatureInfo> { @@ -137,7 +133,7 @@ pub fn to_feature_info(input_feature: &seg_dense::InputFeature) -> Option<FeatureInfo> { 1 => 0, 3 => 2, _ => -1, - } + }, }; if input_feature.index < 0 { @@ -154,4 +150,3 @@ pub fn to_feature_info(input_feature: &seg_dense::InputFeature) -> Option MemcachedClient} +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.store.strato.StratoFetchableStore +import com.twitter.hermit.store.common.ObservedCachedReadableStore +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.representation_manager.config.ClientConfig +import com.twitter.representation_manager.config.DisabledInMemoryCacheParams +import com.twitter.representation_manager.config.EnabledInMemoryCacheParams +import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView +import com.twitter.simclusters_v2.common.SimClustersEmbedding +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.LocaleEntityId +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.TopicId +import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding} +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.strato.thrift.ScroogeConvImplicits._ + +/** + * This is the class that offers features to build readable stores for a given + * SimClustersEmbeddingView (i.e. embeddingType and modelVersion). It applies ClientConfig + * for a particular service and builds ReadableStores that implement that config.
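+ * + * A minimal usage sketch (hypothetical wiring; the stratoClient, memCachedClient and statsReceiver handles are assumed to come from the enclosing service's injection modules): + * {{{ + * val builder = new StoreBuilder(DefaultClientConfig, stratoClient, memCachedClient, statsReceiver) + * val tweetStore = builder.buildSimclustersTweetEmbeddingStore(embeddingView) // ReadableStore[Long, SimClustersEmbedding] + * }}}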
+ */ +class StoreBuilder( + clientConfig: ClientConfig, + stratoClient: StratoClient, + memCachedClient: MemcachedClient, + globalStats: StatsReceiver, +) { + private val stats = + globalStats.scope("representation_manager_client").scope(this.getClass.getSimpleName) + + // Column consts + private val ColPathPrefix = "recommendations/representation_manager/" + private val SimclustersTweetColPath = ColPathPrefix + "simClustersEmbedding.Tweet" + private val SimclustersUserColPath = ColPathPrefix + "simClustersEmbedding.User" + private val SimclustersTopicIdColPath = ColPathPrefix + "simClustersEmbedding.TopicId" + private val SimclustersLocaleEntityIdColPath = + ColPathPrefix + "simClustersEmbedding.LocaleEntityId" + + def buildSimclustersTweetEmbeddingStore( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[Long, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[Long, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersTweetColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + + addCacheLayer(rawStore, embeddingColumnView) + } + + def buildSimclustersUserEmbeddingStore( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[Long, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[Long, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersUserColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + + addCacheLayer(rawStore, embeddingColumnView) + } + + def buildSimclustersTopicIdEmbeddingStore( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[TopicId, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[TopicId, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersTopicIdColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + + addCacheLayer(rawStore, embeddingColumnView) + } + + def buildSimclustersLocaleEntityIdEmbeddingStore( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[LocaleEntityId, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[LocaleEntityId, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersLocaleEntityIdColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + + addCacheLayer(rawStore, embeddingColumnView) + } + + def buildSimclustersTweetEmbeddingStoreWithEmbeddingIdAsKey( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[Long, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersTweetColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.TweetId(tweetId)) => + tweetId + } + + addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView) + } + + def buildSimclustersUserEmbeddingStoreWithEmbeddingIdAsKey( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[Long, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersUserColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, 
_, InternalId.UserId(userId)) => + userId + } + + addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView) + } + + def buildSimclustersTopicEmbeddingStoreWithEmbeddingIdAsKey( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[TopicId, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersTopicIdColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.TopicId(topicId)) => + topicId + } + + addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView) + } + + def buildSimclustersTopicIdEmbeddingStoreWithEmbeddingIdAsKey( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[TopicId, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersTopicIdColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.TopicId(topicId)) => + topicId + } + + addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView) + } + + def buildSimclustersLocaleEntityIdEmbeddingStoreWithEmbeddingIdAsKey( + embeddingColumnView: SimClustersEmbeddingView + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val rawStore = StratoFetchableStore + .withView[LocaleEntityId, SimClustersEmbeddingView, ThriftSimClustersEmbedding]( + stratoClient, + SimclustersLocaleEntityIdColPath, + embeddingColumnView) + .mapValues(SimClustersEmbedding(_)) + val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.LocaleEntityId(localeEntityId)) => + localeEntityId + } + + addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView) + } + + private def addCacheLayer[K]( + rawStore: ReadableStore[K, SimClustersEmbedding], + embeddingColumnView: SimClustersEmbeddingView, + ): ReadableStore[K, SimClustersEmbedding] = { + // Add in-memory caching based on ClientConfig + val inMemCacheParams = clientConfig.inMemoryCacheConfig + .getCacheSetup(embeddingColumnView.embeddingType, embeddingColumnView.modelVersion) + + val statsPerStore = stats + .scope(embeddingColumnView.embeddingType.name).scope(embeddingColumnView.modelVersion.name) + + inMemCacheParams match { + case DisabledInMemoryCacheParams => + ObservedReadableStore( + store = rawStore + )(statsPerStore) + case EnabledInMemoryCacheParams(ttl, maxKeys, cacheName) => + ObservedCachedReadableStore.from[K, SimClustersEmbedding]( + rawStore, + ttl = ttl, + maxKeys = maxKeys, + cacheName = cacheName, + windowSize = 10000L + )(statsPerStore) + } + } + +} diff --git a/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/BUILD b/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/BUILD new file mode 100644 index 000000000..8418563d5 --- /dev/null +++ b/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/BUILD @@ -0,0 +1,12 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finatra/inject/inject-thrift-client", + 
"representation-manager/server/src/main/scala/com/twitter/representation_manager/common", + "representation-manager/server/src/main/thrift:thrift-scala", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "strato/src/main/scala/com/twitter/strato/client", + ], +) diff --git a/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/ClientConfig.scala b/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/ClientConfig.scala new file mode 100644 index 000000000..9ae0c49e7 --- /dev/null +++ b/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/ClientConfig.scala @@ -0,0 +1,25 @@ +package com.twitter.representation_manager.config + +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.ModelVersion + +/* + * This is RMS client config class. + * We only support setting up in memory cache params for now, but we expect to enable other + * customisations in the near future e.g. request timeout + * + * -------------------------------------------- + * PLEASE NOTE: + * Having in-memory cache is not necessarily a free performance win, anyone considering it should + * investigate rather than blindly enabling it + * */ +class ClientConfig(inMemCacheParamsOverrides: Map[ + (EmbeddingType, ModelVersion), + InMemoryCacheParams +] = Map.empty) { + // In memory cache config per embedding + val inMemCacheParams = DefaultInMemoryCacheConfig.cacheParamsMap ++ inMemCacheParamsOverrides + val inMemoryCacheConfig = new InMemoryCacheConfig(inMemCacheParams) +} + +object DefaultClientConfig extends ClientConfig diff --git a/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/InMemoryCacheConfig.scala b/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/InMemoryCacheConfig.scala new file mode 100644 index 000000000..eab569b51 --- /dev/null +++ b/representation-manager/client/src/main/scala/com/twitter/representation_manager/config/InMemoryCacheConfig.scala @@ -0,0 +1,53 @@ +package com.twitter.representation_manager.config + +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.util.Duration + +/* + * -------------------------------------------- + * PLEASE NOTE: + * Having in-memory cache is not necessarily a free performance win, anyone considering it should + * investigate rather than blindly enabling it + * -------------------------------------------- + * */ + +sealed trait InMemoryCacheParams + +/* + * This holds params that is required to set up a in-mem cache for a single embedding store + */ +case class EnabledInMemoryCacheParams( + ttl: Duration, + maxKeys: Int, + cacheName: String) + extends InMemoryCacheParams +object DisabledInMemoryCacheParams extends InMemoryCacheParams + +/* + * This is the class for the in-memory cache config. 
Clients can pass in their own cacheParamsMap to + * create a new InMemoryCacheConfig instead of using the DefaultInMemoryCacheConfig object below + * */ +class InMemoryCacheConfig( + cacheParamsMap: Map[ + (EmbeddingType, ModelVersion), + InMemoryCacheParams + ] = Map.empty) { + + def getCacheSetup( + embeddingType: EmbeddingType, + modelVersion: ModelVersion + ): InMemoryCacheParams = { + // When the requested embedding type doesn't exist, we return DisabledInMemoryCacheParams + cacheParamsMap.getOrElse((embeddingType, modelVersion), DisabledInMemoryCacheParams) + } +} + +/* + * Default config for the in-memory cache + * Clients can directly import and use this one if they don't want to set up a customised config + * */ +object DefaultInMemoryCacheConfig extends InMemoryCacheConfig { + // set default to no in-memory caching + val cacheParamsMap = Map.empty +} diff --git a/representation-manager/server/BUILD b/representation-manager/server/BUILD new file mode 100644 index 000000000..427fc1d3b --- /dev/null +++ b/representation-manager/server/BUILD @@ -0,0 +1,21 @@ +jvm_binary( + name = "bin", + basename = "representation-manager", + main = "com.twitter.representation_manager.RepresentationManagerFedServerMain", + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finatra/inject/inject-logback/src/main/scala", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "representation-manager/server/src/main/resources", + "representation-manager/server/src/main/scala/com/twitter/representation_manager", + "twitter-server/logback-classic/src/main/scala", + ], +) + +# Aurora Workflows build phase convention requires a jvm_app named with ${project-name}-app +jvm_app( + name = "representation-manager-app", + archive = "zip", + binary = ":bin", +) diff --git a/representation-manager/server/src/main/resources/BUILD b/representation-manager/server/src/main/resources/BUILD new file mode 100644 index 000000000..b3a752276 --- /dev/null +++ b/representation-manager/server/src/main/resources/BUILD @@ -0,0 +1,7 @@ +resources( + sources = [ + "*.xml", + "config/*.yml", + ], + tags = ["bazel-compatible"], +) diff --git a/representation-manager/server/src/main/resources/config/decider.yml b/representation-manager/server/src/main/resources/config/decider.yml new file mode 100644 index 000000000..e75ebf89d --- /dev/null +++ b/representation-manager/server/src/main/resources/config/decider.yml @@ -0,0 +1,219 @@ +# ---------- traffic percentage by embedding type and model version ---------- +# Decider strings are built dynamically following the rule below, +# i.e. s"enable_${embeddingType.name}_${modelVersion.name}" +# Hence this should be updated accordingly if usage is changed in the embedding stores + +# Tweet embeddings +"enable_LogFavBasedTweet_Model20m145k2020": + comment: "Enable x% read traffic (0<=x<=10000, e.g. 1000=10%) for LogFavBasedTweet - Model20m145k2020. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavBasedTweet_Model20m145kUpdated": + comment: "Enable x% read traffic (0<=x<=10000, e.g. 1000=10%) for LogFavBasedTweet - Model20m145kUpdated. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavLongestL2EmbeddingTweet_Model20m145k2020": + comment: "Enable x% read traffic (0<=x<=10000, e.g. 1000=10%) for LogFavLongestL2EmbeddingTweet - Model20m145k2020. 0 means return EMPTY for all requests." 
+ default_availability: 10000 + +"enable_LogFavLongestL2EmbeddingTweet_Model20m145kUpdated": + comment: "Enable x% read traffic (0<=x<=10000, e.g. 1000=10%) for LogFavLongestL2EmbeddingTweet - Model20m145kUpdated. 0 means return EMPTY for all requests." + default_availability: 10000 + +# Topic embeddings +"enable_FavTfgTopic_Model20m145k2020": + comment: "Enable the read traffic to FavTfgTopic - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavBasedKgoApeTopic_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedKgoApeTopic - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +# User embeddings - KnownFor +"enable_FavBasedProducer_Model20m145kUpdated": + comment: "Enable the read traffic to FavBasedProducer - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FavBasedProducer_Model20m145k2020": + comment: "Enable the read traffic to FavBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FollowBasedProducer_Model20m145k2020": + comment: "Enable the read traffic to FollowBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_AggregatableFavBasedProducer_Model20m145kUpdated": + comment: "Enable the read traffic to AggregatableFavBasedProducer - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_AggregatableFavBasedProducer_Model20m145k2020": + comment: "Enable the read traffic to AggregatableFavBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_AggregatableLogFavBasedProducer_Model20m145kUpdated": + comment: "Enable the read traffic to AggregatableLogFavBasedProducer - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_AggregatableLogFavBasedProducer_Model20m145k2020": + comment: "Enable the read traffic to AggregatableLogFavBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_RelaxedAggregatableLogFavBasedProducer_Model20m145kUpdated": + comment: "Enable the read traffic to RelaxedAggregatableLogFavBasedProducer - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_RelaxedAggregatableLogFavBasedProducer_Model20m145k2020": + comment: "Enable the read traffic to RelaxedAggregatableLogFavBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +# User embeddings - InterestedIn +"enable_LogFavBasedUserInterestedInFromAPE_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedUserInterestedInFromAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FollowBasedUserInterestedInFromAPE_Model20m145k2020": + comment: "Enable the read traffic to FollowBasedUserInterestedInFromAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FavBasedUserInterestedIn_Model20m145kUpdated": + comment: "Enable the read traffic to FavBasedUserInterestedIn - Model20m145kUpdated from 0% to 100%. 
0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FavBasedUserInterestedIn_Model20m145k2020": + comment: "Enable the read traffic to FavBasedUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FollowBasedUserInterestedIn_Model20m145k2020": + comment: "Enable the read traffic to FollowBasedUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavBasedUserInterestedIn_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FavBasedUserInterestedInFromPE_Model20m145kUpdated": + comment: "Enable the read traffic to FavBasedUserInterestedInFromPE - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FilteredUserInterestedIn_Model20m145kUpdated": + comment: "Enable the read traffic to FilteredUserInterestedIn - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FilteredUserInterestedIn_Model20m145k2020": + comment: "Enable the read traffic to FilteredUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_FilteredUserInterestedInFromPE_Model20m145kUpdated": + comment: "Enable the read traffic to FilteredUserInterestedInFromPE - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_UnfilteredUserInterestedIn_Model20m145kUpdated": + comment: "Enable the read traffic to UnfilteredUserInterestedIn - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_UnfilteredUserInterestedIn_Model20m145k2020": + comment: "Enable the read traffic to UnfilteredUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_UserNextInterestedIn_Model20m145k2020": + comment: "Enable the read traffic to UserNextInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavBasedUserInterestedAverageAddressBookFromIIAPE_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedUserInterestedAverageAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." 
+ default_availability: 10000 + +"enable_LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +"enable_LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE_Model20m145k2020": + comment: "Enable the read traffic to LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests." + default_availability: 10000 + +# ---------- load shedding by caller id ---------- +# To create a new decider, add it here with the same format and the caller's details: +# "representation-manager_load_shed_by_caller_id_twtr:{{role}}:{{name}}:{{environment}}:{{cluster}}" +# All the deciders below are generated by this script: +# ./strato/bin/fed deciders representation-manager --service-role=representation-manager --service-name=representation-manager +# If you need to run the script and paste the output, add ONLY the prod deciders here. +"representation-manager_load_shed_by_caller_id_all": + comment: "Reject all traffic from caller id: all" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:cr-mixer:cr-mixer:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:cr-mixer:cr-mixer:prod:atla" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:cr-mixer:cr-mixer:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:cr-mixer:cr-mixer:prod:pdxa" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-1:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-1:prod:atla" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-1:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-1:prod:pdxa" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-3:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-3:prod:atla" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-3:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-3:prod:pdxa" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-4:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-4:prod:atla" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-4:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-4:prod:pdxa" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-experimental:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-experimental:prod:atla" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-experimental:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-experimental:prod:pdxa" + 
default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann:prod:atla" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann:prod:pdxa" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:stratostore:stratoapi:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoapi:prod:atla" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:atla" + default_availability: 0 + +"representation-manager_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:pdxa" + default_availability: 0 + +# ---------- Dark Traffic Proxy ---------- +representation-manager_forward_dark_traffic: + comment: "Defines the percentage of traffic to forward to diffy-proxy. Set to 0 to disable dark traffic forwarding" + default_availability: 0 diff --git a/representation-manager/server/src/main/resources/logback.xml b/representation-manager/server/src/main/resources/logback.xml new file mode 100644 index 000000000..47b3ed16d --- /dev/null +++ b/representation-manager/server/src/main/resources/logback.xml @@ -0,0 +1,165 @@ + + + + + + + + + + + + + + + + + true + + + + + + + + + + + ${log.service.output} + + + ${log.service.output}.%d.gz + + 3GB + + 21 + true + + + %date %.-3level ${DEFAULT_SERVICE_PATTERN}%n + + + + + + ${log.access.output} + + + ${log.access.output}.%d.gz + + 100MB + + 7 + true + + + ${DEFAULT_ACCESS_PATTERN}%n + + + + + + true + ${log.lens.category} + ${log.lens.index} + ${log.lens.tag}/service + + %msg + + + + + + true + ${log.lens.category} + ${log.lens.index} + ${log.lens.tag}/access + + %msg + + + + + + allow_listed_pipeline_executions.log + + + allow_listed_pipeline_executions.log.%d.gz + + 100MB + + 7 + true + + + %date %.-3level ${DEFAULT_SERVICE_PATTERN}%n + + + + + + + + + + + + ${async_queue_size} + ${async_max_flush_time} + + + + + ${async_queue_size} + ${async_max_flush_time} + + + + + ${async_queue_size} + ${async_max_flush_time} + + + + + ${async_queue_size} + ${async_max_flush_time} + + + + + ${async_queue_size} + ${async_max_flush_time} + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/BUILD new file mode 100644 index 000000000..d8ca301f6 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/BUILD @@ -0,0 +1,13 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finatra/inject/inject-thrift-client", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/tweet", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/user", + "strato/src/main/scala/com/twitter/strato/fed", + 
"strato/src/main/scala/com/twitter/strato/fed/server", + ], +) diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/RepresentationManagerFedServer.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/RepresentationManagerFedServer.scala new file mode 100644 index 000000000..5bc820bb4 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/RepresentationManagerFedServer.scala @@ -0,0 +1,40 @@ +package com.twitter.representation_manager + +import com.google.inject.Module +import com.twitter.inject.thrift.modules.ThriftClientIdModule +import com.twitter.representation_manager.columns.topic.LocaleEntityIdSimClustersEmbeddingCol +import com.twitter.representation_manager.columns.topic.TopicIdSimClustersEmbeddingCol +import com.twitter.representation_manager.columns.tweet.TweetSimClustersEmbeddingCol +import com.twitter.representation_manager.columns.user.UserSimClustersEmbeddingCol +import com.twitter.representation_manager.modules.CacheModule +import com.twitter.representation_manager.modules.InterestsThriftClientModule +import com.twitter.representation_manager.modules.LegacyRMSConfigModule +import com.twitter.representation_manager.modules.StoreModule +import com.twitter.representation_manager.modules.TimerModule +import com.twitter.representation_manager.modules.UttClientModule +import com.twitter.strato.fed._ +import com.twitter.strato.fed.server._ + +object RepresentationManagerFedServerMain extends RepresentationManagerFedServer + +trait RepresentationManagerFedServer extends StratoFedServer { + override def dest: String = "/s/representation-manager/representation-manager" + override val modules: Seq[Module] = + Seq( + CacheModule, + InterestsThriftClientModule, + LegacyRMSConfigModule, + StoreModule, + ThriftClientIdModule, + TimerModule, + UttClientModule + ) + + override def columns: Seq[Class[_ <: StratoFed.Column]] = + Seq( + classOf[TweetSimClustersEmbeddingCol], + classOf[UserSimClustersEmbeddingCol], + classOf[TopicIdSimClustersEmbeddingCol], + classOf[LocaleEntityIdSimClustersEmbeddingCol] + ) +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/BUILD new file mode 100644 index 000000000..6ebd77ef8 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/BUILD @@ -0,0 +1,9 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + ], +) diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/ColumnConfigBase.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/ColumnConfigBase.scala new file mode 100644 index 000000000..143ccdc4c --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/ColumnConfigBase.scala @@ -0,0 +1,26 @@ +package com.twitter.representation_manager.columns + +import com.twitter.strato.access.Access.LdapGroup +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.FromColumns +import com.twitter.strato.config.Has +import com.twitter.strato.config.Prefix +import 
com.twitter.strato.config.ServiceIdentifierPattern + +object ColumnConfigBase { + + /****************** Internal permissions *******************/ + val recosPermissions: Seq[com.twitter.strato.config.Policy] = Seq() + + /****************** External permissions *******************/ + // This is used to grant limited access to members outside of the RP team. + val externalPermissions: Seq[com.twitter.strato.config.Policy] = Seq() + + val contactInfo: ContactInfo = ContactInfo( + description = "Please contact Relevance Platform for more details", + contactEmail = "no-reply@twitter.com", + ldapGroup = "ldap", + jiraProject = "JIRA", + links = Seq("http://go/rms-runbook") + ) +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/BUILD new file mode 100644 index 000000000..26022ebe5 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/BUILD @@ -0,0 +1,14 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finatra/inject/inject-core/src/main/scala", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/modules", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/store", + "representation-manager/server/src/main/thrift:thrift-scala", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + ], +) diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/LocaleEntityIdSimClustersEmbeddingCol.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/LocaleEntityIdSimClustersEmbeddingCol.scala new file mode 100644 index 000000000..7b7952300 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/LocaleEntityIdSimClustersEmbeddingCol.scala @@ -0,0 +1,77 @@ +package com.twitter.representation_manager.columns.topic + +import com.twitter.representation_manager.columns.ColumnConfigBase +import com.twitter.representation_manager.store.TopicSimClustersEmbeddingStore +import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbedding +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.LocaleEntityId +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.stitch.storehaus.StitchOfReadableStore +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config.AnyOf +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.FromColumns +import com.twitter.strato.config.Policy +import com.twitter.strato.config.Prefix +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.fed._ +import com.twitter.strato.thrift.ScroogeConv +import javax.inject.Inject + +class LocaleEntityIdSimClustersEmbeddingCol @Inject() ( + embeddingStore: TopicSimClustersEmbeddingStore) + extends StratoFed.Column( 
"recommendations/representation_manager/simClustersEmbedding.LocaleEntityId") + with StratoFed.Fetch.Stitch { + + private val storeStitch: SimClustersEmbeddingId => Stitch[SimClustersEmbedding] = + StitchOfReadableStore(embeddingStore.topicSimClustersEmbeddingStore.mapValues(_.toThrift)) + + val colPermissions: Seq[com.twitter.strato.config.Policy] = + ColumnConfigBase.recosPermissions ++ ColumnConfigBase.externalPermissions :+ FromColumns( + Set( + Prefix("ml/featureStore/simClusters"), + )) + + override val policy: Policy = AnyOf({ + colPermissions + }) + + override type Key = LocaleEntityId + override type View = SimClustersEmbeddingView + override type Value = SimClustersEmbedding + + override val keyConv: Conv[Key] = ScroogeConv.fromStruct[LocaleEntityId] + override val viewConv: Conv[View] = ScroogeConv.fromStruct[SimClustersEmbeddingView] + override val valueConv: Conv[Value] = ScroogeConv.fromStruct[SimClustersEmbedding] + + override val contactInfo: ContactInfo = ColumnConfigBase.contactInfo + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some( + PlainText( + "The Topic SimClusters Embedding Endpoint in Representation Management Service with LocaleEntityId." + + " TDD: http://go/rms-tdd")) + ) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = { + val embeddingId = SimClustersEmbeddingId( + view.embeddingType, + view.modelVersion, + InternalId.LocaleEntityId(key) + ) + + storeStitch(embeddingId) + .map(embedding => found(embedding)) + .handle { + case stitch.NotFound => missing + } + } + +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/TopicIdSimClustersEmbeddingCol.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/TopicIdSimClustersEmbeddingCol.scala new file mode 100644 index 000000000..4afddbb4c --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic/TopicIdSimClustersEmbeddingCol.scala @@ -0,0 +1,74 @@ +package com.twitter.representation_manager.columns.topic + +import com.twitter.representation_manager.columns.ColumnConfigBase +import com.twitter.representation_manager.store.TopicSimClustersEmbeddingStore +import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbedding +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.TopicId +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.stitch.storehaus.StitchOfReadableStore +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config.AnyOf +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.FromColumns +import com.twitter.strato.config.Policy +import com.twitter.strato.config.Prefix +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.fed._ +import com.twitter.strato.thrift.ScroogeConv +import javax.inject.Inject + +class TopicIdSimClustersEmbeddingCol @Inject() (embeddingStore: TopicSimClustersEmbeddingStore) + extends StratoFed.Column("recommendations/representation_manager/simClustersEmbedding.TopicId") + with StratoFed.Fetch.Stitch { + + private val storeStitch: SimClustersEmbeddingId => 
Stitch[SimClustersEmbedding] = + StitchOfReadableStore(embeddingStore.topicSimClustersEmbeddingStore.mapValues(_.toThrift)) + + val colPermissions: Seq[com.twitter.strato.config.Policy] = + ColumnConfigBase.recosPermissions ++ ColumnConfigBase.externalPermissions :+ FromColumns( + Set( + Prefix("ml/featureStore/simClusters"), + )) + + override val policy: Policy = AnyOf({ + colPermissions + }) + + override type Key = TopicId + override type View = SimClustersEmbeddingView + override type Value = SimClustersEmbedding + + override val keyConv: Conv[Key] = ScroogeConv.fromStruct[TopicId] + override val viewConv: Conv[View] = ScroogeConv.fromStruct[SimClustersEmbeddingView] + override val valueConv: Conv[Value] = ScroogeConv.fromStruct[SimClustersEmbedding] + + override val contactInfo: ContactInfo = ColumnConfigBase.contactInfo + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some(PlainText( + "The Topic SimClusters Embedding Endpoint in Representation Management Service with TopicId." + + " TDD: http://go/rms-tdd")) + ) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = { + val embeddingId = SimClustersEmbeddingId( + view.embeddingType, + view.modelVersion, + InternalId.TopicId(key) + ) + + storeStitch(embeddingId) + .map(embedding => found(embedding)) + .handle { + case stitch.NotFound => missing + } + } + +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/tweet/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/tweet/BUILD new file mode 100644 index 000000000..26022ebe5 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/tweet/BUILD @@ -0,0 +1,14 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finatra/inject/inject-core/src/main/scala", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/modules", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/store", + "representation-manager/server/src/main/thrift:thrift-scala", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + ], +) diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/tweet/TweetSimClustersEmbeddingCol.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/tweet/TweetSimClustersEmbeddingCol.scala new file mode 100644 index 000000000..15cd4247c --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/tweet/TweetSimClustersEmbeddingCol.scala @@ -0,0 +1,73 @@ +package com.twitter.representation_manager.columns.tweet + +import com.twitter.representation_manager.columns.ColumnConfigBase +import com.twitter.representation_manager.store.TweetSimClustersEmbeddingStore +import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbedding +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.stitch.storehaus.StitchOfReadableStore +import 
com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config.AnyOf +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.FromColumns +import com.twitter.strato.config.Policy +import com.twitter.strato.config.Prefix +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.fed._ +import com.twitter.strato.thrift.ScroogeConv +import javax.inject.Inject + +class TweetSimClustersEmbeddingCol @Inject() (embeddingStore: TweetSimClustersEmbeddingStore) + extends StratoFed.Column("recommendations/representation_manager/simClustersEmbedding.Tweet") + with StratoFed.Fetch.Stitch { + + private val storeStitch: SimClustersEmbeddingId => Stitch[SimClustersEmbedding] = + StitchOfReadableStore(embeddingStore.tweetSimClustersEmbeddingStore.mapValues(_.toThrift)) + + val colPermissions: Seq[com.twitter.strato.config.Policy] = + ColumnConfigBase.recosPermissions ++ ColumnConfigBase.externalPermissions :+ FromColumns( + Set( + Prefix("ml/featureStore/simClusters"), + )) + + override val policy: Policy = AnyOf({ + colPermissions + }) + + override type Key = Long // TweetId + override type View = SimClustersEmbeddingView + override type Value = SimClustersEmbedding + + override val keyConv: Conv[Key] = Conv.long + override val viewConv: Conv[View] = ScroogeConv.fromStruct[SimClustersEmbeddingView] + override val valueConv: Conv[Value] = ScroogeConv.fromStruct[SimClustersEmbedding] + + override val contactInfo: ContactInfo = ColumnConfigBase.contactInfo + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some( + PlainText("The Tweet SimClusters Embedding Endpoint in Representation Management Service." 
+ + " TDD: http://go/rms-tdd")) + ) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = { + val embeddingId = SimClustersEmbeddingId( + view.embeddingType, + view.modelVersion, + InternalId.TweetId(key) + ) + + storeStitch(embeddingId) + .map(embedding => found(embedding)) + .handle { + case stitch.NotFound => missing + } + } + +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/user/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/user/BUILD new file mode 100644 index 000000000..26022ebe5 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/user/BUILD @@ -0,0 +1,14 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finatra/inject/inject-core/src/main/scala", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/modules", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/store", + "representation-manager/server/src/main/thrift:thrift-scala", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + ], +) diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/user/UserSimClustersEmbeddingCol.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/user/UserSimClustersEmbeddingCol.scala new file mode 100644 index 000000000..ebcf22a1d --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/user/UserSimClustersEmbeddingCol.scala @@ -0,0 +1,73 @@ +package com.twitter.representation_manager.columns.user + +import com.twitter.representation_manager.columns.ColumnConfigBase +import com.twitter.representation_manager.store.UserSimClustersEmbeddingStore +import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbedding +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.stitch.storehaus.StitchOfReadableStore +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config.AnyOf +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.FromColumns +import com.twitter.strato.config.Policy +import com.twitter.strato.config.Prefix +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.fed._ +import com.twitter.strato.thrift.ScroogeConv +import javax.inject.Inject + +class UserSimClustersEmbeddingCol @Inject() (embeddingStore: UserSimClustersEmbeddingStore) + extends StratoFed.Column("recommendations/representation_manager/simClustersEmbedding.User") + with StratoFed.Fetch.Stitch { + + private val storeStitch: SimClustersEmbeddingId => Stitch[SimClustersEmbedding] = + StitchOfReadableStore(embeddingStore.userSimClustersEmbeddingStore.mapValues(_.toThrift)) + + val colPermissions: Seq[com.twitter.strato.config.Policy] = + ColumnConfigBase.recosPermissions ++ ColumnConfigBase.externalPermissions :+ FromColumns( + Set( + 
Prefix("ml/featureStore/simClusters"), + )) + + override val policy: Policy = AnyOf({ + colPermissions + }) + + override type Key = Long // UserId + override type View = SimClustersEmbeddingView + override type Value = SimClustersEmbedding + + override val keyConv: Conv[Key] = Conv.long + override val viewConv: Conv[View] = ScroogeConv.fromStruct[SimClustersEmbeddingView] + override val valueConv: Conv[Value] = ScroogeConv.fromStruct[SimClustersEmbedding] + + override val contactInfo: ContactInfo = ColumnConfigBase.contactInfo + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some( + PlainText("The User SimClusters Embedding Endpoint in Representation Management Service." + + " TDD: http://go/rms-tdd")) + ) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = { + val embeddingId = SimClustersEmbeddingId( + view.embeddingType, + view.modelVersion, + InternalId.UserId(key) + ) + + storeStitch(embeddingId) + .map(embedding => found(embedding)) + .handle { + case stitch.NotFound => missing + } + } + +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/BUILD new file mode 100644 index 000000000..62b8f5dd2 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/BUILD @@ -0,0 +1,13 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "decider/src/main/scala", + "finagle/finagle-memcached", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common", + "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/injection", + "src/scala/com/twitter/simclusters_v2/common", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + ], +) diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/MemCacheConfig.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/MemCacheConfig.scala new file mode 100644 index 000000000..4741edb2d --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/MemCacheConfig.scala @@ -0,0 +1,153 @@ +package com.twitter.representation_manager.common + +import com.twitter.bijection.scrooge.BinaryScalaCodec +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.memcached.Client +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.hashing.KeyHasher +import com.twitter.hermit.store.common.ObservedMemcachedReadableStore +import com.twitter.relevance_platform.common.injection.LZ4Injection +import com.twitter.simclusters_v2.common.SimClustersEmbedding +import com.twitter.simclusters_v2.common.SimClustersEmbeddingIdCacheKeyBuilder +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.EmbeddingType._ +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ModelVersion._ +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding} +import com.twitter.storehaus.ReadableStore +import com.twitter.util.Duration + +/* + * NOTE - ALL the cache configs here are just placeholders, NONE of them is used anyweher in RMS yet + * */ 
+sealed trait MemCacheParams
+sealed trait MemCacheConfig
+
+/*
+ * This holds the params that are required to set up a memcache cache for a single embedding store
+ * */
+case class EnabledMemCacheParams(ttl: Duration) extends MemCacheParams
+object DisabledMemCacheParams extends MemCacheParams
+
+/*
+ * We use this MemCacheConfig as the single source of truth for setting up the memcache for all
+ * RMS use cases. NO OVERRIDE FROM CLIENT.
+ * */
+object MemCacheConfig {
+  val keyHasher: KeyHasher = KeyHasher.FNV1A_64
+  val hashKeyPrefix: String = "RMS"
+  val simclustersEmbeddingCacheKeyBuilder =
+    SimClustersEmbeddingIdCacheKeyBuilder(keyHasher.hashKey, hashKeyPrefix)
+
+  val cacheParamsMap: Map[
+    (EmbeddingType, ModelVersion),
+    MemCacheParams
+  ] = Map(
+    // Tweet Embeddings
+    (LogFavBasedTweet, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 10.minutes),
+    (LogFavBasedTweet, Model20m145k2020) -> EnabledMemCacheParams(ttl = 10.minutes),
+    (LogFavLongestL2EmbeddingTweet, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 10.minutes),
+    (LogFavLongestL2EmbeddingTweet, Model20m145k2020) -> EnabledMemCacheParams(ttl = 10.minutes),
+    // User - KnownFor Embeddings
+    (FavBasedProducer, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FavBasedProducer, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FollowBasedProducer, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (AggregatableLogFavBasedProducer, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (RelaxedAggregatableLogFavBasedProducer, Model20m145kUpdated) -> EnabledMemCacheParams(ttl =
+      12.hours),
+    (RelaxedAggregatableLogFavBasedProducer, Model20m145k2020) -> EnabledMemCacheParams(ttl =
+      12.hours),
+    // User - InterestedIn Embeddings
+    (LogFavBasedUserInterestedInFromAPE, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FollowBasedUserInterestedInFromAPE, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FavBasedUserInterestedIn, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FavBasedUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FollowBasedUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (LogFavBasedUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FavBasedUserInterestedInFromPE, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FilteredUserInterestedIn, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FilteredUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (FilteredUserInterestedInFromPE, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
+    (UnfilteredUserInterestedIn, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
+    (UnfilteredUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (UserNextInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl =
+      30.minutes), // embedding is updated every 2 hours; keep the TTL low to avoid staleness
+    (
+      LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE,
+      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (
+      LogFavBasedUserInterestedAverageAddressBookFromIIAPE,
+      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (
+      LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE,
+      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (
+      LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE,
+      Model20m145k2020) -> EnabledMemCacheParams(ttl =
+      12.hours),
+    (
+      LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE,
+      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (
+      LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE,
+      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    // Topic Embeddings
+    (FavTfgTopic, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+    (LogFavBasedKgoApeTopic, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
+  )
+
+  def getCacheSetup(
+    embeddingType: EmbeddingType,
+    modelVersion: ModelVersion
+  ): MemCacheParams = {
+    // When the requested (embeddingType, modelVersion) pair doesn't exist, we return DisabledMemCacheParams
+    cacheParamsMap.getOrElse((embeddingType, modelVersion), DisabledMemCacheParams)
+  }
+
+  def getCacheKeyPrefix(embeddingType: EmbeddingType, modelVersion: ModelVersion) =
+    s"${embeddingType.value}_${modelVersion.value}_"
+
+  def getStatsName(embeddingType: EmbeddingType, modelVersion: ModelVersion) =
+    s"${embeddingType.name}_${modelVersion.name}_mem_cache"
+
+  /**
+   * Build a ReadableStore based on MemCacheConfig.
+   *
+   * If memcache is disabled, it will return a normal readable store wrapper of the rawStore,
+   * with SimClustersEmbedding as the value;
+   * if memcache is enabled, it will return an ObservedMemcachedReadableStore wrapper of the
+   * rawStore, with memcache set up according to the EnabledMemCacheParams.
+   * */
+  def buildMemCacheStoreForSimClustersEmbedding(
+    rawStore: ReadableStore[SimClustersEmbeddingId, ThriftSimClustersEmbedding],
+    cacheClient: Client,
+    embeddingType: EmbeddingType,
+    modelVersion: ModelVersion,
+    stats: StatsReceiver
+  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
+    val cacheParams = getCacheSetup(embeddingType, modelVersion)
+    val store = cacheParams match {
+      case DisabledMemCacheParams => rawStore
+      case EnabledMemCacheParams(ttl) =>
+        val memCacheKeyPrefix = MemCacheConfig.getCacheKeyPrefix(
+          embeddingType,
+          modelVersion
+        )
+        val statsName = MemCacheConfig.getStatsName(
+          embeddingType,
+          modelVersion
+        )
+        ObservedMemcachedReadableStore.fromCacheClient(
+          backingStore = rawStore,
+          cacheClient = cacheClient,
+          ttl = ttl
+        )(
+          valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
+          statsReceiver = stats.scope(statsName),
+          keyToString = { k => memCacheKeyPrefix + k.toString }
+        )
+    }
+    store.mapValues(SimClustersEmbedding(_))
+  }
+
+}
diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/RepresentationManagerDecider.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/RepresentationManagerDecider.scala
new file mode 100644
index 000000000..97179e25f
--- /dev/null
+++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/common/RepresentationManagerDecider.scala
@@ -0,0 +1,25 @@
+package com.twitter.representation_manager.common
+
+import com.twitter.decider.Decider
+import com.twitter.decider.RandomRecipient
+import com.twitter.decider.Recipient
+import com.twitter.simclusters_v2.common.DeciderGateBuilderWithIdHashing
+import javax.inject.Inject
+
+case class RepresentationManagerDecider @Inject() (decider: Decider) {
+
+  val deciderGateBuilder = new DeciderGateBuilderWithIdHashing(decider)
+
+  def isAvailable(feature: String, recipient: Option[Recipient]): Boolean = {
+    decider.isAvailable(feature, recipient)
+  }
+
+  /**
+   * When useRandomRecipient is set to false, the decider is either completely on or off.
+ * When useRandomRecipient is set to true, the decider is on for the specified % of traffic. + */ + def isAvailable(feature: String, useRandomRecipient: Boolean = true): Boolean = { + if (useRandomRecipient) isAvailable(feature, Some(RandomRecipient)) + else isAvailable(feature, None) + } +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/migration/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/migration/BUILD new file mode 100644 index 000000000..d8bf04fc0 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/migration/BUILD @@ -0,0 +1,25 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "content-recommender/server/src/main/scala/com/twitter/contentrecommender:representation-manager-deps", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/util", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common", + "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/injection", + "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/readablestore", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/common", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/store", + "src/scala/com/twitter/ml/api/embedding", + "src/scala/com/twitter/simclusters_v2/common", + "src/scala/com/twitter/simclusters_v2/score", + "src/scala/com/twitter/simclusters_v2/summingbird/stores", + "src/scala/com/twitter/storehaus_internal/manhattan", + "src/scala/com/twitter/storehaus_internal/util", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "src/thrift/com/twitter/socialgraph:thrift-scala", + "storage/clients/manhattan/client/src/main/scala", + "tweetypie/src/scala/com/twitter/tweetypie/util", + ], +) diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/migration/LegacyRMS.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/migration/LegacyRMS.scala new file mode 100644 index 000000000..378f33594 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/migration/LegacyRMS.scala @@ -0,0 +1,846 @@ +package com.twitter.representation_manager.migration + +import com.twitter.bijection.Injection +import com.twitter.bijection.scrooge.BinaryScalaCodec +import com.twitter.contentrecommender.store.ApeEntityEmbeddingStore +import com.twitter.contentrecommender.store.InterestsOptOutStore +import com.twitter.contentrecommender.store.SemanticCoreTopicSeedStore +import com.twitter.contentrecommender.twistly +import com.twitter.conversions.DurationOps._ +import com.twitter.decider.Decider +import com.twitter.escherbird.util.uttclient.CacheConfigV2 +import com.twitter.escherbird.util.uttclient.CachedUttClientV2 +import com.twitter.escherbird.util.uttclient.UttClientCacheConfigsV2 +import com.twitter.escherbird.utt.strato.thriftscala.Environment +import com.twitter.finagle.ThriftMux +import com.twitter.finagle.memcached.Client +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.mtls.client.MtlsStackClient.MtlsThriftMuxClientSyntax +import com.twitter.finagle.mux.ClientDiscardedRequestException +import 
com.twitter.finagle.service.ReqRep +import com.twitter.finagle.service.ResponseClass +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finagle.thrift.ClientId +import com.twitter.frigate.common.store.strato.StratoFetchableStore +import com.twitter.frigate.common.util.SeqLongInjection +import com.twitter.hashing.KeyHasher +import com.twitter.hermit.store.common.DeciderableReadableStore +import com.twitter.hermit.store.common.ObservedCachedReadableStore +import com.twitter.hermit.store.common.ObservedMemcachedReadableStore +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.interests.thriftscala.InterestsThriftService +import com.twitter.relevance_platform.common.injection.LZ4Injection +import com.twitter.relevance_platform.common.readablestore.ReadableStoreWithTimeout +import com.twitter.representation_manager.common.RepresentationManagerDecider +import com.twitter.representation_manager.store.DeciderConstants +import com.twitter.representation_manager.store.DeciderKey +import com.twitter.simclusters_v2.common.ModelVersions +import com.twitter.simclusters_v2.common.SimClustersEmbedding +import com.twitter.simclusters_v2.common.SimClustersEmbeddingIdCacheKeyBuilder +import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore +import com.twitter.simclusters_v2.summingbird.stores.PersistentTweetEmbeddingStore +import com.twitter.simclusters_v2.summingbird.stores.ProducerClusterEmbeddingReadableStores +import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore +import com.twitter.simclusters_v2.thriftscala.ClustersUserIsInterestedIn +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.EmbeddingType._ +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ModelVersion.Model20m145k2020 +import com.twitter.simclusters_v2.thriftscala.ModelVersion.Model20m145kUpdated +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.SimClustersMultiEmbedding +import com.twitter.simclusters_v2.thriftscala.SimClustersMultiEmbeddingId +import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding} +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams +import com.twitter.storehaus.ReadableStore +import com.twitter.storehaus_internal.manhattan.Athena +import com.twitter.storehaus_internal.manhattan.ManhattanRO +import com.twitter.storehaus_internal.manhattan.ManhattanROConfig +import com.twitter.storehaus_internal.util.ApplicationID +import com.twitter.storehaus_internal.util.DatasetName +import com.twitter.storehaus_internal.util.HDFSPath +import com.twitter.strato.client.Strato +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.strato.thrift.ScroogeConvImplicits._ +import com.twitter.tweetypie.util.UserId +import com.twitter.util.Duration +import com.twitter.util.Future +import com.twitter.util.Throw +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Named +import scala.reflect.ClassTag + +class LegacyRMS @Inject() ( + serviceIdentifier: ServiceIdentifier, + cacheClient: Client, + stats: StatsReceiver, + decider: Decider, + clientId: ClientId, + timer: Timer, + @Named("cacheHashKeyPrefix") val cacheHashKeyPrefix: String = "RMS", + @Named("useContentRecommenderConfiguration") val 
useContentRecommenderConfiguration: Boolean = + false) { + + private val mhMtlsParams: ManhattanKVClientMtlsParams = ManhattanKVClientMtlsParams( + serviceIdentifier) + private val rmsDecider = RepresentationManagerDecider(decider) + val keyHasher: KeyHasher = KeyHasher.FNV1A_64 + + private val embeddingCacheKeyBuilder = + SimClustersEmbeddingIdCacheKeyBuilder(keyHasher.hashKey, cacheHashKeyPrefix) + private val statsReceiver = stats.scope("representation_management") + + // Strato client, default timeout = 280ms + val stratoClient: StratoClient = + Strato.client + .withMutualTls(serviceIdentifier) + .build() + + // Builds ThriftMux client builder for Content-Recommender service + private def makeThriftClientBuilder( + requestTimeout: Duration + ): ThriftMux.Client = { + ThriftMux.client + .withClientId(clientId) + .withMutualTls(serviceIdentifier) + .withRequestTimeout(requestTimeout) + .withStatsReceiver(statsReceiver.scope("clnt")) + .withResponseClassifier { + case ReqRep(_, Throw(_: ClientDiscardedRequestException)) => ResponseClass.Ignorable + } + } + + private def makeThriftClient[ThriftServiceType: ClassTag]( + dest: String, + label: String, + requestTimeout: Duration = 450.milliseconds + ): ThriftServiceType = { + makeThriftClientBuilder(requestTimeout) + .build[ThriftServiceType](dest, label) + } + + /** *** SimCluster Embedding Stores ******/ + implicit val simClustersEmbeddingIdInjection: Injection[SimClustersEmbeddingId, Array[Byte]] = + BinaryScalaCodec(SimClustersEmbeddingId) + implicit val simClustersEmbeddingInjection: Injection[ThriftSimClustersEmbedding, Array[Byte]] = + BinaryScalaCodec(ThriftSimClustersEmbedding) + implicit val simClustersMultiEmbeddingInjection: Injection[SimClustersMultiEmbedding, Array[ + Byte + ]] = + BinaryScalaCodec(SimClustersMultiEmbedding) + implicit val simClustersMultiEmbeddingIdInjection: Injection[SimClustersMultiEmbeddingId, Array[ + Byte + ]] = + BinaryScalaCodec(SimClustersMultiEmbeddingId) + + def getEmbeddingsDataset( + mhMtlsParams: ManhattanKVClientMtlsParams, + datasetName: String + ): ReadableStore[SimClustersEmbeddingId, ThriftSimClustersEmbedding] = { + ManhattanRO.getReadableStoreWithMtls[SimClustersEmbeddingId, ThriftSimClustersEmbedding]( + ManhattanROConfig( + HDFSPath(""), // not needed + ApplicationID("content_recommender_athena"), + DatasetName(datasetName), // this should be correct + Athena + ), + mhMtlsParams + ) + } + + lazy val logFavBasedLongestL2Tweet20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = + PersistentTweetEmbeddingStore + .longestL2NormTweetEmbeddingStoreManhattan( + mhMtlsParams, + PersistentTweetEmbeddingStore.LogFavBased20m145k2020Dataset, + statsReceiver, + maxLength = 10, + ).mapValues(_.toThrift) + + val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient( + backingStore = rawStore, + cacheClient = cacheClient, + ttl = 15.minutes + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = + statsReceiver.scope("log_fav_based_longest_l2_tweet_embedding_20m145k2020_mem_cache"), + keyToString = { k => + s"scez_l2:${LogFavBasedTweet}_${ModelVersions.Model20M145K2020}_$k" + } + ) + + val inMemoryCacheStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = + memcachedStore + .composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId( + LogFavLongestL2EmbeddingTweet, + Model20m145k2020, + InternalId.TweetId(tweetId)) => + tweetId + } + 
.mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + inMemoryCacheStore, + ttl = 12.minute, + maxKeys = 1048575, + cacheName = "log_fav_based_longest_l2_tweet_embedding_20m145k2020_cache", + windowSize = 10000L + )(statsReceiver.scope("log_fav_based_longest_l2_tweet_embedding_20m145k2020_store")) + } + + lazy val logFavBased20M145KUpdatedTweetEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = + PersistentTweetEmbeddingStore + .mostRecentTweetEmbeddingStoreManhattan( + mhMtlsParams, + PersistentTweetEmbeddingStore.LogFavBased20m145kUpdatedDataset, + statsReceiver + ).mapValues(_.toThrift) + + val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient( + backingStore = rawStore, + cacheClient = cacheClient, + ttl = 10.minutes + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("log_fav_based_tweet_embedding_mem_cache"), + keyToString = { k => + // SimClusters_embedding_LZ4/embeddingType_modelVersion_tweetId + s"scez:${LogFavBasedTweet}_${ModelVersions.Model20M145KUpdated}_$k" + } + ) + + val inMemoryCacheStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + memcachedStore + .composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId( + LogFavBasedTweet, + Model20m145kUpdated, + InternalId.TweetId(tweetId)) => + tweetId + } + .mapValues(SimClustersEmbedding(_)) + } + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + inMemoryCacheStore, + ttl = 5.minute, + maxKeys = 1048575, // 200MB + cacheName = "log_fav_based_tweet_embedding_cache", + windowSize = 10000L + )(statsReceiver.scope("log_fav_based_tweet_embedding_store")) + } + + lazy val logFavBased20M145K2020TweetEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = + PersistentTweetEmbeddingStore + .mostRecentTweetEmbeddingStoreManhattan( + mhMtlsParams, + PersistentTweetEmbeddingStore.LogFavBased20m145k2020Dataset, + statsReceiver, + maxLength = 10, + ).mapValues(_.toThrift) + + val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient( + backingStore = rawStore, + cacheClient = cacheClient, + ttl = 15.minutes + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("log_fav_based_tweet_embedding_20m145k2020_mem_cache"), + keyToString = { k => + // SimClusters_embedding_LZ4/embeddingType_modelVersion_tweetId + s"scez:${LogFavBasedTweet}_${ModelVersions.Model20M145K2020}_$k" + } + ) + + val inMemoryCacheStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = + memcachedStore + .composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId( + LogFavBasedTweet, + Model20m145k2020, + InternalId.TweetId(tweetId)) => + tweetId + } + .mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + inMemoryCacheStore, + ttl = 12.minute, + maxKeys = 16777215, + cacheName = "log_fav_based_tweet_embedding_20m145k2020_cache", + windowSize = 10000L + )(statsReceiver.scope("log_fav_based_tweet_embedding_20m145k2020_store")) + } + + lazy val favBasedTfgTopicEmbedding2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val stratoStore = + StratoFetchableStore + .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding]( + 
stratoClient, + "recommendations/simclusters_v2/embeddings/favBasedTFGTopic20M145K2020") + + val truncatedStore = stratoStore.mapValues { embedding => + SimClustersEmbedding(embedding, truncate = 50) + } + + ObservedCachedReadableStore.from( + ObservedReadableStore(truncatedStore)( + statsReceiver.scope("fav_tfg_topic_embedding_2020_cache_backing_store")), + ttl = 12.hours, + maxKeys = 262143, // 200MB + cacheName = "fav_tfg_topic_embedding_2020_cache", + windowSize = 10000L + )(statsReceiver.scope("fav_tfg_topic_embedding_2020_cache")) + } + + lazy val logFavBasedApe20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + ObservedReadableStore( + StratoFetchableStore + .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding]( + stratoClient, + "recommendations/simclusters_v2/embeddings/logFavBasedAPE20M145K2020") + .composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId( + AggregatableLogFavBasedProducer, + Model20m145k2020, + internalId) => + SimClustersEmbeddingId(AggregatableLogFavBasedProducer, Model20m145k2020, internalId) + } + .mapValues(embedding => SimClustersEmbedding(embedding, 50)) + )(statsReceiver.scope("aggregatable_producer_embeddings_by_logfav_score_2020")) + } + + val interestService: InterestsThriftService.MethodPerEndpoint = + makeThriftClient[InterestsThriftService.MethodPerEndpoint]( + "/s/interests-thrift-service/interests-thrift-service", + "interests_thrift_service" + ) + + val interestsOptOutStore: InterestsOptOutStore = InterestsOptOutStore(interestService) + + // Save 2 ^ 18 UTTs. Promising 100% cache rate + lazy val defaultCacheConfigV2: CacheConfigV2 = CacheConfigV2(262143) + lazy val uttClientCacheConfigsV2: UttClientCacheConfigsV2 = UttClientCacheConfigsV2( + getTaxonomyConfig = defaultCacheConfigV2, + getUttTaxonomyConfig = defaultCacheConfigV2, + getLeafIds = defaultCacheConfigV2, + getLeafUttEntities = defaultCacheConfigV2 + ) + + // CachedUttClient to use StratoClient + lazy val cachedUttClientV2: CachedUttClientV2 = new CachedUttClientV2( + stratoClient = stratoClient, + env = Environment.Prod, + cacheConfigs = uttClientCacheConfigsV2, + statsReceiver = statsReceiver.scope("cached_utt_client") + ) + + lazy val semanticCoreTopicSeedStore: ReadableStore[ + SemanticCoreTopicSeedStore.Key, + Seq[UserId] + ] = { + /* + Up to 1000 Long seeds per topic/language = 62.5kb per topic/language (worst case) + Assume ~10k active topic/languages ~= 650MB (worst case) + */ + val underlying = new SemanticCoreTopicSeedStore(cachedUttClientV2, interestsOptOutStore)( + statsReceiver.scope("semantic_core_topic_seed_store")) + + val memcacheStore = ObservedMemcachedReadableStore.fromCacheClient( + backingStore = underlying, + cacheClient = cacheClient, + ttl = 12.hours + )( + valueInjection = SeqLongInjection, + statsReceiver = statsReceiver.scope("topic_producer_seed_store_mem_cache"), + keyToString = { k => s"tpss:${k.entityId}_${k.languageCode}" } + ) + + ObservedCachedReadableStore.from[SemanticCoreTopicSeedStore.Key, Seq[UserId]]( + store = memcacheStore, + ttl = 6.hours, + maxKeys = 20e3.toInt, + cacheName = "topic_producer_seed_store_cache", + windowSize = 5000 + )(statsReceiver.scope("topic_producer_seed_store_cache")) + } + + lazy val logFavBasedApeEntity20M145K2020EmbeddingStore: ApeEntityEmbeddingStore = { + val apeStore = logFavBasedApe20M145K2020EmbeddingStore.composeKeyMapping[UserId]({ id => + SimClustersEmbeddingId( + AggregatableLogFavBasedProducer, + Model20m145k2020, + 
InternalId.UserId(id)) + }) + + new ApeEntityEmbeddingStore( + semanticCoreSeedStore = semanticCoreTopicSeedStore, + aggregatableProducerEmbeddingStore = apeStore, + statsReceiver = statsReceiver.scope("log_fav_based_ape_entity_2020_embedding_store")) + } + + lazy val logFavBasedApeEntity20M145K2020EmbeddingCachedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val truncatedStore = + logFavBasedApeEntity20M145K2020EmbeddingStore.mapValues(_.truncate(50).toThrift) + + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = truncatedStore, + cacheClient = cacheClient, + ttl = 12.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("log_fav_based_ape_entity_2020_embedding_mem_cache"), + keyToString = { k => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + + val inMemoryCachedStore = + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + memcachedStore, + ttl = 6.hours, + maxKeys = 262143, + cacheName = "log_fav_based_ape_entity_2020_embedding_cache", + windowSize = 10000L + )(statsReceiver.scope("log_fav_based_ape_entity_2020_embedding_cached_store")) + + DeciderableReadableStore( + inMemoryCachedStore, + rmsDecider.deciderGateBuilder.idGateWithHashing[SimClustersEmbeddingId]( + DeciderKey.enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore), + statsReceiver.scope("log_fav_based_ape_entity_2020_embedding_deciderable_store") + ) + } + + lazy val relaxedLogFavBasedApe20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + ObservedReadableStore( + StratoFetchableStore + .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding]( + stratoClient, + "recommendations/simclusters_v2/embeddings/logFavBasedAPERelaxedFavEngagementThreshold20M145K2020") + .composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId( + RelaxedAggregatableLogFavBasedProducer, + Model20m145k2020, + internalId) => + SimClustersEmbeddingId( + RelaxedAggregatableLogFavBasedProducer, + Model20m145k2020, + internalId) + } + .mapValues(embedding => SimClustersEmbedding(embedding).truncate(50)) + )(statsReceiver.scope( + "aggregatable_producer_embeddings_by_logfav_score_relaxed_fav_engagement_threshold_2020")) + } + + lazy val relaxedLogFavBasedApe20M145K2020EmbeddingCachedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val truncatedStore = + relaxedLogFavBasedApe20M145K2020EmbeddingStore.mapValues(_.truncate(50).toThrift) + + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = truncatedStore, + cacheClient = cacheClient, + ttl = 12.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = + statsReceiver.scope("relaxed_log_fav_based_ape_entity_2020_embedding_mem_cache"), + keyToString = { k: SimClustersEmbeddingId => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + memcachedStore, + ttl = 6.hours, + maxKeys = 262143, + cacheName = "relaxed_log_fav_based_ape_entity_2020_embedding_cache", + windowSize = 10000L + )(statsReceiver.scope("relaxed_log_fav_based_ape_entity_2020_embedding_cache_store")) + } + + lazy val favBasedProducer20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = 
{ + val underlyingStore = ProducerClusterEmbeddingReadableStores + .getProducerTopKSimClusters2020EmbeddingsStore( + mhMtlsParams + ).composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId( + FavBasedProducer, + Model20m145k2020, + InternalId.UserId(userId)) => + userId + }.mapValues { topSimClustersWithScore => + ThriftSimClustersEmbedding(topSimClustersWithScore.topClusters.take(10)) + } + + // same memcache config as for favBasedUserInterestedIn20M145K2020Store + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = underlyingStore, + cacheClient = cacheClient, + ttl = 24.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("fav_based_producer_embedding_20M_145K_2020_mem_cache"), + keyToString = { k => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + memcachedStore, + ttl = 12.hours, + maxKeys = 16777215, + cacheName = "fav_based_producer_embedding_20M_145K_2020_embedding_cache", + windowSize = 10000L + )(statsReceiver.scope("fav_based_producer_embedding_20M_145K_2020_embedding_store")) + } + + // Production + lazy val interestedIn20M145KUpdatedStore: ReadableStore[UserId, ClustersUserIsInterestedIn] = { + UserInterestedInReadableStore.defaultStoreWithMtls( + mhMtlsParams, + modelVersion = ModelVersions.Model20M145KUpdated + ) + } + + // Production + lazy val interestedIn20M145K2020Store: ReadableStore[UserId, ClustersUserIsInterestedIn] = { + UserInterestedInReadableStore.defaultStoreWithMtls( + mhMtlsParams, + modelVersion = ModelVersions.Model20M145K2020 + ) + } + + // Production + lazy val InterestedInFromPE20M145KUpdatedStore: ReadableStore[ + UserId, + ClustersUserIsInterestedIn + ] = { + UserInterestedInReadableStore.defaultIIPEStoreWithMtls( + mhMtlsParams, + modelVersion = ModelVersions.Model20M145KUpdated) + } + + lazy val simClustersInterestedInStore: ReadableStore[ + (UserId, ModelVersion), + ClustersUserIsInterestedIn + ] = { + new ReadableStore[(UserId, ModelVersion), ClustersUserIsInterestedIn] { + override def get(k: (UserId, ModelVersion)): Future[Option[ClustersUserIsInterestedIn]] = { + k match { + case (userId, Model20m145kUpdated) => + interestedIn20M145KUpdatedStore.get(userId) + case (userId, Model20m145k2020) => + interestedIn20M145K2020Store.get(userId) + case _ => + Future.None + } + } + } + } + + lazy val simClustersInterestedInFromProducerEmbeddingsStore: ReadableStore[ + (UserId, ModelVersion), + ClustersUserIsInterestedIn + ] = { + new ReadableStore[(UserId, ModelVersion), ClustersUserIsInterestedIn] { + override def get(k: (UserId, ModelVersion)): Future[Option[ClustersUserIsInterestedIn]] = { + k match { + case (userId, ModelVersion.Model20m145kUpdated) => + InterestedInFromPE20M145KUpdatedStore.get(userId) + case _ => + Future.None + } + } + } + } + + lazy val userInterestedInStore = + new twistly.interestedin.EmbeddingStore( + interestedInStore = simClustersInterestedInStore, + interestedInFromProducerEmbeddingStore = simClustersInterestedInFromProducerEmbeddingsStore, + statsReceiver = statsReceiver + ) + + // Production + lazy val favBasedUserInterestedIn20M145KUpdatedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val underlyingStore = + UserInterestedInReadableStore + .defaultSimClustersEmbeddingStoreWithMtls( + mhMtlsParams, + EmbeddingType.FavBasedUserInterestedIn, + 
ModelVersion.Model20m145kUpdated) + .mapValues(_.toThrift) + + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = underlyingStore, + cacheClient = cacheClient, + ttl = 12.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("fav_based_user_interested_in_mem_cache"), + keyToString = { k => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + memcachedStore, + ttl = 6.hours, + maxKeys = 262143, + cacheName = "fav_based_user_interested_in_cache", + windowSize = 10000L + )(statsReceiver.scope("fav_based_user_interested_in_store")) + } + + // Production + lazy val LogFavBasedInterestedInFromAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val underlyingStore = + UserInterestedInReadableStore + .defaultIIAPESimClustersEmbeddingStoreWithMtls( + mhMtlsParams, + EmbeddingType.LogFavBasedUserInterestedInFromAPE, + ModelVersion.Model20m145k2020) + .mapValues(_.toThrift) + + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = underlyingStore, + cacheClient = cacheClient, + ttl = 12.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("log_fav_based_user_interested_in_from_ape_mem_cache"), + keyToString = { k => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + memcachedStore, + ttl = 6.hours, + maxKeys = 262143, + cacheName = "log_fav_based_user_interested_in_from_ape_cache", + windowSize = 10000L + )(statsReceiver.scope("log_fav_based_user_interested_in_from_ape_store")) + } + + // Production + lazy val FollowBasedInterestedInFromAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val underlyingStore = + UserInterestedInReadableStore + .defaultIIAPESimClustersEmbeddingStoreWithMtls( + mhMtlsParams, + EmbeddingType.FollowBasedUserInterestedInFromAPE, + ModelVersion.Model20m145k2020) + .mapValues(_.toThrift) + + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = underlyingStore, + cacheClient = cacheClient, + ttl = 12.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("follow_based_user_interested_in_from_ape_mem_cache"), + keyToString = { k => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + memcachedStore, + ttl = 6.hours, + maxKeys = 262143, + cacheName = "follow_based_user_interested_in_from_ape_cache", + windowSize = 10000L + )(statsReceiver.scope("follow_based_user_interested_in_from_ape_store")) + } + + // production + lazy val favBasedUserInterestedIn20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val underlyingStore: ReadableStore[SimClustersEmbeddingId, ThriftSimClustersEmbedding] = + UserInterestedInReadableStore + .defaultSimClustersEmbeddingStoreWithMtls( + mhMtlsParams, + EmbeddingType.FavBasedUserInterestedIn, + ModelVersion.Model20m145k2020).mapValues(_.toThrift) + + ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = underlyingStore, + cacheClient = 
cacheClient, + ttl = 12.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("fav_based_user_interested_in_2020_mem_cache"), + keyToString = { k => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + } + + // Production + lazy val logFavBasedUserInterestedIn20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val underlyingStore = + UserInterestedInReadableStore + .defaultSimClustersEmbeddingStoreWithMtls( + mhMtlsParams, + EmbeddingType.LogFavBasedUserInterestedIn, + ModelVersion.Model20m145k2020) + + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = underlyingStore.mapValues(_.toThrift), + cacheClient = cacheClient, + ttl = 12.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("log_fav_based_user_interested_in_2020_store"), + keyToString = { k => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + memcachedStore, + ttl = 6.hours, + maxKeys = 262143, + cacheName = "log_fav_based_user_interested_in_2020_cache", + windowSize = 10000L + )(statsReceiver.scope("log_fav_based_user_interested_in_2020_store")) + } + + // Production + lazy val favBasedUserInterestedInFromPE20M145KUpdatedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val underlyingStore = + UserInterestedInReadableStore + .defaultIIPESimClustersEmbeddingStoreWithMtls( + mhMtlsParams, + EmbeddingType.FavBasedUserInterestedInFromPE, + ModelVersion.Model20m145kUpdated) + .mapValues(_.toThrift) + + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = underlyingStore, + cacheClient = cacheClient, + ttl = 12.hours + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)), + statsReceiver = statsReceiver.scope("fav_based_user_interested_in_from_pe_mem_cache"), + keyToString = { k => embeddingCacheKeyBuilder.apply(k) } + ).mapValues(SimClustersEmbedding(_)) + + ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding]( + memcachedStore, + ttl = 6.hours, + maxKeys = 262143, + cacheName = "fav_based_user_interested_in_from_pe_cache", + windowSize = 10000L + )(statsReceiver.scope("fav_based_user_interested_in_from_pe_cache")) + } + + private val underlyingStores: Map[ + (EmbeddingType, ModelVersion), + ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] + ] = Map( + // Tweet Embeddings + (LogFavBasedTweet, Model20m145kUpdated) -> logFavBased20M145KUpdatedTweetEmbeddingStore, + (LogFavBasedTweet, Model20m145k2020) -> logFavBased20M145K2020TweetEmbeddingStore, + ( + LogFavLongestL2EmbeddingTweet, + Model20m145k2020) -> logFavBasedLongestL2Tweet20M145K2020EmbeddingStore, + // Entity Embeddings + (FavTfgTopic, Model20m145k2020) -> favBasedTfgTopicEmbedding2020Store, + ( + LogFavBasedKgoApeTopic, + Model20m145k2020) -> logFavBasedApeEntity20M145K2020EmbeddingCachedStore, + // KnownFor Embeddings + (FavBasedProducer, Model20m145k2020) -> favBasedProducer20M145K2020EmbeddingStore, + ( + RelaxedAggregatableLogFavBasedProducer, + Model20m145k2020) -> relaxedLogFavBasedApe20M145K2020EmbeddingCachedStore, + // InterestedIn Embeddings + ( + LogFavBasedUserInterestedInFromAPE, + Model20m145k2020) -> LogFavBasedInterestedInFromAPE20M145K2020Store, + ( + 
FollowBasedUserInterestedInFromAPE, + Model20m145k2020) -> FollowBasedInterestedInFromAPE20M145K2020Store, + (FavBasedUserInterestedIn, Model20m145kUpdated) -> favBasedUserInterestedIn20M145KUpdatedStore, + (FavBasedUserInterestedIn, Model20m145k2020) -> favBasedUserInterestedIn20M145K2020Store, + (LogFavBasedUserInterestedIn, Model20m145k2020) -> logFavBasedUserInterestedIn20M145K2020Store, + ( + FavBasedUserInterestedInFromPE, + Model20m145kUpdated) -> favBasedUserInterestedInFromPE20M145KUpdatedStore, + (FilteredUserInterestedIn, Model20m145kUpdated) -> userInterestedInStore, + (FilteredUserInterestedIn, Model20m145k2020) -> userInterestedInStore, + (FilteredUserInterestedInFromPE, Model20m145kUpdated) -> userInterestedInStore, + (UnfilteredUserInterestedIn, Model20m145kUpdated) -> userInterestedInStore, + (UnfilteredUserInterestedIn, Model20m145k2020) -> userInterestedInStore, + ) + + val simClustersEmbeddingStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val underlying: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = + SimClustersEmbeddingStore.buildWithDecider( + underlyingStores = underlyingStores, + decider = rmsDecider.decider, + statsReceiver = statsReceiver.scope("simClusters_embeddings_store_deciderable") + ) + + val underlyingWithTimeout: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = + new ReadableStoreWithTimeout( + rs = underlying, + decider = rmsDecider.decider, + enableTimeoutDeciderKey = DeciderConstants.enableSimClustersEmbeddingStoreTimeouts, + timeoutValueKey = DeciderConstants.simClustersEmbeddingStoreTimeoutValueMillis, + timer = timer, + statsReceiver = statsReceiver.scope("simClusters_embedding_store_timeouts") + ) + + ObservedReadableStore( + store = underlyingWithTimeout + )(statsReceiver.scope("simClusters_embeddings_store")) + } +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/BUILD new file mode 100644 index 000000000..ab19a1dd7 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/BUILD @@ -0,0 +1,18 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/authentication", + "finagle/finagle-stats", + "finatra/inject/inject-core/src/main/scala", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/util", + "interests-service/thrift/src/main/thrift:thrift-scala", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/common", + "servo/util", + "src/scala/com/twitter/storehaus_internal/manhattan", + "src/scala/com/twitter/storehaus_internal/memcache", + "src/scala/com/twitter/storehaus_internal/util", + "strato/src/main/scala/com/twitter/strato/client", + ], +) diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/CacheModule.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/CacheModule.scala new file mode 100644 index 000000000..a042225fa --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/CacheModule.scala @@ -0,0 +1,34 @@ +package com.twitter.representation_manager.modules + +import com.google.inject.Provides +import com.twitter.finagle.memcached.Client +import 
javax.inject.Singleton +import com.twitter.conversions.DurationOps._ +import com.twitter.inject.TwitterModule +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.storehaus_internal.memcache.MemcacheStore +import com.twitter.storehaus_internal.util.ClientName +import com.twitter.storehaus_internal.util.ZkEndPoint + +object CacheModule extends TwitterModule { + + private val cacheDest = flag[String]("cache_module.dest", "Path to memcache service") + private val timeout = flag[Int]("memcache.timeout", "Memcache client timeout") + private val retries = flag[Int]("memcache.retries", "Memcache timeout retries") + + @Singleton + @Provides + def providesCache( + serviceIdentifier: ServiceIdentifier, + stats: StatsReceiver + ): Client = + MemcacheStore.memcachedClient( + name = ClientName("memcache_representation_manager"), + dest = ZkEndPoint(cacheDest()), + timeout = timeout().milliseconds, + retries = retries(), + statsReceiver = stats.scope("cache_client"), + serviceIdentifier = serviceIdentifier + ) +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/InterestsThriftClientModule.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/InterestsThriftClientModule.scala new file mode 100644 index 000000000..82a5a5004 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/InterestsThriftClientModule.scala @@ -0,0 +1,40 @@ +package com.twitter.representation_manager.modules + +import com.google.inject.Provides +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.ThriftMux +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.mtls.client.MtlsStackClient.MtlsThriftMuxClientSyntax +import com.twitter.finagle.mux.ClientDiscardedRequestException +import com.twitter.finagle.service.ReqRep +import com.twitter.finagle.service.ResponseClass +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finagle.thrift.ClientId +import com.twitter.inject.TwitterModule +import com.twitter.interests.thriftscala.InterestsThriftService +import com.twitter.util.Throw +import javax.inject.Singleton + +object InterestsThriftClientModule extends TwitterModule { + + @Singleton + @Provides + def providesInterestsThriftClient( + clientId: ClientId, + serviceIdentifier: ServiceIdentifier, + statsReceiver: StatsReceiver + ): InterestsThriftService.MethodPerEndpoint = { + ThriftMux.client + .withClientId(clientId) + .withMutualTls(serviceIdentifier) + .withRequestTimeout(450.milliseconds) + .withStatsReceiver(statsReceiver.scope("InterestsThriftClient")) + .withResponseClassifier { + case ReqRep(_, Throw(_: ClientDiscardedRequestException)) => ResponseClass.Ignorable + } + .build[InterestsThriftService.MethodPerEndpoint]( + dest = "/s/interests-thrift-service/interests-thrift-service", + label = "interests_thrift_service" + ) + } +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/LegacyRMSConfigModule.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/LegacyRMSConfigModule.scala new file mode 100644 index 000000000..0a06dffe6 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/LegacyRMSConfigModule.scala @@ -0,0 +1,18 @@ +package com.twitter.representation_manager.modules + +import 
com.google.inject.Provides +import com.twitter.inject.TwitterModule +import javax.inject.Named +import javax.inject.Singleton + +object LegacyRMSConfigModule extends TwitterModule { + @Singleton + @Provides + @Named("cacheHashKeyPrefix") + def providesCacheHashKeyPrefix: String = "RMS" + + @Singleton + @Provides + @Named("useContentRecommenderConfiguration") + def providesUseContentRecommenderConfiguration: Boolean = false +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/StoreModule.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/StoreModule.scala new file mode 100644 index 000000000..a2efe5925 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/StoreModule.scala @@ -0,0 +1,24 @@ +package com.twitter.representation_manager.modules + +import com.google.inject.Provides +import javax.inject.Singleton +import com.twitter.inject.TwitterModule +import com.twitter.decider.Decider +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.representation_manager.common.RepresentationManagerDecider +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams + +object StoreModule extends TwitterModule { + @Singleton + @Provides + def providesMhMtlsParams( + serviceIdentifier: ServiceIdentifier + ): ManhattanKVClientMtlsParams = ManhattanKVClientMtlsParams(serviceIdentifier) + + @Singleton + @Provides + def providesRmsDecider( + decider: Decider + ): RepresentationManagerDecider = RepresentationManagerDecider(decider) + +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/TimerModule.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/TimerModule.scala new file mode 100644 index 000000000..fe7fddb45 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/TimerModule.scala @@ -0,0 +1,13 @@ +package com.twitter.representation_manager.modules + +import com.google.inject.Provides +import com.twitter.finagle.util.DefaultTimer +import com.twitter.inject.TwitterModule +import com.twitter.util.Timer +import javax.inject.Singleton + +object TimerModule extends TwitterModule { + @Singleton + @Provides + def providesTimer: Timer = DefaultTimer +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/UttClientModule.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/UttClientModule.scala new file mode 100644 index 000000000..cc2100c1c --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/modules/UttClientModule.scala @@ -0,0 +1,39 @@ +package com.twitter.representation_manager.modules + +import com.google.inject.Provides +import com.twitter.escherbird.util.uttclient.CacheConfigV2 +import com.twitter.escherbird.util.uttclient.CachedUttClientV2 +import com.twitter.escherbird.util.uttclient.UttClientCacheConfigsV2 +import com.twitter.escherbird.utt.strato.thriftscala.Environment +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.strato.client.{Client => StratoClient} +import javax.inject.Singleton + +object UttClientModule extends TwitterModule { + + @Singleton + @Provides + def providesUttClient( + stratoClient: StratoClient, + statsReceiver: StatsReceiver + ): CachedUttClientV2 = 
{
+    // Cache up to 2 ^ 18 UTTs, promising a ~100% cache hit rate
+    val defaultCacheConfigV2: CacheConfigV2 = CacheConfigV2(262143)
+
+    val uttClientCacheConfigsV2: UttClientCacheConfigsV2 = UttClientCacheConfigsV2(
+      getTaxonomyConfig = defaultCacheConfigV2,
+      getUttTaxonomyConfig = defaultCacheConfigV2,
+      getLeafIds = defaultCacheConfigV2,
+      getLeafUttEntities = defaultCacheConfigV2
+    )
+
+    // CachedUttClient that goes through the StratoClient
+    new CachedUttClientV2(
+      stratoClient = stratoClient,
+      env = Environment.Prod,
+      cacheConfigs = uttClientCacheConfigsV2,
+      statsReceiver = statsReceiver.scope("cached_utt_client")
+    )
+  }
+}
diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/BUILD b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/BUILD
new file mode 100644
index 000000000..1731a2649
--- /dev/null
+++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/BUILD
@@ -0,0 +1,16 @@
+scala_library(
+    compiler_option_sets = ["fatal_warnings"],
+    platform = "java8",
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "content-recommender/server/src/main/scala/com/twitter/contentrecommender:representation-manager-deps",
+        "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/util",
+        "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
+        "representation-manager/server/src/main/scala/com/twitter/representation_manager/common",
+        "src/scala/com/twitter/simclusters_v2/stores",
+        "src/scala/com/twitter/simclusters_v2/summingbird/stores",
+        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
+        "storage/clients/manhattan/client/src/main/scala",
+        "tweetypie/src/scala/com/twitter/tweetypie/util",
+    ],
+)
diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/DeciderConstants.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/DeciderConstants.scala
new file mode 100644
index 000000000..dd00ea126
--- /dev/null
+++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/DeciderConstants.scala
@@ -0,0 +1,39 @@
+package com.twitter.representation_manager.store
+
+import com.twitter.servo.decider.DeciderKeyEnum
+
+object DeciderConstants {
+  // Deciders inherited from CR and RSX and only used in LegacyRMS.
+  // Their values are manipulated by CR's and RSX's yml files and their decider dashboards.
+  // We will remove them after the migration is completed.
+  val enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore =
+    "enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore"
+
+  val enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore =
+    "enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore"
+
+  val enablelogFavBased20M145K2020TweetEmbeddingStoreTimeouts =
+    "enable_log_fav_based_tweet_embedding_20m145k2020_timeouts"
+  val logFavBased20M145K2020TweetEmbeddingStoreTimeoutValueMillis =
+    "log_fav_based_tweet_embedding_20m145k2020_timeout_value_millis"
+
+  val enablelogFavBased20M145KUpdatedTweetEmbeddingStoreTimeouts =
+    "enable_log_fav_based_tweet_embedding_20m145kUpdated_timeouts"
+  val logFavBased20M145KUpdatedTweetEmbeddingStoreTimeoutValueMillis =
+    "log_fav_based_tweet_embedding_20m145kUpdated_timeout_value_millis"
+
+  val enableSimClustersEmbeddingStoreTimeouts = "enable_sim_clusters_embedding_store_timeouts"
+  val simClustersEmbeddingStoreTimeoutValueMillis =
+    "sim_clusters_embedding_store_timeout_value_millis"
+}
+
+// Necessary for
using servo Gates +object DeciderKey extends DeciderKeyEnum { + val enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore: Value = Value( + DeciderConstants.enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore + ) + + val enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore: Value = Value( + DeciderConstants.enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore + ) +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/TopicSimClustersEmbeddingStore.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/TopicSimClustersEmbeddingStore.scala new file mode 100644 index 000000000..cc6485b79 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/TopicSimClustersEmbeddingStore.scala @@ -0,0 +1,198 @@ +package com.twitter.representation_manager.store + +import com.twitter.contentrecommender.store.ApeEntityEmbeddingStore +import com.twitter.contentrecommender.store.InterestsOptOutStore +import com.twitter.contentrecommender.store.SemanticCoreTopicSeedStore +import com.twitter.conversions.DurationOps._ +import com.twitter.escherbird.util.uttclient.CachedUttClientV2 +import com.twitter.finagle.memcached.Client +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.store.strato.StratoFetchableStore +import com.twitter.frigate.common.util.SeqLongInjection +import com.twitter.hermit.store.common.ObservedCachedReadableStore +import com.twitter.hermit.store.common.ObservedMemcachedReadableStore +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.interests.thriftscala.InterestsThriftService +import com.twitter.representation_manager.common.MemCacheConfig +import com.twitter.representation_manager.common.RepresentationManagerDecider +import com.twitter.simclusters_v2.common.SimClustersEmbedding +import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.EmbeddingType._ +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ModelVersion._ +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.TopicId +import com.twitter.simclusters_v2.thriftscala.LocaleEntityId +import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding} +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.tweetypie.util.UserId +import javax.inject.Inject + +class TopicSimClustersEmbeddingStore @Inject() ( + stratoClient: StratoClient, + cacheClient: Client, + globalStats: StatsReceiver, + mhMtlsParams: ManhattanKVClientMtlsParams, + rmsDecider: RepresentationManagerDecider, + interestService: InterestsThriftService.MethodPerEndpoint, + uttClient: CachedUttClientV2) { + + private val stats = globalStats.scope(this.getClass.getSimpleName) + private val interestsOptOutStore = InterestsOptOutStore(interestService) + + /** + * Note this is NOT an embedding store. 
It is a list of author account ids we use to represent + * topics + */ + private val semanticCoreTopicSeedStore: ReadableStore[ + SemanticCoreTopicSeedStore.Key, + Seq[UserId] + ] = { + /* + Up to 1000 Long seeds per topic/language = 62.5kb per topic/language (worst case) + Assume ~10k active topic/languages ~= 650MB (worst case) + */ + val underlying = new SemanticCoreTopicSeedStore(uttClient, interestsOptOutStore)( + stats.scope("semantic_core_topic_seed_store")) + + val memcacheStore = ObservedMemcachedReadableStore.fromCacheClient( + backingStore = underlying, + cacheClient = cacheClient, + ttl = 12.hours)( + valueInjection = SeqLongInjection, + statsReceiver = stats.scope("topic_producer_seed_store_mem_cache"), + keyToString = { k => s"tpss:${k.entityId}_${k.languageCode}" } + ) + + ObservedCachedReadableStore.from[SemanticCoreTopicSeedStore.Key, Seq[UserId]]( + store = memcacheStore, + ttl = 6.hours, + maxKeys = 20e3.toInt, + cacheName = "topic_producer_seed_store_cache", + windowSize = 5000 + )(stats.scope("topic_producer_seed_store_cache")) + } + + private val favBasedTfgTopicEmbedding20m145k2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = + StratoFetchableStore + .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding]( + stratoClient, + "recommendations/simclusters_v2/embeddings/favBasedTFGTopic20M145K2020").mapValues( + embedding => SimClustersEmbedding(embedding, truncate = 50).toThrift) + .composeKeyMapping[LocaleEntityId] { localeEntityId => + SimClustersEmbeddingId( + FavTfgTopic, + Model20m145k2020, + InternalId.LocaleEntityId(localeEntityId)) + } + + buildLocaleEntityIdMemCacheStore(rawStore, FavTfgTopic, Model20m145k2020) + } + + private val logFavBasedApeEntity20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val apeStore = StratoFetchableStore + .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding]( + stratoClient, + "recommendations/simclusters_v2/embeddings/logFavBasedAPE20M145K2020") + .mapValues(embedding => SimClustersEmbedding(embedding, truncate = 50)) + .composeKeyMapping[UserId]({ id => + SimClustersEmbeddingId( + AggregatableLogFavBasedProducer, + Model20m145k2020, + InternalId.UserId(id)) + }) + val rawStore = new ApeEntityEmbeddingStore( + semanticCoreSeedStore = semanticCoreTopicSeedStore, + aggregatableProducerEmbeddingStore = apeStore, + statsReceiver = stats.scope("log_fav_based_ape_entity_2020_embedding_store")) + .mapValues(embedding => SimClustersEmbedding(embedding.toThrift, truncate = 50).toThrift) + .composeKeyMapping[TopicId] { topicId => + SimClustersEmbeddingId( + LogFavBasedKgoApeTopic, + Model20m145k2020, + InternalId.TopicId(topicId)) + } + + buildTopicIdMemCacheStore(rawStore, LogFavBasedKgoApeTopic, Model20m145k2020) + } + + private def buildTopicIdMemCacheStore( + rawStore: ReadableStore[TopicId, ThriftSimClustersEmbedding], + embeddingType: EmbeddingType, + modelVersion: ModelVersion + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val observedStore: ObservedReadableStore[TopicId, ThriftSimClustersEmbedding] = + ObservedReadableStore( + store = rawStore + )(stats.scope(embeddingType.name).scope(modelVersion.name)) + + val storeWithKeyMapping = observedStore.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.TopicId(topicId)) => + topicId + } + + MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding( + storeWithKeyMapping, + cacheClient, + embeddingType, 
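+      // Note: embeddingType and modelVersion together select the TTL from
+      // MemCacheConfig.cacheParamsMap; pairs absent from that map resolve to
+      // DisabledMemCacheParams, i.e. the store is used without a memcache layer.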
+ modelVersion, + stats + ) + } + + private def buildLocaleEntityIdMemCacheStore( + rawStore: ReadableStore[LocaleEntityId, ThriftSimClustersEmbedding], + embeddingType: EmbeddingType, + modelVersion: ModelVersion + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val observedStore: ObservedReadableStore[LocaleEntityId, ThriftSimClustersEmbedding] = + ObservedReadableStore( + store = rawStore + )(stats.scope(embeddingType.name).scope(modelVersion.name)) + + val storeWithKeyMapping = observedStore.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.LocaleEntityId(localeEntityId)) => + localeEntityId + } + + MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding( + storeWithKeyMapping, + cacheClient, + embeddingType, + modelVersion, + stats + ) + } + + private val underlyingStores: Map[ + (EmbeddingType, ModelVersion), + ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] + ] = Map( + // Topic Embeddings + (FavTfgTopic, Model20m145k2020) -> favBasedTfgTopicEmbedding20m145k2020Store, + (LogFavBasedKgoApeTopic, Model20m145k2020) -> logFavBasedApeEntity20M145K2020EmbeddingStore, + ) + + val topicSimClustersEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + SimClustersEmbeddingStore.buildWithDecider( + underlyingStores = underlyingStores, + decider = rmsDecider.decider, + statsReceiver = stats + ) + } + +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/TweetSimClustersEmbeddingStore.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/TweetSimClustersEmbeddingStore.scala new file mode 100644 index 000000000..857e38649 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/TweetSimClustersEmbeddingStore.scala @@ -0,0 +1,141 @@ +package com.twitter.representation_manager.store + +import com.twitter.finagle.memcached.Client +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.representation_manager.common.MemCacheConfig +import com.twitter.representation_manager.common.RepresentationManagerDecider +import com.twitter.simclusters_v2.common.SimClustersEmbedding +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore +import com.twitter.simclusters_v2.summingbird.stores.PersistentTweetEmbeddingStore +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.EmbeddingType._ +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ModelVersion._ +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding} +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams +import com.twitter.storehaus.ReadableStore +import javax.inject.Inject + +class TweetSimClustersEmbeddingStore @Inject() ( + cacheClient: Client, + globalStats: StatsReceiver, + mhMtlsParams: ManhattanKVClientMtlsParams, + rmsDecider: RepresentationManagerDecider) { + + private val stats = globalStats.scope(this.getClass.getSimpleName) + + val logFavBasedLongestL2Tweet20M145KUpdatedEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore 
= + PersistentTweetEmbeddingStore + .longestL2NormTweetEmbeddingStoreManhattan( + mhMtlsParams, + PersistentTweetEmbeddingStore.LogFavBased20m145kUpdatedDataset, + stats + ).mapValues(_.toThrift) + + buildMemCacheStore(rawStore, LogFavLongestL2EmbeddingTweet, Model20m145kUpdated) + } + + val logFavBasedLongestL2Tweet20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = + PersistentTweetEmbeddingStore + .longestL2NormTweetEmbeddingStoreManhattan( + mhMtlsParams, + PersistentTweetEmbeddingStore.LogFavBased20m145k2020Dataset, + stats + ).mapValues(_.toThrift) + + buildMemCacheStore(rawStore, LogFavLongestL2EmbeddingTweet, Model20m145k2020) + } + + val logFavBased20M145KUpdatedTweetEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = + PersistentTweetEmbeddingStore + .mostRecentTweetEmbeddingStoreManhattan( + mhMtlsParams, + PersistentTweetEmbeddingStore.LogFavBased20m145kUpdatedDataset, + stats + ).mapValues(_.toThrift) + + buildMemCacheStore(rawStore, LogFavBasedTweet, Model20m145kUpdated) + } + + val logFavBased20M145K2020TweetEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = + PersistentTweetEmbeddingStore + .mostRecentTweetEmbeddingStoreManhattan( + mhMtlsParams, + PersistentTweetEmbeddingStore.LogFavBased20m145k2020Dataset, + stats + ).mapValues(_.toThrift) + + buildMemCacheStore(rawStore, LogFavBasedTweet, Model20m145k2020) + } + + private def buildMemCacheStore( + rawStore: ReadableStore[TweetId, ThriftSimClustersEmbedding], + embeddingType: EmbeddingType, + modelVersion: ModelVersion + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val observedStore: ObservedReadableStore[TweetId, ThriftSimClustersEmbedding] = + ObservedReadableStore( + store = rawStore + )(stats.scope(embeddingType.name).scope(modelVersion.name)) + + val storeWithKeyMapping = observedStore.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.TweetId(tweetId)) => + tweetId + } + + MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding( + storeWithKeyMapping, + cacheClient, + embeddingType, + modelVersion, + stats + ) + } + + private val underlyingStores: Map[ + (EmbeddingType, ModelVersion), + ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] + ] = Map( + // Tweet Embeddings + (LogFavBasedTweet, Model20m145kUpdated) -> logFavBased20M145KUpdatedTweetEmbeddingStore, + (LogFavBasedTweet, Model20m145k2020) -> logFavBased20M145K2020TweetEmbeddingStore, + ( + LogFavLongestL2EmbeddingTweet, + Model20m145kUpdated) -> logFavBasedLongestL2Tweet20M145KUpdatedEmbeddingStore, + ( + LogFavLongestL2EmbeddingTweet, + Model20m145k2020) -> logFavBasedLongestL2Tweet20M145K2020EmbeddingStore, + ) + + val tweetSimClustersEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + SimClustersEmbeddingStore.buildWithDecider( + underlyingStores = underlyingStores, + decider = rmsDecider.decider, + statsReceiver = stats + ) + } + +} diff --git a/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/UserSimClustersEmbeddingStore.scala b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/UserSimClustersEmbeddingStore.scala new file mode 100644 index 000000000..b416d9b17 --- /dev/null +++ b/representation-manager/server/src/main/scala/com/twitter/representation_manager/store/UserSimClustersEmbeddingStore.scala @@ 
-0,0 +1,602 @@ +package com.twitter.representation_manager.store + +import com.twitter.contentrecommender.twistly +import com.twitter.finagle.memcached.Client +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.store.strato.StratoFetchableStore +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.representation_manager.common.MemCacheConfig +import com.twitter.representation_manager.common.RepresentationManagerDecider +import com.twitter.simclusters_v2.common.ModelVersions +import com.twitter.simclusters_v2.common.SimClustersEmbedding +import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore +import com.twitter.simclusters_v2.summingbird.stores.ProducerClusterEmbeddingReadableStores +import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore +import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore.getStore +import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore.modelVersionToDatasetMap +import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore.knownModelVersions +import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore.toSimClustersEmbedding +import com.twitter.simclusters_v2.thriftscala.ClustersUserIsInterestedIn +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.EmbeddingType._ +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ModelVersion._ +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding} +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams +import com.twitter.storehaus.ReadableStore +import com.twitter.storehaus_internal.manhattan.Apollo +import com.twitter.storehaus_internal.manhattan.ManhattanCluster +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.strato.thrift.ScroogeConvImplicits._ +import com.twitter.tweetypie.util.UserId +import com.twitter.util.Future +import javax.inject.Inject + +class UserSimClustersEmbeddingStore @Inject() ( + stratoClient: StratoClient, + cacheClient: Client, + globalStats: StatsReceiver, + mhMtlsParams: ManhattanKVClientMtlsParams, + rmsDecider: RepresentationManagerDecider) { + + private val stats = globalStats.scope(this.getClass.getSimpleName) + + private val favBasedProducer20M145KUpdatedEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = ProducerClusterEmbeddingReadableStores + .getProducerTopKSimClustersEmbeddingsStore( + mhMtlsParams + ).mapValues { topSimClustersWithScore => + ThriftSimClustersEmbedding(topSimClustersWithScore.topClusters) + }.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.UserId(userId)) => + userId + } + + buildMemCacheStore(rawStore, FavBasedProducer, Model20m145kUpdated) + } + + private val favBasedProducer20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = ProducerClusterEmbeddingReadableStores + .getProducerTopKSimClusters2020EmbeddingsStore( + mhMtlsParams + ).mapValues { topSimClustersWithScore => + ThriftSimClustersEmbedding(topSimClustersWithScore.topClusters) + }.composeKeyMapping[SimClustersEmbeddingId] { + case 
SimClustersEmbeddingId(_, _, InternalId.UserId(userId)) => + userId + } + + buildMemCacheStore(rawStore, FavBasedProducer, Model20m145k2020) + } + + private val followBasedProducer20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = ProducerClusterEmbeddingReadableStores + .getProducerTopKSimClustersEmbeddingsByFollowStore( + mhMtlsParams + ).mapValues { topSimClustersWithScore => + ThriftSimClustersEmbedding(topSimClustersWithScore.topClusters) + }.composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(_, _, InternalId.UserId(userId)) => + userId + } + + buildMemCacheStore(rawStore, FollowBasedProducer, Model20m145k2020) + } + + private val logFavBasedApe20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = StratoFetchableStore + .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding]( + stratoClient, + "recommendations/simclusters_v2/embeddings/logFavBasedAPE20M145K2020") + .mapValues(embedding => SimClustersEmbedding(embedding, truncate = 50).toThrift) + + buildMemCacheStore(rawStore, AggregatableLogFavBasedProducer, Model20m145k2020) + } + + private val rawRelaxedLogFavBasedApe20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + ThriftSimClustersEmbedding + ] = { + StratoFetchableStore + .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding]( + stratoClient, + "recommendations/simclusters_v2/embeddings/logFavBasedAPERelaxedFavEngagementThreshold20M145K2020") + .mapValues(embedding => SimClustersEmbedding(embedding, truncate = 50).toThrift) + } + + private val relaxedLogFavBasedApe20M145K2020EmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildMemCacheStore( + rawRelaxedLogFavBasedApe20M145K2020EmbeddingStore, + RelaxedAggregatableLogFavBasedProducer, + Model20m145k2020) + } + + private val relaxedLogFavBasedApe20m145kUpdatedEmbeddingStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = rawRelaxedLogFavBasedApe20M145K2020EmbeddingStore + .composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId( + RelaxedAggregatableLogFavBasedProducer, + Model20m145kUpdated, + internalId) => + SimClustersEmbeddingId( + RelaxedAggregatableLogFavBasedProducer, + Model20m145k2020, + internalId) + } + + buildMemCacheStore(rawStore, RelaxedAggregatableLogFavBasedProducer, Model20m145kUpdated) + } + + private val logFavBasedInterestedInFromAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildUserInterestedInStore( + UserInterestedInReadableStore.defaultIIAPESimClustersEmbeddingStoreWithMtls, + LogFavBasedUserInterestedInFromAPE, + Model20m145k2020) + } + + private val followBasedInterestedInFromAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildUserInterestedInStore( + UserInterestedInReadableStore.defaultIIAPESimClustersEmbeddingStoreWithMtls, + FollowBasedUserInterestedInFromAPE, + Model20m145k2020) + } + + private val favBasedUserInterestedIn20M145KUpdatedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildUserInterestedInStore( + UserInterestedInReadableStore.defaultSimClustersEmbeddingStoreWithMtls, + FavBasedUserInterestedIn, + Model20m145kUpdated) + } + + private val favBasedUserInterestedIn20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + 
buildUserInterestedInStore( + UserInterestedInReadableStore.defaultSimClustersEmbeddingStoreWithMtls, + FavBasedUserInterestedIn, + Model20m145k2020) + } + + private val followBasedUserInterestedIn20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildUserInterestedInStore( + UserInterestedInReadableStore.defaultSimClustersEmbeddingStoreWithMtls, + FollowBasedUserInterestedIn, + Model20m145k2020) + } + + private val logFavBasedUserInterestedIn20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildUserInterestedInStore( + UserInterestedInReadableStore.defaultSimClustersEmbeddingStoreWithMtls, + LogFavBasedUserInterestedIn, + Model20m145k2020) + } + + private val favBasedUserInterestedInFromPE20M145KUpdatedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildUserInterestedInStore( + UserInterestedInReadableStore.defaultIIPESimClustersEmbeddingStoreWithMtls, + FavBasedUserInterestedInFromPE, + Model20m145kUpdated) + } + + private val twistlyUserInterestedInStore: ReadableStore[ + SimClustersEmbeddingId, + ThriftSimClustersEmbedding + ] = { + val interestedIn20M145KUpdatedStore = { + UserInterestedInReadableStore.defaultStoreWithMtls( + mhMtlsParams, + modelVersion = ModelVersions.Model20M145KUpdated + ) + } + val interestedIn20M145K2020Store = { + UserInterestedInReadableStore.defaultStoreWithMtls( + mhMtlsParams, + modelVersion = ModelVersions.Model20M145K2020 + ) + } + val interestedInFromPE20M145KUpdatedStore = { + UserInterestedInReadableStore.defaultIIPEStoreWithMtls( + mhMtlsParams, + modelVersion = ModelVersions.Model20M145KUpdated) + } + val simClustersInterestedInStore: ReadableStore[ + (UserId, ModelVersion), + ClustersUserIsInterestedIn + ] = { + new ReadableStore[(UserId, ModelVersion), ClustersUserIsInterestedIn] { + override def get(k: (UserId, ModelVersion)): Future[Option[ClustersUserIsInterestedIn]] = { + k match { + case (userId, Model20m145kUpdated) => + interestedIn20M145KUpdatedStore.get(userId) + case (userId, Model20m145k2020) => + interestedIn20M145K2020Store.get(userId) + case _ => + Future.None + } + } + } + } + val simClustersInterestedInFromProducerEmbeddingsStore: ReadableStore[ + (UserId, ModelVersion), + ClustersUserIsInterestedIn + ] = { + new ReadableStore[(UserId, ModelVersion), ClustersUserIsInterestedIn] { + override def get(k: (UserId, ModelVersion)): Future[Option[ClustersUserIsInterestedIn]] = { + k match { + case (userId, ModelVersion.Model20m145kUpdated) => + interestedInFromPE20M145KUpdatedStore.get(userId) + case _ => + Future.None + } + } + } + } + new twistly.interestedin.EmbeddingStore( + interestedInStore = simClustersInterestedInStore, + interestedInFromProducerEmbeddingStore = simClustersInterestedInFromProducerEmbeddingsStore, + statsReceiver = stats + ).mapValues(_.toThrift) + } + + private val userNextInterestedIn20m145k2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildUserInterestedInStore( + UserInterestedInReadableStore.defaultNextInterestedInStoreWithMtls, + UserNextInterestedIn, + Model20m145k2020) + } + + private val filteredUserInterestedIn20m145kUpdatedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildMemCacheStore(twistlyUserInterestedInStore, FilteredUserInterestedIn, Model20m145kUpdated) + } + + private val filteredUserInterestedIn20m145k2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + 
buildMemCacheStore(twistlyUserInterestedInStore, FilteredUserInterestedIn, Model20m145k2020) + } + + private val filteredUserInterestedInFromPE20m145kUpdatedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildMemCacheStore( + twistlyUserInterestedInStore, + FilteredUserInterestedInFromPE, + Model20m145kUpdated) + } + + private val unfilteredUserInterestedIn20m145kUpdatedStore: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildMemCacheStore( + twistlyUserInterestedInStore, + UnfilteredUserInterestedIn, + Model20m145kUpdated) + } + + private val unfilteredUserInterestedIn20m145k2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + buildMemCacheStore(twistlyUserInterestedInStore, UnfilteredUserInterestedIn, Model20m145k2020) + } + + // [Experimental] User InterestedIn, generated by aggregating IIAPE embedding from AddressBook + + private val logFavBasedInterestedMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val datasetName = "addressbook_sims_embedding_iiape_maxpooling" + val appId = "wtf_embedding_apollo" + buildUserInterestedInStoreGeneric( + simClustersEmbeddingStoreWithMtls, + LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE, + Model20m145k2020, + datasetName = datasetName, + appId = appId, + manhattanCluster = Apollo + ) + } + + private val logFavBasedInterestedAverageAddressBookFromIIAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val datasetName = "addressbook_sims_embedding_iiape_average" + val appId = "wtf_embedding_apollo" + buildUserInterestedInStoreGeneric( + simClustersEmbeddingStoreWithMtls, + LogFavBasedUserInterestedAverageAddressBookFromIIAPE, + Model20m145k2020, + datasetName = datasetName, + appId = appId, + manhattanCluster = Apollo + ) + } + + private val logFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val datasetName = "addressbook_sims_embedding_iiape_booktype_maxpooling" + val appId = "wtf_embedding_apollo" + buildUserInterestedInStoreGeneric( + simClustersEmbeddingStoreWithMtls, + LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE, + Model20m145k2020, + datasetName = datasetName, + appId = appId, + manhattanCluster = Apollo + ) + } + + private val logFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val datasetName = "addressbook_sims_embedding_iiape_largestdim_maxpooling" + val appId = "wtf_embedding_apollo" + buildUserInterestedInStoreGeneric( + simClustersEmbeddingStoreWithMtls, + LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE, + Model20m145k2020, + datasetName = datasetName, + appId = appId, + manhattanCluster = Apollo + ) + } + + private val logFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val datasetName = "addressbook_sims_embedding_iiape_louvain_maxpooling" + val appId = "wtf_embedding_apollo" + buildUserInterestedInStoreGeneric( + simClustersEmbeddingStoreWithMtls, + LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE, + Model20m145k2020, + datasetName = datasetName, + appId = appId, + manhattanCluster = Apollo + ) + } + + private val 
logFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val datasetName = "addressbook_sims_embedding_iiape_connected_maxpooling" + val appId = "wtf_embedding_apollo" + buildUserInterestedInStoreGeneric( + simClustersEmbeddingStoreWithMtls, + LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE, + Model20m145k2020, + datasetName = datasetName, + appId = appId, + manhattanCluster = Apollo + ) + } + + /** + * Helper func to build a readable store for some UserInterestedIn embeddings with + * 1. A storeFunc from UserInterestedInReadableStore + * 2. EmbeddingType + * 3. ModelVersion + * 4. MemCacheConfig + * */ + private def buildUserInterestedInStore( + storeFunc: (ManhattanKVClientMtlsParams, EmbeddingType, ModelVersion) => ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ], + embeddingType: EmbeddingType, + modelVersion: ModelVersion + ): ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = storeFunc(mhMtlsParams, embeddingType, modelVersion) + .mapValues(_.toThrift) + val observedStore = ObservedReadableStore( + store = rawStore + )(stats.scope(embeddingType.name).scope(modelVersion.name)) + + MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding( + observedStore, + cacheClient, + embeddingType, + modelVersion, + stats + ) + } + + private def buildUserInterestedInStoreGeneric( + storeFunc: (ManhattanKVClientMtlsParams, EmbeddingType, ModelVersion, String, String, + ManhattanCluster) => ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ], + embeddingType: EmbeddingType, + modelVersion: ModelVersion, + datasetName: String, + appId: String, + manhattanCluster: ManhattanCluster + ): ReadableStore[ + SimClustersEmbeddingId, + SimClustersEmbedding + ] = { + val rawStore = + storeFunc(mhMtlsParams, embeddingType, modelVersion, datasetName, appId, manhattanCluster) + .mapValues(_.toThrift) + val observedStore = ObservedReadableStore( + store = rawStore + )(stats.scope(embeddingType.name).scope(modelVersion.name)) + + MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding( + observedStore, + cacheClient, + embeddingType, + modelVersion, + stats + ) + } + + private def simClustersEmbeddingStoreWithMtls( + mhMtlsParams: ManhattanKVClientMtlsParams, + embeddingType: EmbeddingType, + modelVersion: ModelVersion, + datasetName: String, + appId: String, + manhattanCluster: ManhattanCluster + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + + if (!modelVersionToDatasetMap.contains(ModelVersions.toKnownForModelVersion(modelVersion))) { + throw new IllegalArgumentException( + "Unknown model version: " + modelVersion + ". 
Known model versions: " + knownModelVersions) + } + getStore(appId, mhMtlsParams, datasetName, manhattanCluster) + .composeKeyMapping[SimClustersEmbeddingId] { + case SimClustersEmbeddingId(theEmbeddingType, theModelVersion, InternalId.UserId(userId)) + if theEmbeddingType == embeddingType && theModelVersion == modelVersion => + userId + }.mapValues(toSimClustersEmbedding(_, embeddingType)) + } + + private def buildMemCacheStore( + rawStore: ReadableStore[SimClustersEmbeddingId, ThriftSimClustersEmbedding], + embeddingType: EmbeddingType, + modelVersion: ModelVersion + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val observedStore = ObservedReadableStore( + store = rawStore + )(stats.scope(embeddingType.name).scope(modelVersion.name)) + + MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding( + observedStore, + cacheClient, + embeddingType, + modelVersion, + stats + ) + } + + private val underlyingStores: Map[ + (EmbeddingType, ModelVersion), + ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] + ] = Map( + // KnownFor Embeddings + (FavBasedProducer, Model20m145kUpdated) -> favBasedProducer20M145KUpdatedEmbeddingStore, + (FavBasedProducer, Model20m145k2020) -> favBasedProducer20M145K2020EmbeddingStore, + (FollowBasedProducer, Model20m145k2020) -> followBasedProducer20M145K2020EmbeddingStore, + (AggregatableLogFavBasedProducer, Model20m145k2020) -> logFavBasedApe20M145K2020EmbeddingStore, + ( + RelaxedAggregatableLogFavBasedProducer, + Model20m145kUpdated) -> relaxedLogFavBasedApe20m145kUpdatedEmbeddingStore, + ( + RelaxedAggregatableLogFavBasedProducer, + Model20m145k2020) -> relaxedLogFavBasedApe20M145K2020EmbeddingStore, + // InterestedIn Embeddings + ( + LogFavBasedUserInterestedInFromAPE, + Model20m145k2020) -> logFavBasedInterestedInFromAPE20M145K2020Store, + ( + FollowBasedUserInterestedInFromAPE, + Model20m145k2020) -> followBasedInterestedInFromAPE20M145K2020Store, + (FavBasedUserInterestedIn, Model20m145kUpdated) -> favBasedUserInterestedIn20M145KUpdatedStore, + (FavBasedUserInterestedIn, Model20m145k2020) -> favBasedUserInterestedIn20M145K2020Store, + (FollowBasedUserInterestedIn, Model20m145k2020) -> followBasedUserInterestedIn20M145K2020Store, + (LogFavBasedUserInterestedIn, Model20m145k2020) -> logFavBasedUserInterestedIn20M145K2020Store, + ( + FavBasedUserInterestedInFromPE, + Model20m145kUpdated) -> favBasedUserInterestedInFromPE20M145KUpdatedStore, + (FilteredUserInterestedIn, Model20m145kUpdated) -> filteredUserInterestedIn20m145kUpdatedStore, + (FilteredUserInterestedIn, Model20m145k2020) -> filteredUserInterestedIn20m145k2020Store, + ( + FilteredUserInterestedInFromPE, + Model20m145kUpdated) -> filteredUserInterestedInFromPE20m145kUpdatedStore, + ( + UnfilteredUserInterestedIn, + Model20m145kUpdated) -> unfilteredUserInterestedIn20m145kUpdatedStore, + (UnfilteredUserInterestedIn, Model20m145k2020) -> unfilteredUserInterestedIn20m145k2020Store, + (UserNextInterestedIn, Model20m145k2020) -> userNextInterestedIn20m145k2020Store, + ( + LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE, + Model20m145k2020) -> logFavBasedInterestedMaxpoolingAddressBookFromIIAPE20M145K2020Store, + ( + LogFavBasedUserInterestedAverageAddressBookFromIIAPE, + Model20m145k2020) -> logFavBasedInterestedAverageAddressBookFromIIAPE20M145K2020Store, + ( + LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE, + Model20m145k2020) -> logFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE20M145K2020Store, + ( + 
LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE,
+      Model20m145k2020) -> logFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE20M145K2020Store,
+    (
+      LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE,
+      Model20m145k2020) -> logFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE20M145K2020Store,
+    (
+      LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE,
+      Model20m145k2020) -> logFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE20M145K2020Store,
+  )
+
+  val userSimClustersEmbeddingStore: ReadableStore[
+    SimClustersEmbeddingId,
+    SimClustersEmbedding
+  ] = {
+    SimClustersEmbeddingStore.buildWithDecider(
+      underlyingStores = underlyingStores,
+      decider = rmsDecider.decider,
+      statsReceiver = stats
+    )
+  }
+
+}
diff --git a/representation-manager/server/src/main/thrift/BUILD b/representation-manager/server/src/main/thrift/BUILD
new file mode 100644
index 000000000..f4edb5dcb
--- /dev/null
+++ b/representation-manager/server/src/main/thrift/BUILD
@@ -0,0 +1,18 @@
+create_thrift_libraries(
+    base_name = "thrift",
+    sources = [
+        "com/twitter/representation_manager/service.thrift",
+    ],
+    platform = "java8",
+    tags = [
+        "bazel-compatible",
+    ],
+    dependency_roots = [
+        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift",
+    ],
+    generate_languages = [
+        "java",
+        "scala",
+        "strato",
+    ],
+)
diff --git a/representation-manager/server/src/main/thrift/com/twitter/representation_manager/service.thrift b/representation-manager/server/src/main/thrift/com/twitter/representation_manager/service.thrift
new file mode 100644
index 000000000..4eb36e999
--- /dev/null
+++ b/representation-manager/server/src/main/thrift/com/twitter/representation_manager/service.thrift
@@ -0,0 +1,14 @@
+namespace java com.twitter.representation_manager.thriftjava
+#@namespace scala com.twitter.representation_manager.thriftscala
+#@namespace strato com.twitter.representation_manager
+
+include "com/twitter/simclusters_v2/online_store.thrift"
+include "com/twitter/simclusters_v2/identifier.thrift"
+
+/**
+ * A uniform column view for all kinds of SimClusters based embeddings.
+ **/
+struct SimClustersEmbeddingView {
+  1: required identifier.EmbeddingType embeddingType
+  2: required online_store.ModelVersion modelVersion
+}(persisted = 'false', hasPersonalData = 'false')
diff --git a/representation-scorer/BUILD.bazel b/representation-scorer/BUILD.bazel
new file mode 100644
index 000000000..1624a57d4
--- /dev/null
+++ b/representation-scorer/BUILD.bazel
@@ -0,0 +1 @@
+# This prevents SQ query from grabbing //:all since it traverses up once to find a BUILD
diff --git a/representation-scorer/README.md b/representation-scorer/README.md
new file mode 100644
index 000000000..b74e3472f
--- /dev/null
+++ b/representation-scorer/README.md
@@ -0,0 +1,5 @@
+# Representation Scorer #
+
+**Representation Scorer** (RSX) is a centralized scoring system that serves SimClusters and other embedding-based similarity scores as machine learning features.
+
+The Representation Scorer retrieves user engagement signals from the User Signal Service (USS) and embeddings from the Representation Manager (RMS), and uses them to compute both pairwise and listwise features. These features are used at various stages, including candidate retrieval and ranking.
\ No newline at end of file diff --git a/representation-scorer/bin/canary-check.sh b/representation-scorer/bin/canary-check.sh new file mode 100755 index 000000000..cbb31f9ad --- /dev/null +++ b/representation-scorer/bin/canary-check.sh @@ -0,0 +1,8 @@ +#!/bin/bash + +export CANARY_CHECK_ROLE="representation-scorer" +export CANARY_CHECK_NAME="representation-scorer" +export CANARY_CHECK_INSTANCES="0-19" + +python3 relevance-platform/tools/canary_check.py "$@" + diff --git a/representation-scorer/bin/deploy.sh b/representation-scorer/bin/deploy.sh new file mode 100755 index 000000000..2f1ab8a69 --- /dev/null +++ b/representation-scorer/bin/deploy.sh @@ -0,0 +1,4 @@ +#!/usr/bin/env bash + +JOB=representation-scorer bazel run --ui_event_filters=-info,-stdout,-stderr --noshow_progress \ + //relevance-platform/src/main/python/deploy -- "$@" diff --git a/representation-scorer/bin/remote-debug-tunnel.sh b/representation-scorer/bin/remote-debug-tunnel.sh new file mode 100755 index 000000000..2a6e71511 --- /dev/null +++ b/representation-scorer/bin/remote-debug-tunnel.sh @@ -0,0 +1,66 @@ +#!/bin/bash + +set -o nounset +set -eu + +DC="atla" +ROLE="$USER" +SERVICE="representation-scorer" +INSTANCE="0" +KEY="$DC/$ROLE/devel/$SERVICE/$INSTANCE" + +while test $# -gt 0; do + case "$1" in + -h|--help) + echo "$0 Set up an ssh tunnel for $SERVICE remote debugging and disable aurora health checks" + echo " " + echo "See representation-scorer/README.md for details of how to use this script, and go/remote-debug for" + echo "general information about remote debugging in Aurora" + echo " " + echo "Default instance if called with no args:" + echo " $KEY" + echo " " + echo "Positional args:" + echo " $0 [datacentre] [role] [service_name] [instance]" + echo " " + echo "Options:" + echo " -h, --help show brief help" + exit 0 + ;; + *) + break + ;; + esac +done + +if [ -n "${1-}" ]; then + DC="$1" +fi + +if [ -n "${2-}" ]; then + ROLE="$2" +fi + +if [ -n "${3-}" ]; then + SERVICE="$3" +fi + +if [ -n "${4-}" ]; then + INSTANCE="$4" +fi + +KEY="$DC/$ROLE/devel/$SERVICE/$INSTANCE" +read -p "Set up remote debugger tunnel for $KEY? (y/n) " -r CONFIRM +if [[ ! $CONFIRM =~ ^[Yy]$ ]]; then + echo "Exiting, tunnel not created" + exit 1 +fi + +echo "Disabling health check and opening tunnel. Exit with control-c when you're finished" +CMD="aurora task ssh $KEY -c 'touch .healthchecksnooze' && aurora task ssh $KEY -L '5005:debug' --ssh-options '-N -S none -v '" + +echo "Running $CMD" +eval "$CMD" + + + diff --git a/representation-scorer/docs/index.rst b/representation-scorer/docs/index.rst new file mode 100644 index 000000000..c4fd8966d --- /dev/null +++ b/representation-scorer/docs/index.rst @@ -0,0 +1,39 @@ +Representation Scorer (RSX) +########################### + +Overview +======== + +Representation Scorer (RSX) is a StratoFed service which serves scores for pairs of entities (User, Tweet, Topic...) based on some representation of those entities. For example, it serves User-Tweet scores based on the cosine similarity of SimClusters embeddings for each of these. It aims to provide these with low latency and at high scale, to support applications such as scoring for ANN candidate generation and feature hydration via feature store. 
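+
+For a concrete sense of the request shape, the sketch below constructs the ``ScoreId`` key accepted
+by the ``recommendations/representation_scorer/score`` column (a minimal illustration: the ids are
+hypothetical, ``ScoringAlgorithm.PairEmbeddingCosineSimilarity`` is assumed from the
+``simclusters_v2`` thrift definitions rather than shown in this change, and the surrounding Strato
+client setup is elided):
+
+.. code-block:: scala
+
+   import com.twitter.simclusters_v2.thriftscala._
+
+   // Score one (User, Tweet) pair: cosine similarity between the user's
+   // LogFavBasedUserInterestedIn embedding and the tweet's LogFavBasedTweet
+   // embedding, both under the 2020 model version.
+   val key = ScoreId(
+     algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity,
+     internalId = ScoreInternalId.SimClustersEmbeddingPairScoreId(
+       SimClustersEmbeddingPairScoreId(
+         SimClustersEmbeddingId(
+           EmbeddingType.LogFavBasedUserInterestedIn,
+           ModelVersion.Model20m145k2020,
+           InternalId.UserId(12L)), // hypothetical user id
+         SimClustersEmbeddingId(
+           EmbeddingType.LogFavBasedTweet,
+           ModelVersion.Model20m145k2020,
+           InternalId.TweetId(34L))))) // hypothetical tweet id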
+
+
+Current use cases
+-----------------
+
+RSX currently serves traffic for the following use cases:
+
+- User-Tweet similarity scores for Home ranking, using SimClusters embedding dot product
+- Topic-Tweet similarity scores for topical tweet candidate generation and topic social proof, using SimClusters embedding cosine similarity and CERTO scores
+- Tweet-Tweet and User-Tweet similarity scores for ANN candidate generation, using SimClusters embedding cosine similarity
+- (in development) User-Tweet similarity scores for Home ranking, based on various aggregations of similarities with recent faves, retweets and follows performed by the user
+
+Getting Started
+===============
+
+Fetching scores
+---------------
+
+Scores are served from the ``recommendations/representation_scorer/score`` column.
+
+Using RSX for your application
+------------------------------
+
+RSX may be a good fit for your application if you need scores based on combinations of SimClusters embeddings for core nouns. We also plan to support other embeddings and scoring approaches in the future.
+
+.. toctree::
+   :maxdepth: 2
+   :hidden:
+
+   index
+
+
diff --git a/representation-scorer/server/BUILD b/representation-scorer/server/BUILD
new file mode 100644
index 000000000..cc7325192
--- /dev/null
+++ b/representation-scorer/server/BUILD
@@ -0,0 +1,22 @@
+jvm_binary(
+    name = "bin",
+    basename = "representation-scorer",
+    main = "com.twitter.representationscorer.RepresentationScorerFedServerMain",
+    platform = "java8",
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "finatra/inject/inject-logback/src/main/scala",
+        "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback",
+        "representation-scorer/server/src/main/resources",
+        "representation-scorer/server/src/main/scala/com/twitter/representationscorer",
+        "twitter-server/logback-classic/src/main/scala",
+    ],
+)
+
+# Aurora Workflows build phase convention requires a jvm_app named with ${project-name}-app
+jvm_app(
+    name = "representation-scorer-app",
+    archive = "zip",
+    binary = ":bin",
+    tags = ["bazel-compatible"],
+)
diff --git a/representation-scorer/server/src/main/resources/BUILD b/representation-scorer/server/src/main/resources/BUILD
new file mode 100644
index 000000000..150a224ff
--- /dev/null
+++ b/representation-scorer/server/src/main/resources/BUILD
@@ -0,0 +1,9 @@
+resources(
+    sources = [
+        "*.xml",
+        "*.yml",
+        "com/twitter/slo/slo.json",
+        "config/*.yml",
+    ],
+    tags = ["bazel-compatible"],
+)
diff --git a/representation-scorer/server/src/main/resources/com/twitter/slo/slo.json b/representation-scorer/server/src/main/resources/com/twitter/slo/slo.json
new file mode 100644
index 000000000..836b44058
--- /dev/null
+++ b/representation-scorer/server/src/main/resources/com/twitter/slo/slo.json
@@ -0,0 +1,55 @@
+{
+  "servers": [
+    {
+      "name": "strato",
+      "indicators": [
+        {
+          "id": "success_rate_3m",
+          "indicator_type": "SuccessRateIndicator",
+          "duration": 3,
+          "duration_unit": "MINUTES"
+        }, {
+          "id": "latency_3m_p99",
+          "indicator_type": "LatencyIndicator",
+          "duration": 3,
+          "duration_unit": "MINUTES",
+          "percentile": 0.99
+        }
+      ],
+      "objectives": [
+        {
+          "indicator": "success_rate_3m",
+          "objective_type": "SuccessRateObjective",
+          "operator": ">=",
+          "threshold": 0.995
+        },
+        {
+          "indicator": "latency_3m_p99",
+          "objective_type": "LatencyObjective",
+          "operator": "<=",
+          "threshold": 50
+        }
+      ],
+      "long_term_objectives": [
+        {
+          "id": "success_rate_28_days",
+          "objective_type": "SuccessRateObjective",
+          "operator": ">=",
+          "threshold": 
0.993,
+          "duration": 28,
+          "duration_unit": "DAYS"
+        },
+        {
+          "id": "latency_p99_28_days",
+          "objective_type": "LatencyObjective",
+          "operator": "<=",
+          "threshold": 60,
+          "duration": 28,
+          "duration_unit": "DAYS",
+          "percentile": 0.99
+        }
+      ]
+    }
+  ],
+  "@version": 1
+}
diff --git a/representation-scorer/server/src/main/resources/config/decider.yml b/representation-scorer/server/src/main/resources/config/decider.yml
new file mode 100644
index 000000000..56ae90418
--- /dev/null
+++ b/representation-scorer/server/src/main/resources/config/decider.yml
@@ -0,0 +1,155 @@
+enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore:
+  comment: "Enable to use the non-empty store for logFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore (from 0% to 100%). 0 means use EMPTY readable store for all requests."
+  default_availability: 0
+
+enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore:
+  comment: "Enable to use the non-empty store for logFavBasedApeEntity20M145K2020EmbeddingCachedStore (from 0% to 100%). 0 means use EMPTY readable store for all requests."
+  default_availability: 0
+
+representation-scorer_forward_dark_traffic:
+  comment: "Defines the percentage of traffic to forward to diffy-proxy. Set to 0 to disable dark traffic forwarding"
+  default_availability: 0
+
+"representation-scorer_load_shed_non_prod_callers":
+  comment: "Discard traffic from all non-prod callers"
+  default_availability: 0
+
+enable_log_fav_based_tweet_embedding_20m145k2020_timeouts:
+  comment: "If enabled, set a timeout on calls to the logFavBased20M145K2020TweetEmbeddingStore"
+  default_availability: 0
+
+log_fav_based_tweet_embedding_20m145k2020_timeout_value_millis:
+  comment: "The value of this decider defines the timeout (in milliseconds) to use on calls to the logFavBased20M145K2020TweetEmbeddingStore; the availability value is read directly as milliseconds (e.g. an availability of 150, displayed as 1.50%, means 150ms). Only applied if enable_log_fav_based_tweet_embedding_20m145k2020_timeouts is true"
+  default_availability: 2000
+
+enable_log_fav_based_tweet_embedding_20m145kUpdated_timeouts:
+  comment: "If enabled, set a timeout on calls to the logFavBased20M145KUpdatedTweetEmbeddingStore"
+  default_availability: 0
+
+log_fav_based_tweet_embedding_20m145kUpdated_timeout_value_millis:
+  comment: "The value of this decider defines the timeout (in milliseconds) to use on calls to the logFavBased20M145KUpdatedTweetEmbeddingStore; the availability value is read directly as milliseconds (e.g. an availability of 150, displayed as 1.50%, means 150ms). Only applied if enable_log_fav_based_tweet_embedding_20m145kUpdated_timeouts is true"
+  default_availability: 2000
+
+enable_cluster_tweet_index_store_timeouts:
+  comment: "If enabled, set a timeout on calls to the ClusterTweetIndexStore"
+  default_availability: 0
+
+cluster_tweet_index_store_timeout_value_millis:
+  comment: "The value of this decider defines the timeout (in milliseconds) to use on calls to the ClusterTweetIndexStore; the availability value is read directly as milliseconds (e.g. an availability of 150, displayed as 1.50%, means 150ms). 
Only applied if enable_cluster_tweet_index_store_timeouts is true" + default_availability: 2000 + +representation_scorer_fetch_signal_share: + comment: "If enabled, fetches share signals from USS" + default_availability: 0 + +representation_scorer_fetch_signal_reply: + comment: "If enabled, fetches reply signals from USS" + default_availability: 0 + +representation_scorer_fetch_signal_original_tweet: + comment: "If enabled, fetches original tweet signals from USS" + default_availability: 0 + +representation_scorer_fetch_signal_video_playback: + comment: "If enabled, fetches video playback signals from USS" + default_availability: 0 + +representation_scorer_fetch_signal_block: + comment: "If enabled, fetches account block signals from USS" + default_availability: 0 + +representation_scorer_fetch_signal_mute: + comment: "If enabled, fetches account mute signals from USS" + default_availability: 0 + +representation_scorer_fetch_signal_report: + comment: "If enabled, fetches tweet report signals from USS" + default_availability: 0 + +representation_scorer_fetch_signal_dont_like: + comment: "If enabled, fetches tweet don't like signals from USS" + default_availability: 0 + +representation_scorer_fetch_signal_see_fewer: + comment: "If enabled, fetches tweet see fewer signals from USS" + default_availability: 0 + +# To create a new decider, add here with the same format and caller's details : "representation-scorer_load_shed_by_caller_id_twtr:{{role}}:{{name}}:{{environment}}:{{cluster}}" +# All the deciders below are generated by this script - ./strato/bin/fed deciders ./ --service-role=representation-scorer --service-name=representation-scorer +# If you need to run the script and paste the output, add only the prod deciders here. Non-prod ones are being taken care of by representation-scorer_load_shed_non_prod_callers + +"representation-scorer_load_shed_by_caller_id_all": + comment: "Reject all traffic from caller id: all" + default_availability: 0 + +"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-canary:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-canary:prod:atla" + default_availability: 0 + +"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-canary:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-canary:prod:pdxa" + default_availability: 0 + +"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-send:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-send:prod:atla" + default_availability: 0 + +"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:prod:atla": + comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:prod:atla" + default_availability: 0 + +"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:prod:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:prod:pdxa" + default_availability: 0 + +"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:staging:atla": + comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:staging:atla" + default_availability: 0 + +"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:staging:pdxa": + comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:staging:pdxa" + 
default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:home-scorer:home-scorer:prod:atla":
+  comment: "Reject all traffic from caller id: twtr:svc:home-scorer:home-scorer:prod:atla"
+  default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:home-scorer:home-scorer:prod:pdxa":
+  comment: "Reject all traffic from caller id: twtr:svc:home-scorer:home-scorer:prod:pdxa"
+  default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoapi:prod:atla":
+  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoapi:prod:atla"
+  default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:atla":
+  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:atla"
+  default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:pdxa":
+  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:pdxa"
+  default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:timelinescorer:timelinescorer:prod:atla":
+  comment: "Reject all traffic from caller id: twtr:svc:timelinescorer:timelinescorer:prod:atla"
+  default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:timelinescorer:timelinescorer:prod:pdxa":
+  comment: "Reject all traffic from caller id: twtr:svc:timelinescorer:timelinescorer:prod:pdxa"
+  default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:topic-social-proof:topic-social-proof:prod:atla":
+  comment: "Reject all traffic from caller id: twtr:svc:topic-social-proof:topic-social-proof:prod:atla"
+  default_availability: 0
+
+"representation-scorer_load_shed_by_caller_id_twtr:svc:topic-social-proof:topic-social-proof:prod:pdxa":
+  comment: "Reject all traffic from caller id: twtr:svc:topic-social-proof:topic-social-proof:prod:pdxa"
+  default_availability: 0
+
+"enable_sim_clusters_embedding_store_timeouts":
+  comment: "If enabled, set a timeout on calls to the SimClustersEmbeddingStore"
+  default_availability: 10000
+
+sim_clusters_embedding_store_timeout_value_millis:
+  comment: "The value of this decider defines the timeout (in milliseconds) to use on calls to the SimClustersEmbeddingStore; the availability value is read directly as milliseconds (e.g. an availability of 150, displayed as 1.50%, means 150ms). 
Only applied if enable_sim_clusters_embedding_store_timeouts is true"
+  default_availability: 2000
diff --git a/representation-scorer/server/src/main/resources/logback.xml b/representation-scorer/server/src/main/resources/logback.xml
new file mode 100644
index 000000000..cf7028151
--- /dev/null
+++ b/representation-scorer/server/src/main/resources/logback.xml
@@ -0,0 +1,165 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<configuration>
+  <shutdownHook class="ch.qos.logback.core.hook.DelayingShutdownHook"/>
+
+  <!-- ===================================================== -->
+  <!-- Common Config -->
+  <!-- ===================================================== -->
+
+  <!-- JUL/JDK14 to Logback bridge -->
+  <contextListener class="ch.qos.logback.classic.jul.LevelChangePropagator">
+    <resetJUL>true</resetJUL>
+  </contextListener>
+
+  <!-- Service Log (rollover daily, keep at most 3GB / 21 days of history) -->
+  <appender name="SERVICE" class="ch.qos.logback.core.rolling.RollingFileAppender">
+    <file>${log.service.output}</file>
+    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
+      <fileNamePattern>${log.service.output}.%d.gz</fileNamePattern>
+      <totalSizeCap>3GB</totalSizeCap>
+      <maxHistory>21</maxHistory>
+      <cleanHistoryOnStart>true</cleanHistoryOnStart>
+    </rollingPolicy>
+    <encoder>
+      <pattern>%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n</pattern>
+    </encoder>
+  </appender>
+
+  <!-- Access Log (rollover daily, keep at most 100MB / 7 days of history) -->
+  <appender name="ACCESS" class="ch.qos.logback.core.rolling.RollingFileAppender">
+    <file>${log.access.output}</file>
+    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
+      <fileNamePattern>${log.access.output}.%d.gz</fileNamePattern>
+      <totalSizeCap>100MB</totalSizeCap>
+      <maxHistory>7</maxHistory>
+      <cleanHistoryOnStart>true</cleanHistoryOnStart>
+    </rollingPolicy>
+    <encoder>
+      <pattern>${DEFAULT_ACCESS_PATTERN}%n</pattern>
+    </encoder>
+  </appender>
+
+  <!-- LogLens -->
+  <appender name="LOGLENS" class="com.twitter.loglens.logback.LoglensAppender">
+    <mdcAdditionalContext>true</mdcAdditionalContext>
+    <category>${log.lens.category}</category>
+    <index>${log.lens.index}</index>
+    <tag>${log.lens.tag}/service</tag>
+    <encoder>
+      <pattern>%msg</pattern>
+    </encoder>
+  </appender>
+
+  <!-- LogLens Access -->
+  <appender name="LOGLENS-ACCESS" class="com.twitter.loglens.logback.LoglensAppender">
+    <mdcAdditionalContext>true</mdcAdditionalContext>
+    <category>${log.lens.category}</category>
+    <index>${log.lens.index}</index>
+    <tag>${log.lens.tag}/access</tag>
+    <encoder>
+      <pattern>%msg</pattern>
+    </encoder>
+  </appender>
+
+  <!-- Allow-listed Pipeline Executions Log (rollover daily, keep at most 100MB / 7 days) -->
+  <appender name="ALLOW-LISTED-PIPELINE-EXECUTIONS" class="ch.qos.logback.core.rolling.RollingFileAppender">
+    <file>allow_listed_pipeline_executions.log</file>
+    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
+      <fileNamePattern>allow_listed_pipeline_executions.log.%d.gz</fileNamePattern>
+      <totalSizeCap>100MB</totalSizeCap>
+      <maxHistory>7</maxHistory>
+      <cleanHistoryOnStart>true</cleanHistoryOnStart>
+    </rollingPolicy>
+    <encoder>
+      <pattern>%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n</pattern>
+    </encoder>
+  </appender>
+
+  <!-- ===================================================== -->
+  <!-- Primary Async Appenders -->
+  <!-- ===================================================== -->
+
+  <property name="async_queue_size" value="${queue.size:-50000}"/>
+  <property name="async_max_flush_time" value="${max.flush.time:-0}"/>
+
+  <appender name="ASYNC-SERVICE" class="com.twitter.inject.logback.AsyncAppender">
+    <queueSize>${async_queue_size}</queueSize>
+    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
+    <appender-ref ref="SERVICE"/>
+  </appender>
+
+  <appender name="ASYNC-ACCESS" class="com.twitter.inject.logback.AsyncAppender">
+    <queueSize>${async_queue_size}</queueSize>
+    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
+    <appender-ref ref="ACCESS"/>
+  </appender>
+
+  <appender name="ASYNC-ALLOW-LISTED-PIPELINE-EXECUTIONS" class="com.twitter.inject.logback.AsyncAppender">
+    <queueSize>${async_queue_size}</queueSize>
+    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
+    <appender-ref ref="ALLOW-LISTED-PIPELINE-EXECUTIONS"/>
+  </appender>
+
+  <appender name="ASYNC-LOGLENS" class="com.twitter.inject.logback.AsyncAppender">
+    <queueSize>${async_queue_size}</queueSize>
+    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
+    <appender-ref ref="LOGLENS"/>
+  </appender>
+
+  <appender name="ASYNC-LOGLENS-ACCESS" class="com.twitter.inject.logback.AsyncAppender">
+    <queueSize>${async_queue_size}</queueSize>
+    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
+    <appender-ref ref="LOGLENS-ACCESS"/>
+  </appender>
+
+  <!-- ===================================================== -->
+  <!-- Package Config -->
+  <!-- ===================================================== -->
+
+  <!-- Root Config -->
+  <root level="${log_level:-INFO}">
+    <appender-ref ref="ASYNC-SERVICE"/>
+    <appender-ref ref="ASYNC-LOGLENS"/>
+  </root>
+
+  <!-- Access Logging -->
+  <logger name="com.twitter.finatra.thrift.filters.AccessLoggingFilter"
+          level="info"
+          additivity="false">
+    <appender-ref ref="ASYNC-ACCESS"/>
+    <appender-ref ref="ASYNC-LOGLENS-ACCESS"/>
+  </logger>
+</configuration>
diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/BUILD b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/BUILD
new file mode 100644
index 000000000..fdb60da54
--- /dev/null
+++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/BUILD
@@ -0,0 +1,13 @@
+scala_library(
+    compiler_option_sets = ["fatal_warnings"],
+    platform = "java8",
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "finagle-internal/slo/src/main/scala/com/twitter/finagle/slo",
+        "finatra/inject/inject-thrift-client",
+        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns",
+        "strato/src/main/scala/com/twitter/strato/fed",
+        "strato/src/main/scala/com/twitter/strato/fed/server",
+        "twitter-server-internal/src/main/scala",
+    ],
+)
diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/RepresentationScorerFedServer.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/RepresentationScorerFedServer.scala
new file mode 100644
index 000000000..a0a203311
--- /dev/null
+++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/RepresentationScorerFedServer.scala
@@ -0,0 +1,38 @@
+package com.twitter.representationscorer
+
+import com.google.inject.Module
+import com.twitter.inject.thrift.modules.ThriftClientIdModule
+import com.twitter.representationscorer.columns.ListScoreColumn
+import com.twitter.representationscorer.columns.ScoreColumn
+import com.twitter.representationscorer.columns.SimClustersRecentEngagementSimilarityColumn
+import com.twitter.representationscorer.columns.SimClustersRecentEngagementSimilarityUserTweetEdgeColumn
+import com.twitter.representationscorer.modules.CacheModule
+import com.twitter.representationscorer.modules.EmbeddingStoreModule
+import com.twitter.representationscorer.modules.RMSConfigModule
+import com.twitter.representationscorer.modules.TimerModule
+import com.twitter.representationscorer.twistlyfeatures.UserSignalServiceRecentEngagementsClientModule
+import com.twitter.strato.fed._
+import 
com.twitter.strato.fed.server._ + +object RepresentationScorerFedServerMain extends RepresentationScorerFedServer + +trait RepresentationScorerFedServer extends StratoFedServer { + override def dest: String = "/s/representation-scorer/representation-scorer" + override val modules: Seq[Module] = + Seq( + CacheModule, + ThriftClientIdModule, + UserSignalServiceRecentEngagementsClientModule, + TimerModule, + RMSConfigModule, + EmbeddingStoreModule + ) + + override def columns: Seq[Class[_ <: StratoFed.Column]] = + Seq( + classOf[ListScoreColumn], + classOf[ScoreColumn], + classOf[SimClustersRecentEngagementSimilarityUserTweetEdgeColumn], + classOf[SimClustersRecentEngagementSimilarityColumn] + ) +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/BUILD b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/BUILD new file mode 100644 index 000000000..3352a51b9 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/BUILD @@ -0,0 +1,16 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "content-recommender/thrift/src/main/thrift:thrift-scala", + "finatra/inject/inject-core/src/main/scala", + "representation-scorer/server/src/main/scala/com/twitter/representationscorer/common", + "representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules", + "representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore", + "representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures", + "representation-scorer/server/src/main/thrift:thrift-scala", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + ], +) diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/Info.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/Info.scala new file mode 100644 index 000000000..3b14a491f --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/Info.scala @@ -0,0 +1,13 @@ +package com.twitter.representationscorer.columns + +import com.twitter.strato.config.{ContactInfo => StratoContactInfo} + +object Info { + val contactInfo: StratoContactInfo = StratoContactInfo( + description = "Please contact Relevance Platform team for more details", + contactEmail = "no-reply@twitter.com", + ldapGroup = "representation-scorer-admins", + jiraProject = "JIRA", + links = Seq("http://go.twitter.biz/rsx-runbook") + ) +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/ListScoreColumn.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/ListScoreColumn.scala new file mode 100644 index 000000000..04d8b8cb1 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/ListScoreColumn.scala @@ -0,0 +1,116 @@ +package com.twitter.representationscorer.columns + +import com.twitter.representationscorer.thriftscala.ListScoreId +import com.twitter.representationscorer.thriftscala.ListScoreResponse +import com.twitter.representationscorer.scorestore.ScoreStore +import com.twitter.representationscorer.thriftscala.ScoreResult +import com.twitter.simclusters_v2.common.SimClustersEmbeddingId.LongInternalId +import 
com.twitter.simclusters_v2.common.SimClustersEmbeddingId.LongSimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.Score +import com.twitter.simclusters_v2.thriftscala.ScoreId +import com.twitter.simclusters_v2.thriftscala.ScoreInternalId +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingPairScoreId +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.Policy +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.fed._ +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.util.Future +import com.twitter.util.Return +import com.twitter.util.Throw +import javax.inject.Inject + +class ListScoreColumn @Inject() (scoreStore: ScoreStore) + extends StratoFed.Column("recommendations/representation_scorer/listScore") + with StratoFed.Fetch.Stitch { + + override val policy: Policy = Common.rsxReadPolicy + + override type Key = ListScoreId + override type View = Unit + override type Value = ListScoreResponse + + override val keyConv: Conv[Key] = ScroogeConv.fromStruct[ListScoreId] + override val viewConv: Conv[View] = Conv.ofType + override val valueConv: Conv[Value] = ScroogeConv.fromStruct[ListScoreResponse] + + override val contactInfo: ContactInfo = Info.contactInfo + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some( + PlainText( + "Scoring for multiple candidate entities against a single target entity" + )) + ) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = { + + val target = SimClustersEmbeddingId( + embeddingType = key.targetEmbeddingType, + modelVersion = key.modelVersion, + internalId = key.targetId + ) + val scoreIds = key.candidateIds.map { candidateId => + val candidate = SimClustersEmbeddingId( + embeddingType = key.candidateEmbeddingType, + modelVersion = key.modelVersion, + internalId = candidateId + ) + ScoreId( + algorithm = key.algorithm, + internalId = ScoreInternalId.SimClustersEmbeddingPairScoreId( + SimClustersEmbeddingPairScoreId(target, candidate) + ) + ) + } + + Stitch + .callFuture { + val (keys: Iterable[ScoreId], vals: Iterable[Future[Option[Score]]]) = + scoreStore.uniformScoringStore.multiGet(scoreIds.toSet).unzip + val results: Future[Iterable[Option[Score]]] = Future.collectToTry(vals.toSeq) map { + tryOptVals => + tryOptVals map { + case Return(Some(v)) => Some(v) + case Return(None) => None + case Throw(_) => None + } + } + val scoreMap: Future[Map[Long, Double]] = results.map { scores => + keys + .zip(scores).collect { + case ( + ScoreId( + _, + ScoreInternalId.SimClustersEmbeddingPairScoreId( + SimClustersEmbeddingPairScoreId( + _, + LongSimClustersEmbeddingId(candidateId)))), + Some(score)) => + (candidateId, score.score) + }.toMap + } + scoreMap + } + .map { (scores: Map[Long, Double]) => + val orderedScores = key.candidateIds.collect { + case LongInternalId(id) => ScoreResult(scores.get(id)) + case _ => + // This will return None scores for candidates which don't have Long ids, but that's fine: + // at the moment we're only scoring for Tweets + ScoreResult(None) + } + found(ListScoreResponse(orderedScores)) + } + .handle { + case stitch.NotFound => missing + } + } +} diff --git 
a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/ScoreColumn.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/ScoreColumn.scala new file mode 100644 index 000000000..6b565288b --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/ScoreColumn.scala @@ -0,0 +1,48 @@ +package com.twitter.representationscorer.columns + +import com.twitter.contentrecommender.thriftscala.ScoringResponse +import com.twitter.representationscorer.scorestore.ScoreStore +import com.twitter.simclusters_v2.thriftscala.ScoreId +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.Policy +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.fed._ +import com.twitter.strato.thrift.ScroogeConv +import javax.inject.Inject + +class ScoreColumn @Inject() (scoreStore: ScoreStore) + extends StratoFed.Column("recommendations/representation_scorer/score") + with StratoFed.Fetch.Stitch { + + override val policy: Policy = Common.rsxReadPolicy + + override type Key = ScoreId + override type View = Unit + override type Value = ScoringResponse + + override val keyConv: Conv[Key] = ScroogeConv.fromStruct[ScoreId] + override val viewConv: Conv[View] = Conv.ofType + override val valueConv: Conv[Value] = ScroogeConv.fromStruct[ScoringResponse] + + override val contactInfo: ContactInfo = Info.contactInfo + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some(PlainText( + "The Uniform Scoring Endpoint in Representation Scorer for the Content-Recommender." 
+ + " TDD: http://go/representation-scorer-tdd Guideline: http://go/uniform-scoring-guideline")) + ) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = + scoreStore + .uniformScoringStoreStitch(key) + .map(score => found(ScoringResponse(Some(score)))) + .handle { + case stitch.NotFound => missing + } +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/SimClustersRecentEngagementSimilarityColumn.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/SimClustersRecentEngagementSimilarityColumn.scala new file mode 100644 index 000000000..e14a67eae --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/SimClustersRecentEngagementSimilarityColumn.scala @@ -0,0 +1,52 @@ +package com.twitter.representationscorer.columns + +import com.twitter.representationscorer.common.TweetId +import com.twitter.representationscorer.common.UserId +import com.twitter.representationscorer.thriftscala.RecentEngagementSimilaritiesResponse +import com.twitter.representationscorer.twistlyfeatures.Scorer +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.Policy +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.fed._ +import com.twitter.strato.thrift.ScroogeConv +import javax.inject.Inject + +class SimClustersRecentEngagementSimilarityColumn @Inject() (scorer: Scorer) + extends StratoFed.Column( + "recommendations/representation_scorer/simClustersRecentEngagementSimilarity") + with StratoFed.Fetch.Stitch { + + override val policy: Policy = Common.rsxReadPolicy + + override type Key = (UserId, Seq[TweetId]) + override type View = Unit + override type Value = RecentEngagementSimilaritiesResponse + + override val keyConv: Conv[Key] = Conv.ofType[(Long, Seq[Long])] + override val viewConv: Conv[View] = Conv.ofType + override val valueConv: Conv[Value] = + ScroogeConv.fromStruct[RecentEngagementSimilaritiesResponse] + + override val contactInfo: ContactInfo = Info.contactInfo + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some( + PlainText( + "User-Tweet scores based on the user's recent engagements for multiple tweets." 
+ )) + ) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = + scorer + .get(key._1, key._2) + .map(results => found(RecentEngagementSimilaritiesResponse(results))) + .handle { + case stitch.NotFound => missing + } +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/SimClustersRecentEngagementSimilarityUserTweetEdgeColumn.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/SimClustersRecentEngagementSimilarityUserTweetEdgeColumn.scala new file mode 100644 index 000000000..e54d3a71b --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns/SimClustersRecentEngagementSimilarityUserTweetEdgeColumn.scala @@ -0,0 +1,52 @@ +package com.twitter.representationscorer.columns + +import com.twitter.representationscorer.common.TweetId +import com.twitter.representationscorer.common.UserId +import com.twitter.representationscorer.thriftscala.SimClustersRecentEngagementSimilarities +import com.twitter.representationscorer.twistlyfeatures.Scorer +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.Policy +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.fed._ +import com.twitter.strato.thrift.ScroogeConv +import javax.inject.Inject + +class SimClustersRecentEngagementSimilarityUserTweetEdgeColumn @Inject() (scorer: Scorer) + extends StratoFed.Column( + "recommendations/representation_scorer/simClustersRecentEngagementSimilarity.UserTweetEdge") + with StratoFed.Fetch.Stitch { + + override val policy: Policy = Common.rsxReadPolicy + + override type Key = (UserId, TweetId) + override type View = Unit + override type Value = SimClustersRecentEngagementSimilarities + + override val keyConv: Conv[Key] = Conv.ofType[(Long, Long)] + override val viewConv: Conv[View] = Conv.ofType + override val valueConv: Conv[Value] = + ScroogeConv.fromStruct[SimClustersRecentEngagementSimilarities] + + override val contactInfo: ContactInfo = Info.contactInfo + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some( + PlainText( + "User-Tweet scores based on the user's recent engagements" + )) + ) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = + scorer + .get(key._1, key._2) + .map(found(_)) + .handle { + case stitch.NotFound => missing + } +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/BUILD b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/BUILD new file mode 100644 index 000000000..018cef9eb --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/BUILD @@ -0,0 +1,9 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "decider/src/main/scala", + "src/scala/com/twitter/simclusters_v2/common", + ], +) diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/DeciderConstants.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/DeciderConstants.scala new file mode 100644 index 000000000..838835616 --- /dev/null +++ 
b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/DeciderConstants.scala @@ -0,0 +1,7 @@ +package com.twitter.representationscorer + +object DeciderConstants { + val enableSimClustersEmbeddingStoreTimeouts = "enable_sim_clusters_embedding_store_timeouts" + val simClustersEmbeddingStoreTimeoutValueMillis = + "sim_clusters_embedding_store_timeout_value_millis" +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/RepresentationScorerDecider.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/RepresentationScorerDecider.scala new file mode 100644 index 000000000..5aa4b4f2c --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/RepresentationScorerDecider.scala @@ -0,0 +1,27 @@ +package com.twitter.representationscorer.common + +import com.twitter.decider.Decider +import com.twitter.decider.RandomRecipient +import com.twitter.decider.Recipient +import com.twitter.simclusters_v2.common.DeciderGateBuilderWithIdHashing +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class RepresentationScorerDecider @Inject() (decider: Decider) { + + val deciderGateBuilder = new DeciderGateBuilderWithIdHashing(decider) + + def isAvailable(feature: String, recipient: Option[Recipient]): Boolean = { + decider.isAvailable(feature, recipient) + } + + /** + * When useRandomRecipient is set to false, the decider is either completely on or off. + * When useRandomRecipient is set to true, the decider is on for the specified % of traffic. + */ + def isAvailable(feature: String, useRandomRecipient: Boolean = true): Boolean = { + if (useRandomRecipient) isAvailable(feature, Some(RandomRecipient)) + else isAvailable(feature, None) + } +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/package.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/package.scala new file mode 100644 index 000000000..c5bf9c60a --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/common/package.scala @@ -0,0 +1,6 @@ +package com.twitter.representationscorer + +package object common { + type UserId = Long + type TweetId = Long +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/BUILD b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/BUILD new file mode 100644 index 000000000..c73f2a68e --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/BUILD @@ -0,0 +1,19 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/authentication", + "finagle/finagle-stats", + "finatra/inject/inject-core/src/main/scala", + "representation-manager/client/src/main/scala/com/twitter/representation_manager", + "representation-manager/client/src/main/scala/com/twitter/representation_manager/config", + "representation-manager/server/src/main/scala/com/twitter/representation_manager/migration", + "representation-scorer/server/src/main/scala/com/twitter/representationscorer/common", + "servo/util", + "src/scala/com/twitter/simclusters_v2/stores", + "src/scala/com/twitter/storehaus_internal/memcache", + "src/scala/com/twitter/storehaus_internal/util", + 
"src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + ], +) diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/CacheModule.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/CacheModule.scala new file mode 100644 index 000000000..b8b815872 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/CacheModule.scala @@ -0,0 +1,34 @@ +package com.twitter.representationscorer.modules + +import com.google.inject.Provides +import com.twitter.finagle.memcached.Client +import javax.inject.Singleton +import com.twitter.conversions.DurationOps._ +import com.twitter.inject.TwitterModule +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.storehaus_internal.memcache.MemcacheStore +import com.twitter.storehaus_internal.util.ClientName +import com.twitter.storehaus_internal.util.ZkEndPoint + +object CacheModule extends TwitterModule { + + private val cacheDest = flag[String]("cache_module.dest", "Path to memcache service") + private val timeout = flag[Int]("memcache.timeout", "Memcache client timeout") + private val retries = flag[Int]("memcache.retries", "Memcache timeout retries") + + @Singleton + @Provides + def providesCache( + serviceIdentifier: ServiceIdentifier, + stats: StatsReceiver + ): Client = + MemcacheStore.memcachedClient( + name = ClientName("memcache_representation_manager"), + dest = ZkEndPoint(cacheDest()), + timeout = timeout().milliseconds, + retries = retries(), + statsReceiver = stats.scope("cache_client"), + serviceIdentifier = serviceIdentifier + ) +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/EmbeddingStoreModule.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/EmbeddingStoreModule.scala new file mode 100644 index 000000000..bff5d491c --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/EmbeddingStoreModule.scala @@ -0,0 +1,100 @@ +package com.twitter.representationscorer.modules + +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.finagle.memcached.{Client => MemcachedClient} +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finagle.thrift.ClientId +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.inject.TwitterModule +import com.twitter.relevance_platform.common.readablestore.ReadableStoreWithTimeout +import com.twitter.representation_manager.migration.LegacyRMS +import com.twitter.representationscorer.DeciderConstants +import com.twitter.simclusters_v2.common.SimClustersEmbedding +import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.EmbeddingType._ +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ModelVersion._ +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.storehaus.ReadableStore +import com.twitter.util.Timer +import javax.inject.Singleton + +object EmbeddingStoreModule extends TwitterModule { + @Singleton + @Provides + def providesEmbeddingStore( + memCachedClient: MemcachedClient, + serviceIdentifier: ServiceIdentifier, 
+ clientId: ClientId, + timer: Timer, + decider: Decider, + stats: StatsReceiver + ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val cacheHashKeyPrefix: String = "RMS" + val embeddingStoreClient = new LegacyRMS( + serviceIdentifier, + memCachedClient, + stats, + decider, + clientId, + timer, + cacheHashKeyPrefix + ) + + val underlyingStores: Map[ + (EmbeddingType, ModelVersion), + ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] + ] = Map( + // Tweet Embeddings + ( + LogFavBasedTweet, + Model20m145k2020) -> embeddingStoreClient.logFavBased20M145K2020TweetEmbeddingStore, + ( + LogFavLongestL2EmbeddingTweet, + Model20m145k2020) -> embeddingStoreClient.logFavBasedLongestL2Tweet20M145K2020EmbeddingStore, + // InterestedIn Embeddings + ( + LogFavBasedUserInterestedInFromAPE, + Model20m145k2020) -> embeddingStoreClient.LogFavBasedInterestedInFromAPE20M145K2020Store, + ( + FavBasedUserInterestedIn, + Model20m145k2020) -> embeddingStoreClient.favBasedUserInterestedIn20M145K2020Store, + // Author Embeddings + ( + FavBasedProducer, + Model20m145k2020) -> embeddingStoreClient.favBasedProducer20M145K2020EmbeddingStore, + // Entity Embeddings + ( + LogFavBasedKgoApeTopic, + Model20m145k2020) -> embeddingStoreClient.logFavBasedApeEntity20M145K2020EmbeddingCachedStore, + (FavTfgTopic, Model20m145k2020) -> embeddingStoreClient.favBasedTfgTopicEmbedding2020Store, + ) + + val simClustersEmbeddingStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = { + val underlying: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = + SimClustersEmbeddingStore.buildWithDecider( + underlyingStores = underlyingStores, + decider = decider, + statsReceiver = stats.scope("simClusters_embeddings_store_deciderable") + ) + + val underlyingWithTimeout: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = + new ReadableStoreWithTimeout( + rs = underlying, + decider = decider, + enableTimeoutDeciderKey = DeciderConstants.enableSimClustersEmbeddingStoreTimeouts, + timeoutValueKey = DeciderConstants.simClustersEmbeddingStoreTimeoutValueMillis, + timer = timer, + statsReceiver = stats.scope("simClusters_embedding_store_timeouts") + ) + + ObservedReadableStore( + store = underlyingWithTimeout + )(stats.scope("simClusters_embeddings_store")) + } + simClustersEmbeddingStore + } +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/RMSConfigModule.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/RMSConfigModule.scala new file mode 100644 index 000000000..08ac0cb93 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/RMSConfigModule.scala @@ -0,0 +1,63 @@ +package com.twitter.representationscorer.modules + +import com.google.inject.Provides +import com.twitter.conversions.DurationOps._ +import com.twitter.inject.TwitterModule +import com.twitter.representation_manager.config.ClientConfig +import com.twitter.representation_manager.config.EnabledInMemoryCacheParams +import com.twitter.representation_manager.config.InMemoryCacheParams +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.EmbeddingType._ +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ModelVersion._ +import javax.inject.Singleton + +object RMSConfigModule extends TwitterModule { + def getCacheName(embedingType: EmbeddingType, modelVersion: ModelVersion): 
String = + s"${embedingType.name}_${modelVersion.name}_in_mem_cache" + + @Singleton + @Provides + def providesRMSClientConfig: ClientConfig = { + val cacheParamsMap: Map[ + (EmbeddingType, ModelVersion), + InMemoryCacheParams + ] = Map( + // Tweet Embeddings + (LogFavBasedTweet, Model20m145k2020) -> EnabledInMemoryCacheParams( + ttl = 10.minutes, + maxKeys = 1048575, // 800MB + cacheName = getCacheName(LogFavBasedTweet, Model20m145k2020)), + (LogFavLongestL2EmbeddingTweet, Model20m145k2020) -> EnabledInMemoryCacheParams( + ttl = 5.minute, + maxKeys = 1048575, // 800MB + cacheName = getCacheName(LogFavLongestL2EmbeddingTweet, Model20m145k2020)), + // User - KnownFor Embeddings + (FavBasedProducer, Model20m145k2020) -> EnabledInMemoryCacheParams( + ttl = 1.day, + maxKeys = 500000, // 400MB + cacheName = getCacheName(FavBasedProducer, Model20m145k2020)), + // User - InterestedIn Embeddings + (LogFavBasedUserInterestedInFromAPE, Model20m145k2020) -> EnabledInMemoryCacheParams( + ttl = 6.hours, + maxKeys = 262143, + cacheName = getCacheName(LogFavBasedUserInterestedInFromAPE, Model20m145k2020)), + (FavBasedUserInterestedIn, Model20m145k2020) -> EnabledInMemoryCacheParams( + ttl = 6.hours, + maxKeys = 262143, + cacheName = getCacheName(FavBasedUserInterestedIn, Model20m145k2020)), + // Topic Embeddings + (FavTfgTopic, Model20m145k2020) -> EnabledInMemoryCacheParams( + ttl = 12.hours, + maxKeys = 262143, // 200MB + cacheName = getCacheName(FavTfgTopic, Model20m145k2020)), + (LogFavBasedKgoApeTopic, Model20m145k2020) -> EnabledInMemoryCacheParams( + ttl = 6.hours, + maxKeys = 262143, + cacheName = getCacheName(LogFavBasedKgoApeTopic, Model20m145k2020)), + ) + + new ClientConfig(inMemCacheParamsOverrides = cacheParamsMap) + } + +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/TimerModule.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/TimerModule.scala new file mode 100644 index 000000000..b425d516a --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules/TimerModule.scala @@ -0,0 +1,13 @@ +package com.twitter.representationscorer.modules + +import com.google.inject.Provides +import com.twitter.finagle.util.DefaultTimer +import com.twitter.inject.TwitterModule +import com.twitter.util.Timer +import javax.inject.Singleton + +object TimerModule extends TwitterModule { + @Singleton + @Provides + def providesTimer: Timer = DefaultTimer +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/BUILD b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/BUILD new file mode 100644 index 000000000..3c259cfc4 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/BUILD @@ -0,0 +1,19 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/util", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common", + "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/injection", + "representation-manager/client/src/main/scala/com/twitter/representation_manager", + "representation-manager/client/src/main/scala/com/twitter/representation_manager/config", + "representation-scorer/server/src/main/scala/com/twitter/representationscorer/common", + 
"src/scala/com/twitter/simclusters_v2/score", + "src/scala/com/twitter/topic_recos/common", + "src/scala/com/twitter/topic_recos/stores", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "src/thrift/com/twitter/topic_recos:topic_recos-thrift-scala", + "stitch/stitch-storehaus", + ], +) diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/ScoreStore.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/ScoreStore.scala new file mode 100644 index 000000000..db7cbefa9 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/ScoreStore.scala @@ -0,0 +1,168 @@ +package com.twitter.representationscorer.scorestore + +import com.twitter.bijection.scrooge.BinaryScalaCodec +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.memcached.Client +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.hashing.KeyHasher +import com.twitter.hermit.store.common.ObservedCachedReadableStore +import com.twitter.hermit.store.common.ObservedMemcachedReadableStore +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.relevance_platform.common.injection.LZ4Injection +import com.twitter.simclusters_v2.common.SimClustersEmbedding +import com.twitter.simclusters_v2.score.ScoreFacadeStore +import com.twitter.simclusters_v2.score.SimClustersEmbeddingPairScoreStore +import com.twitter.simclusters_v2.thriftscala.EmbeddingType.FavTfgTopic +import com.twitter.simclusters_v2.thriftscala.EmbeddingType.LogFavBasedKgoApeTopic +import com.twitter.simclusters_v2.thriftscala.EmbeddingType.LogFavBasedTweet +import com.twitter.simclusters_v2.thriftscala.ModelVersion.Model20m145kUpdated +import com.twitter.simclusters_v2.thriftscala.Score +import com.twitter.simclusters_v2.thriftscala.ScoreId +import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.stitch.storehaus.StitchOfReadableStore +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.topic_recos.stores.CertoTweetTopicScoresStore +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton() +class ScoreStore @Inject() ( + simClustersEmbeddingStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding], + stratoClient: StratoClient, + representationScorerCacheClient: Client, + stats: StatsReceiver) { + + private val keyHasher = KeyHasher.FNV1A_64 + private val statsReceiver = stats.scope("score_store") + + /** ** Score Store *****/ + private val simClustersEmbeddingCosineSimilarityScoreStore = + ObservedReadableStore( + SimClustersEmbeddingPairScoreStore + .buildCosineSimilarityStore(simClustersEmbeddingStore) + .toThriftStore + )(statsReceiver.scope("simClusters_embedding_cosine_similarity_score_store")) + + private val simClustersEmbeddingDotProductScoreStore = + ObservedReadableStore( + SimClustersEmbeddingPairScoreStore + .buildDotProductStore(simClustersEmbeddingStore) + .toThriftStore + )(statsReceiver.scope("simClusters_embedding_dot_product_score_store")) + + private val simClustersEmbeddingJaccardSimilarityScoreStore = + ObservedReadableStore( + SimClustersEmbeddingPairScoreStore + .buildJaccardSimilarityStore(simClustersEmbeddingStore) + .toThriftStore + )(statsReceiver.scope("simClusters_embedding_jaccard_similarity_score_store")) + + private val 
simClustersEmbeddingEuclideanDistanceScoreStore = + ObservedReadableStore( + SimClustersEmbeddingPairScoreStore + .buildEuclideanDistanceStore(simClustersEmbeddingStore) + .toThriftStore + )(statsReceiver.scope("simClusters_embedding_euclidean_distance_score_store")) + + private val simClustersEmbeddingManhattanDistanceScoreStore = + ObservedReadableStore( + SimClustersEmbeddingPairScoreStore + .buildManhattanDistanceStore(simClustersEmbeddingStore) + .toThriftStore + )(statsReceiver.scope("simClusters_embedding_manhattan_distance_score_store")) + + private val simClustersEmbeddingLogCosineSimilarityScoreStore = + ObservedReadableStore( + SimClustersEmbeddingPairScoreStore + .buildLogCosineSimilarityStore(simClustersEmbeddingStore) + .toThriftStore + )(statsReceiver.scope("simClusters_embedding_log_cosine_similarity_score_store")) + + private val simClustersEmbeddingExpScaledCosineSimilarityScoreStore = + ObservedReadableStore( + SimClustersEmbeddingPairScoreStore + .buildExpScaledCosineSimilarityStore(simClustersEmbeddingStore) + .toThriftStore + )(statsReceiver.scope("simClusters_embedding_exp_scaled_cosine_similarity_score_store")) + + // Use the default setting + private val topicTweetRankingScoreStore = + TopicTweetRankingScoreStore.buildTopicTweetRankingStore( + FavTfgTopic, + LogFavBasedKgoApeTopic, + LogFavBasedTweet, + Model20m145kUpdated, + consumerEmbeddingMultiplier = 1.0, + producerEmbeddingMultiplier = 1.0 + ) + + private val topicTweetsCortexThresholdStore = TopicTweetsCosineSimilarityAggregateStore( + TopicTweetsCosineSimilarityAggregateStore.DefaultScoreKeys, + statsReceiver.scope("topic_tweets_cortex_threshold_store") + ) + + val topicTweetCertoScoreStore: ObservedCachedReadableStore[ScoreId, Score] = { + val underlyingStore = ObservedReadableStore( + TopicTweetCertoScoreStore(CertoTweetTopicScoresStore.prodStore(stratoClient)) + )(statsReceiver.scope("topic_tweet_certo_score_store")) + + val memcachedStore = ObservedMemcachedReadableStore + .fromCacheClient( + backingStore = underlyingStore, + cacheClient = representationScorerCacheClient, + ttl = 10.minutes + )( + valueInjection = LZ4Injection.compose(BinaryScalaCodec(Score)), + statsReceiver = statsReceiver.scope("topic_tweet_certo_store_memcache"), + keyToString = { k: ScoreId => + s"certocs:${keyHasher.hashKey(k.toString.getBytes)}" + } + ) + + ObservedCachedReadableStore.from[ScoreId, Score]( + memcachedStore, + ttl = 5.minutes, + maxKeys = 1000000, + cacheName = "topic_tweet_certo_store_cache", + windowSize = 10000L + )(statsReceiver.scope("topic_tweet_certo_store_cache")) + } + + val uniformScoringStore: ReadableStore[ScoreId, Score] = + ScoreFacadeStore.buildWithMetrics( + readableStores = Map( + ScoringAlgorithm.PairEmbeddingCosineSimilarity -> + simClustersEmbeddingCosineSimilarityScoreStore, + ScoringAlgorithm.PairEmbeddingDotProduct -> + simClustersEmbeddingDotProductScoreStore, + ScoringAlgorithm.PairEmbeddingJaccardSimilarity -> + simClustersEmbeddingJaccardSimilarityScoreStore, + ScoringAlgorithm.PairEmbeddingEuclideanDistance -> + simClustersEmbeddingEuclideanDistanceScoreStore, + ScoringAlgorithm.PairEmbeddingManhattanDistance -> + simClustersEmbeddingManhattanDistanceScoreStore, + ScoringAlgorithm.PairEmbeddingLogCosineSimilarity -> + simClustersEmbeddingLogCosineSimilarityScoreStore, + ScoringAlgorithm.PairEmbeddingExpScaledCosineSimilarity -> + simClustersEmbeddingExpScaledCosineSimilarityScoreStore, + // Certo normalized cosine score between topic-tweet pairs + 
ScoringAlgorithm.CertoNormalizedCosineScore + -> topicTweetCertoScoreStore, + // Certo normalized dot-product score between topic-tweet pairs + ScoringAlgorithm.CertoNormalizedDotProductScore + -> topicTweetCertoScoreStore + ), + aggregatedStores = Map( + ScoringAlgorithm.WeightedSumTopicTweetRanking -> + topicTweetRankingScoreStore, + ScoringAlgorithm.CortexTopicTweetLabel -> + topicTweetsCortexThresholdStore, + ), + statsReceiver = stats + ) + + val uniformScoringStoreStitch: ScoreId => com.twitter.stitch.Stitch[Score] = + StitchOfReadableStore(uniformScoringStore) +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetCertoScoreStore.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetCertoScoreStore.scala new file mode 100644 index 000000000..b6216985f --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetCertoScoreStore.scala @@ -0,0 +1,106 @@ +package com.twitter.representationscorer.scorestore + +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.thriftscala.ScoreInternalId.GenericPairScoreId +import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm.CertoNormalizedDotProductScore +import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm.CertoNormalizedCosineScore +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.TopicId +import com.twitter.simclusters_v2.thriftscala.{Score => ThriftScore} +import com.twitter.simclusters_v2.thriftscala.{ScoreId => ThriftScoreId} +import com.twitter.storehaus.FutureOps +import com.twitter.storehaus.ReadableStore +import com.twitter.topic_recos.thriftscala.Scores +import com.twitter.topic_recos.thriftscala.TopicToScores +import com.twitter.util.Future + +/** + * Score store to get Certo scores. + * Currently, the store supports two Scoring Algorithms (i.e., two types of Certo scores): + * 1. NormalizedDotProduct + * 2. NormalizedCosine + * Querying with corresponding scoring algorithms results in different Certo scores. 
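+ * As an illustrative example (not an exhaustive contract): a request of the form + * ScoreId(CertoNormalizedCosineScore, GenericPairScoreId(TweetId, TopicId)) is answered from + * Scores.followerL2NormalizedCosineSimilarity8HrHalfLife, while the dot-product variant reads + * Scores.followerL2NormalizedDotProduct8HrHalfLife.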
+ */ +case class TopicTweetCertoScoreStore(certoStratoStore: ReadableStore[TweetId, TopicToScores]) + extends ReadableStore[ThriftScoreId, ThriftScore] { + + override def multiGet[K1 <: ThriftScoreId](ks: Set[K1]): Map[K1, Future[Option[ThriftScore]]] = { + val tweetIds = + ks.map(_.internalId).collect { + case GenericPairScoreId(scoreId) => + ((scoreId.id1, scoreId.id2): @annotation.nowarn( + "msg=may not be exhaustive|max recursion depth")) match { + case (InternalId.TweetId(tweetId), _) => tweetId + case (_, InternalId.TweetId(tweetId)) => tweetId + } + } + + val result = for { + certoScores <- Future.collect(certoStratoStore.multiGet(tweetIds)) + } yield { + ks.map { k => + (k.algorithm, k.internalId) match { + case (CertoNormalizedDotProductScore, GenericPairScoreId(scoreId)) => + (scoreId.id1, scoreId.id2) match { + case (InternalId.TweetId(tweetId), InternalId.TopicId(topicId)) => + ( + k, + extractScore( + tweetId, + topicId, + certoScores, + _.followerL2NormalizedDotProduct8HrHalfLife)) + case (InternalId.TopicId(topicId), InternalId.TweetId(tweetId)) => + ( + k, + extractScore( + tweetId, + topicId, + certoScores, + _.followerL2NormalizedDotProduct8HrHalfLife)) + case _ => (k, None) + } + case (CertoNormalizedCosineScore, GenericPairScoreId(scoreId)) => + (scoreId.id1, scoreId.id2) match { + case (InternalId.TweetId(tweetId), InternalId.TopicId(topicId)) => + ( + k, + extractScore( + tweetId, + topicId, + certoScores, + _.followerL2NormalizedCosineSimilarity8HrHalfLife)) + case (InternalId.TopicId(topicId), InternalId.TweetId(tweetId)) => + ( + k, + extractScore( + tweetId, + topicId, + certoScores, + _.followerL2NormalizedCosineSimilarity8HrHalfLife)) + case _ => (k, None) + } + case _ => (k, None) + } + }.toMap + } + FutureOps.liftValues(ks, result) + } + + /** + * Given tweetToCertoScores, extract certain Certo score between the given tweetId and topicId. + * The Certo score of interest is specified using scoreExtractor. + */ + def extractScore( + tweetId: TweetId, + topicId: TopicId, + tweetToCertoScores: Map[TweetId, Option[TopicToScores]], + scoreExtractor: Scores => Double + ): Option[ThriftScore] = { + tweetToCertoScores.get(tweetId).flatMap { + case Some(topicToScores) => + topicToScores.topicToScores.flatMap(_.get(topicId).map(scoreExtractor).map(ThriftScore(_))) + case _ => Some(ThriftScore(0.0)) + } + } +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetRankingScoreStore.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetRankingScoreStore.scala new file mode 100644 index 000000000..9ff502fd6 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetRankingScoreStore.scala @@ -0,0 +1,48 @@ +package com.twitter.representationscorer.scorestore + +import com.twitter.simclusters_v2.score.WeightedSumAggregatedScoreStore +import com.twitter.simclusters_v2.score.WeightedSumAggregatedScoreStore.WeightedSumAggregatedScoreParameter +import com.twitter.simclusters_v2.thriftscala.{EmbeddingType, ModelVersion, ScoringAlgorithm} + +object TopicTweetRankingScoreStore { + val producerEmbeddingScoreMultiplier = 1.0 + val consumerEmbeddingScoreMultiplier = 1.0 + + /** + * Build the scoring store for TopicTweet Ranking based on Default Multipliers. 
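+ * As a sketch of the aggregation (with the default multipliers of 1.0 defined above), the + * returned ranking score is + *   score = consumerEmbeddingMultiplier * cosine(consumerEmbedding, tweetEmbedding) + *         + producerEmbeddingMultiplier * cosine(producerEmbedding, tweetEmbedding)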
+ * If you want to compare the ranking between different multipliers, register a new + ScoringAlgorithm and let the upstream use a different scoringAlgorithm via params. + */ + def buildTopicTweetRankingStore( + consumerEmbeddingType: EmbeddingType, + producerEmbeddingType: EmbeddingType, + tweetEmbeddingType: EmbeddingType, + modelVersion: ModelVersion, + consumerEmbeddingMultiplier: Double = consumerEmbeddingScoreMultiplier, + producerEmbeddingMultiplier: Double = producerEmbeddingScoreMultiplier + ): WeightedSumAggregatedScoreStore = { + WeightedSumAggregatedScoreStore( + List( + WeightedSumAggregatedScoreParameter( + ScoringAlgorithm.PairEmbeddingCosineSimilarity, + consumerEmbeddingMultiplier, + WeightedSumAggregatedScoreStore.genericPairScoreIdToSimClustersEmbeddingPairScoreId( + consumerEmbeddingType, + tweetEmbeddingType, + modelVersion + ) + ), + WeightedSumAggregatedScoreParameter( + ScoringAlgorithm.PairEmbeddingCosineSimilarity, + producerEmbeddingMultiplier, + WeightedSumAggregatedScoreStore.genericPairScoreIdToSimClustersEmbeddingPairScoreId( + producerEmbeddingType, + tweetEmbeddingType, + modelVersion + ) + ) + ) + ) + } + +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetsCosineSimilarityAggregateStore.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetsCosineSimilarityAggregateStore.scala new file mode 100644 index 000000000..f835158b8 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore/TopicTweetsCosineSimilarityAggregateStore.scala @@ -0,0 +1,148 @@ +package com.twitter.representationscorer.scorestore + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.util.StatsUtil +import com.twitter.representationscorer.scorestore.TopicTweetsCosineSimilarityAggregateStore.ScoreKey +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.score.AggregatedScoreStore +import com.twitter.simclusters_v2.thriftscala.ScoreInternalId.GenericPairScoreId +import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm.CortexTopicTweetLabel +import com.twitter.simclusters_v2.thriftscala.{ + EmbeddingType, + InternalId, + ModelVersion, + ScoreInternalId, + ScoringAlgorithm, + SimClustersEmbeddingId, + TopicId, + Score => ThriftScore, + ScoreId => ThriftScoreId, + SimClustersEmbeddingPairScoreId => ThriftSimClustersEmbeddingPairScoreId +} +import com.twitter.storehaus.ReadableStore +import com.twitter.topic_recos.common.Configs.{DefaultModelVersion, MinCosineSimilarityScore} +import com.twitter.topic_recos.common._ +import com.twitter.util.Future + +/** + * Calculates the cosine similarity scores of arbitrary combinations of TopicEmbeddings and + * TweetEmbeddings. + * The class has 2 uses: + * 1. For internal use. TSP will call this store to fetch the raw scores for (topic, tweet) with + * all available embedding types. We calculate all the scores here, so the caller can do filtering + * & score caching on their side. This will make it possible to DDG different embedding scores. + * + * 2. For external calls from Cortex. We return true (or 1.0) for any given (topic, tweet) if their + * cosine similarity passes the threshold for any of the embedding types.
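+ * (The threshold used below is MinCosineSimilarityScore, imported from + * com.twitter.topic_recos.common.Configs.)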
+ * The expected input type is + * ScoreId( + * CortexTopicTweetLabel, + * GenericPairScoreId(TopicId, TweetId) + * ) + */ +case class TopicTweetsCosineSimilarityAggregateStore( + scoreKeys: Seq[ScoreKey], + statsReceiver: StatsReceiver) + extends AggregatedScoreStore { + + def toCortexScore(scoresMap: Map[ScoreKey, Double]): Double = { + val passThreshold = scoresMap.exists { + case (_, score) => score >= MinCosineSimilarityScore + } + if (passThreshold) 1.0 else 0.0 + } + + /** + * To be called by Cortex through Unified Score API ONLY. Calculates scores for all possible + * (topic, tweet) embedding combinations and returns 1.0 if any of the embedding scores passes + * the minimum threshold. + * + * Expects a ScoreId(CortexTopicTweetLabel, GenericPairScoreId(TopicId, TweetId)) as input + */ + override def get(k: ThriftScoreId): Future[Option[ThriftScore]] = { + StatsUtil.trackOptionStats(statsReceiver) { + (k.algorithm, k.internalId) match { + case (CortexTopicTweetLabel, GenericPairScoreId(genericPairScoreId)) => + (genericPairScoreId.id1, genericPairScoreId.id2) match { + case (InternalId.TopicId(topicId), InternalId.TweetId(tweetId)) => + TopicTweetsCosineSimilarityAggregateStore + .getRawScoresMap(topicId, tweetId, scoreKeys, scoreFacadeStore) + .map { scoresMap => Some(ThriftScore(toCortexScore(scoresMap))) } + case (InternalId.TweetId(tweetId), InternalId.TopicId(topicId)) => + TopicTweetsCosineSimilarityAggregateStore + .getRawScoresMap(topicId, tweetId, scoreKeys, scoreFacadeStore) + .map { scoresMap => Some(ThriftScore(toCortexScore(scoresMap))) } + case _ => + // Do not accept other InternalId combinations + Future.None + } + case _ => + // Do not accept other Id types for now + Future.None + } + } + } +} + +object TopicTweetsCosineSimilarityAggregateStore { + + val TopicEmbeddingTypes: Seq[EmbeddingType] = + Seq( + EmbeddingType.FavTfgTopic, + EmbeddingType.LogFavBasedKgoApeTopic + ) + + // Add new embedding types here if you want to test new Tweet embedding performance.
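+ // With the values currently defined here, the DefaultScoreKeys cross product below yields two + // keys: ScoreKey(FavTfgTopic, LogFavBasedTweet, DefaultModelVersion) and + // ScoreKey(LogFavBasedKgoApeTopic, LogFavBasedTweet, DefaultModelVersion).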
+ val TweetEmbeddingTypes: Seq[EmbeddingType] = Seq(EmbeddingType.LogFavBasedTweet) + + val ModelVersions: Seq[ModelVersion] = + Seq(DefaultModelVersion) + + val DefaultScoreKeys: Seq[ScoreKey] = { + for { + modelVersion <- ModelVersions + topicEmbeddingType <- TopicEmbeddingTypes + tweetEmbeddingType <- TweetEmbeddingTypes + } yield { + ScoreKey( + topicEmbeddingType = topicEmbeddingType, + tweetEmbeddingType = tweetEmbeddingType, + modelVersion = modelVersion + ) + } + } + case class ScoreKey( + topicEmbeddingType: EmbeddingType, + tweetEmbeddingType: EmbeddingType, + modelVersion: ModelVersion) + + def getRawScoresMap( + topicId: TopicId, + tweetId: TweetId, + scoreKeys: Seq[ScoreKey], + uniformScoringStore: ReadableStore[ThriftScoreId, ThriftScore] + ): Future[Map[ScoreKey, Double]] = { + val scoresMapFut = scoreKeys.map { key => + val scoreInternalId = ScoreInternalId.SimClustersEmbeddingPairScoreId( + ThriftSimClustersEmbeddingPairScoreId( + buildTopicEmbedding(topicId, key.topicEmbeddingType, key.modelVersion), + SimClustersEmbeddingId( + key.tweetEmbeddingType, + key.modelVersion, + InternalId.TweetId(tweetId)) + )) + val scoreFut = uniformScoringStore + .get( + ThriftScoreId( + algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity, // Hard code as cosine sim + internalId = scoreInternalId + )) + key -> scoreFut + }.toMap + + Future + .collect(scoresMapFut).map(_.collect { + case (key, Some(ThriftScore(score))) => + (key, score) + }) + } +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/BUILD b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/BUILD new file mode 100644 index 000000000..1c617e9a0 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/BUILD @@ -0,0 +1,20 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/github/ben-manes/caffeine", + "finatra/inject/inject-core/src/main/scala", + "representation-scorer/server/src/main/scala/com/twitter/representationscorer/common", + "representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore", + "representation-scorer/server/src/main/thrift:thrift-scala", + "src/thrift/com/twitter/twistly:twistly-scala", + "stitch/stitch-core", + "stitch/stitch-core:cache", + "strato/config/columns/recommendations/twistly:twistly-strato-client", + "strato/config/columns/recommendations/user-signal-service:user-signal-service-strato-client", + "strato/src/main/scala/com/twitter/strato/client", + "user-signal-service/thrift/src/main/thrift:thrift-scala", + "util/util-core", + ], +) diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/Engagements.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/Engagements.scala new file mode 100644 index 000000000..2da828ce6 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/Engagements.scala @@ -0,0 +1,65 @@ +package com.twitter.representationscorer.twistlyfeatures + +import com.twitter.conversions.DurationOps._ +import com.twitter.util.Duration +import com.twitter.util.Time + +case class Engagements( + favs7d: Seq[UserSignal] = Nil, + retweets7d: Seq[UserSignal] = Nil, + follows30d: Seq[UserSignal] = Nil, + shares7d: Seq[UserSignal] = Nil, + replies7d: 
Seq[UserSignal] = Nil, + originalTweets7d: Seq[UserSignal] = Nil, + videoPlaybacks7d: Seq[UserSignal] = Nil, + block30d: Seq[UserSignal] = Nil, + mute30d: Seq[UserSignal] = Nil, + report30d: Seq[UserSignal] = Nil, + dontlike30d: Seq[UserSignal] = Nil, + seeFewer30d: Seq[UserSignal] = Nil) { + + import Engagements._ + + private val now = Time.now + private val oneDayAgo = (now - OneDaySpan).inMillis + private val sevenDaysAgo = (now - SevenDaysSpan).inMillis + + // All ids from the signals grouped by type (tweetIds, userIds, etc) + val tweetIds: Seq[Long] = + (favs7d ++ retweets7d ++ shares7d + ++ replies7d ++ originalTweets7d ++ videoPlaybacks7d + ++ report30d ++ dontlike30d ++ seeFewer30d) + .map(_.targetId) + val authorIds: Seq[Long] = (follows30d ++ block30d ++ mute30d).map(_.targetId) + + // Tweet signals + val dontlike7d: Seq[UserSignal] = dontlike30d.filter(_.timestamp > sevenDaysAgo) + val seeFewer7d: Seq[UserSignal] = seeFewer30d.filter(_.timestamp > sevenDaysAgo) + + val favs1d: Seq[UserSignal] = favs7d.filter(_.timestamp > oneDayAgo) + val retweets1d: Seq[UserSignal] = retweets7d.filter(_.timestamp > oneDayAgo) + val shares1d: Seq[UserSignal] = shares7d.filter(_.timestamp > oneDayAgo) + val replies1d: Seq[UserSignal] = replies7d.filter(_.timestamp > oneDayAgo) + val originalTweets1d: Seq[UserSignal] = originalTweets7d.filter(_.timestamp > oneDayAgo) + val videoPlaybacks1d: Seq[UserSignal] = videoPlaybacks7d.filter(_.timestamp > oneDayAgo) + val dontlike1d: Seq[UserSignal] = dontlike7d.filter(_.timestamp > oneDayAgo) + val seeFewer1d: Seq[UserSignal] = seeFewer7d.filter(_.timestamp > oneDayAgo) + + // User signals + val follows7d: Seq[UserSignal] = follows30d.filter(_.timestamp > sevenDaysAgo) + val block7d: Seq[UserSignal] = block30d.filter(_.timestamp > sevenDaysAgo) + val mute7d: Seq[UserSignal] = mute30d.filter(_.timestamp > sevenDaysAgo) + val report7d: Seq[UserSignal] = report30d.filter(_.timestamp > sevenDaysAgo) + + val block1d: Seq[UserSignal] = block7d.filter(_.timestamp > oneDayAgo) + val mute1d: Seq[UserSignal] = mute7d.filter(_.timestamp > oneDayAgo) + val report1d: Seq[UserSignal] = report7d.filter(_.timestamp > oneDayAgo) +} + +object Engagements { + val OneDaySpan: Duration = 1.days + val SevenDaysSpan: Duration = 7.days + val ThirtyDaysSpan: Duration = 30.days +} + +case class UserSignal(targetId: Long, timestamp: Long) diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/ScoreResult.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/ScoreResult.scala new file mode 100644 index 000000000..71df34a19 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/ScoreResult.scala @@ -0,0 +1,3 @@ +package com.twitter.representationscorer.twistlyfeatures + +case class ScoreResult(id: Long, score: Option[Double]) diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/Scorer.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/Scorer.scala new file mode 100644 index 000000000..731412d0a --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/Scorer.scala @@ -0,0 +1,474 @@ +package com.twitter.representationscorer.twistlyfeatures + +import com.twitter.finagle.stats.Counter +import com.twitter.finagle.stats.StatsReceiver +import 
com.twitter.representationscorer.common.TweetId +import com.twitter.representationscorer.common.UserId +import com.twitter.representationscorer.scorestore.ScoreStore +import com.twitter.representationscorer.thriftscala.SimClustersRecentEngagementSimilarities +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ScoreId +import com.twitter.simclusters_v2.thriftscala.ScoreInternalId +import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingPairScoreId +import com.twitter.stitch.Stitch +import javax.inject.Inject + +class Scorer @Inject() ( + fetchEngagementsFromUSS: Long => Stitch[Engagements], + scoreStore: ScoreStore, + stats: StatsReceiver) { + + import Scorer._ + + private val scoreStats = stats.scope("score") + private val scoreCalculationStats = scoreStats.scope("calculation") + private val scoreResultStats = scoreStats.scope("result") + + private val scoresNonEmptyCounter = scoreResultStats.scope("all").counter("nonEmpty") + private val scoresNonZeroCounter = scoreResultStats.scope("all").counter("nonZero") + + private val tweetScoreStats = scoreCalculationStats.scope("tweetScore").stat("latency") + private val userScoreStats = scoreCalculationStats.scope("userScore").stat("latency") + + private val favNonZero = scoreResultStats.scope("favs").counter("nonZero") + private val favNonEmpty = scoreResultStats.scope("favs").counter("nonEmpty") + + private val retweetsNonZero = scoreResultStats.scope("retweets").counter("nonZero") + private val retweetsNonEmpty = scoreResultStats.scope("retweets").counter("nonEmpty") + + private val followsNonZero = scoreResultStats.scope("follows").counter("nonZero") + private val followsNonEmpty = scoreResultStats.scope("follows").counter("nonEmpty") + + private val sharesNonZero = scoreResultStats.scope("shares").counter("nonZero") + private val sharesNonEmpty = scoreResultStats.scope("shares").counter("nonEmpty") + + private val repliesNonZero = scoreResultStats.scope("replies").counter("nonZero") + private val repliesNonEmpty = scoreResultStats.scope("replies").counter("nonEmpty") + + private val originalTweetsNonZero = scoreResultStats.scope("originalTweets").counter("nonZero") + private val originalTweetsNonEmpty = scoreResultStats.scope("originalTweets").counter("nonEmpty") + + private val videoViewsNonZero = scoreResultStats.scope("videoViews").counter("nonZero") + private val videoViewsNonEmpty = scoreResultStats.scope("videoViews").counter("nonEmpty") + + private val blockNonZero = scoreResultStats.scope("block").counter("nonZero") + private val blockNonEmpty = scoreResultStats.scope("block").counter("nonEmpty") + + private val muteNonZero = scoreResultStats.scope("mute").counter("nonZero") + private val muteNonEmpty = scoreResultStats.scope("mute").counter("nonEmpty") + + private val reportNonZero = scoreResultStats.scope("report").counter("nonZero") + private val reportNonEmpty = scoreResultStats.scope("report").counter("nonEmpty") + + private val dontlikeNonZero = scoreResultStats.scope("dontlike").counter("nonZero") + private val dontlikeNonEmpty = scoreResultStats.scope("dontlike").counter("nonEmpty") + + private val seeFewerNonZero = scoreResultStats.scope("seeFewer").counter("nonZero") + private val seeFewerNonEmpty = 
scoreResultStats.scope("seeFewer").counter("nonEmpty") + + private def getTweetScores( + candidateTweetId: TweetId, + sourceTweetIds: Seq[TweetId] + ): Stitch[Seq[ScoreResult]] = { + val getScoresStitch = Stitch.traverse(sourceTweetIds) { sourceTweetId => + scoreStore + .uniformScoringStoreStitch(getTweetScoreId(sourceTweetId, candidateTweetId)) + .liftNotFoundToOption + .map(score => ScoreResult(sourceTweetId, score.map(_.score))) + } + + Stitch.time(getScoresStitch).flatMap { + case (tryResult, duration) => + tweetScoreStats.add(duration.inMillis) + Stitch.const(tryResult) + } + } + + private def getUserScores( + tweetId: TweetId, + authorIds: Seq[UserId] + ): Stitch[Seq[ScoreResult]] = { + val getScoresStitch = Stitch.traverse(authorIds) { authorId => + scoreStore + .uniformScoringStoreStitch(getAuthorScoreId(authorId, tweetId)) + .liftNotFoundToOption + .map(score => ScoreResult(authorId, score.map(_.score))) + } + + Stitch.time(getScoresStitch).flatMap { + case (tryResult, duration) => + userScoreStats.add(duration.inMillis) + Stitch.const(tryResult) + } + } + + /** + * Get the [[SimClustersRecentEngagementSimilarities]] result containing the similarity + * features for the given userId-TweetId. + */ + def get( + userId: UserId, + tweetId: TweetId + ): Stitch[SimClustersRecentEngagementSimilarities] = { + get(userId, Seq(tweetId)).map(x => x.head) + } + + /** + * Get a list of [[SimClustersRecentEngagementSimilarities]] results containing the similarity + * features for the given tweets of the user Id. + * Guaranteed to be the same number/order as requested. + */ + def get( + userId: UserId, + tweetIds: Seq[TweetId] + ): Stitch[Seq[SimClustersRecentEngagementSimilarities]] = { + fetchEngagementsFromUSS(userId) + .flatMap(engagements => { + // For each tweet received in the request, compute the similarity scores between them + // and the user signals fetched from USS. + Stitch + .join( + Stitch.traverse(tweetIds)(id => getTweetScores(id, engagements.tweetIds)), + Stitch.traverse(tweetIds)(id => getUserScores(id, engagements.authorIds)), + ) + .map { + case (tweetScoresSeq, userScoreSeq) => + // All seq have = size because when scores don't exist, they are returned as Option + (tweetScoresSeq, userScoreSeq).zipped.map { (tweetScores, userScores) => + computeSimilarityScoresPerTweet( + engagements, + tweetScores.groupBy(_.id), + userScores.groupBy(_.id)) + } + } + }) + } + + /** + * + * Computes the [[SimClustersRecentEngagementSimilarities]] + * using the given tweet-tweet and user-tweet scores in TweetScoresMap + * and the user signals in [[Engagements]]. 
+ */ + private def computeSimilarityScoresPerTweet( + engagements: Engagements, + tweetScores: Map[TweetId, Seq[ScoreResult]], + authorScores: Map[UserId, Seq[ScoreResult]] + ): SimClustersRecentEngagementSimilarities = { + val favs7d = engagements.favs7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val favs1d = engagements.favs1d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val retweets7d = engagements.retweets7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val retweets1d = engagements.retweets1d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val follows30d = engagements.follows30d.view + .flatMap(s => authorScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val follows7d = engagements.follows7d.view + .flatMap(s => authorScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val shares7d = engagements.shares7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val shares1d = engagements.shares1d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val replies7d = engagements.replies7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val replies1d = engagements.replies1d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val originalTweets7d = engagements.originalTweets7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val originalTweets1d = engagements.originalTweets1d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val videoViews7d = engagements.videoPlaybacks7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val videoViews1d = engagements.videoPlaybacks1d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + // Block and mute signals target accounts, so their scores are looked up in authorScores + // (cf. Engagements.authorIds). + val block30d = engagements.block30d.view + .flatMap(s => authorScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val block7d = engagements.block7d.view + .flatMap(s => authorScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val block1d = engagements.block1d.view + .flatMap(s => authorScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val mute30d = engagements.mute30d.view + .flatMap(s => authorScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val mute7d = engagements.mute7d.view + .flatMap(s => authorScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val mute1d = engagements.mute1d.view + .flatMap(s => authorScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val report30d = engagements.report30d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val report7d = engagements.report7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val report1d = engagements.report1d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val dontlike30d = engagements.dontlike30d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val dontlike7d = engagements.dontlike7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val dontlike1d = engagements.dontlike1d.view + .flatMap(s => tweetScores.get(s.targetId))
+ .flatten.flatMap(_.score) + .force + + val seeFewer30d = engagements.seeFewer30d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val seeFewer7d = engagements.seeFewer7d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val seeFewer1d = engagements.seeFewer1d.view + .flatMap(s => tweetScores.get(s.targetId)) + .flatten.flatMap(_.score) + .force + + val result = SimClustersRecentEngagementSimilarities( + fav1dLast10Max = max(favs1d), + fav1dLast10Avg = avg(favs1d), + fav7dLast10Max = max(favs7d), + fav7dLast10Avg = avg(favs7d), + retweet1dLast10Max = max(retweets1d), + retweet1dLast10Avg = avg(retweets1d), + retweet7dLast10Max = max(retweets7d), + retweet7dLast10Avg = avg(retweets7d), + follow7dLast10Max = max(follows7d), + follow7dLast10Avg = avg(follows7d), + follow30dLast10Max = max(follows30d), + follow30dLast10Avg = avg(follows30d), + share1dLast10Max = max(shares1d), + share1dLast10Avg = avg(shares1d), + share7dLast10Max = max(shares7d), + share7dLast10Avg = avg(shares7d), + reply1dLast10Max = max(replies1d), + reply1dLast10Avg = avg(replies1d), + reply7dLast10Max = max(replies7d), + reply7dLast10Avg = avg(replies7d), + originalTweet1dLast10Max = max(originalTweets1d), + originalTweet1dLast10Avg = avg(originalTweets1d), + originalTweet7dLast10Max = max(originalTweets7d), + originalTweet7dLast10Avg = avg(originalTweets7d), + videoPlayback1dLast10Max = max(videoViews1d), + videoPlayback1dLast10Avg = avg(videoViews1d), + videoPlayback7dLast10Max = max(videoViews7d), + videoPlayback7dLast10Avg = avg(videoViews7d), + block1dLast10Max = max(block1d), + block1dLast10Avg = avg(block1d), + block7dLast10Max = max(block7d), + block7dLast10Avg = avg(block7d), + block30dLast10Max = max(block30d), + block30dLast10Avg = avg(block30d), + mute1dLast10Max = max(mute1d), + mute1dLast10Avg = avg(mute1d), + mute7dLast10Max = max(mute7d), + mute7dLast10Avg = avg(mute7d), + mute30dLast10Max = max(mute30d), + mute30dLast10Avg = avg(mute30d), + report1dLast10Max = max(report1d), + report1dLast10Avg = avg(report1d), + report7dLast10Max = max(report7d), + report7dLast10Avg = avg(report7d), + report30dLast10Max = max(report30d), + report30dLast10Avg = avg(report30d), + dontlike1dLast10Max = max(dontlike1d), + dontlike1dLast10Avg = avg(dontlike1d), + dontlike7dLast10Max = max(dontlike7d), + dontlike7dLast10Avg = avg(dontlike7d), + dontlike30dLast10Max = max(dontlike30d), + dontlike30dLast10Avg = avg(dontlike30d), + seeFewer1dLast10Max = max(seeFewer1d), + seeFewer1dLast10Avg = avg(seeFewer1d), + seeFewer7dLast10Max = max(seeFewer7d), + seeFewer7dLast10Avg = avg(seeFewer7d), + seeFewer30dLast10Max = max(seeFewer30d), + seeFewer30dLast10Avg = avg(seeFewer30d), + ) + trackStats(result) + result + } + + private def trackStats(result: SimClustersRecentEngagementSimilarities): Unit = { + val scores = Seq( + result.fav7dLast10Max, + result.retweet7dLast10Max, + result.follow30dLast10Max, + result.share1dLast10Max, + result.share7dLast10Max, + result.reply7dLast10Max, + result.originalTweet7dLast10Max, + result.videoPlayback7dLast10Max, + result.block30dLast10Max, + result.mute30dLast10Max, + result.report30dLast10Max, + result.dontlike30dLast10Max, + result.seeFewer30dLast10Max + ) + + val nonEmpty = scores.exists(_.isDefined) + val nonZero = scores.exists { case Some(score) if score > 0 => true; case _ => false } + + if (nonEmpty) { + scoresNonEmptyCounter.incr() + } + + if (nonZero) { + scoresNonZeroCounter.incr() + } + + // We use 
the largest window of a given type of score, + // because the largest window is inclusive of smaller windows. + trackSignalStats(favNonEmpty, favNonZero, result.fav7dLast10Avg) + trackSignalStats(retweetsNonEmpty, retweetsNonZero, result.retweet7dLast10Avg) + trackSignalStats(followsNonEmpty, followsNonZero, result.follow30dLast10Avg) + trackSignalStats(sharesNonEmpty, sharesNonZero, result.share7dLast10Avg) + trackSignalStats(repliesNonEmpty, repliesNonZero, result.reply7dLast10Avg) + trackSignalStats(originalTweetsNonEmpty, originalTweetsNonZero, result.originalTweet7dLast10Avg) + trackSignalStats(videoViewsNonEmpty, videoViewsNonZero, result.videoPlayback7dLast10Avg) + trackSignalStats(blockNonEmpty, blockNonZero, result.block30dLast10Avg) + trackSignalStats(muteNonEmpty, muteNonZero, result.mute30dLast10Avg) + trackSignalStats(reportNonEmpty, reportNonZero, result.report30dLast10Avg) + trackSignalStats(dontlikeNonEmpty, dontlikeNonZero, result.dontlike30dLast10Avg) + trackSignalStats(seeFewerNonEmpty, seeFewerNonZero, result.seeFewer30dLast10Avg) + } + + private def trackSignalStats(nonEmpty: Counter, nonZero: Counter, score: Option[Double]): Unit = { + if (score.nonEmpty) { + nonEmpty.incr() + + if (score.get > 0) + nonZero.incr() + } + } +} + +object Scorer { + def avg(s: Traversable[Double]): Option[Double] = + if (s.isEmpty) None else Some(s.sum / s.size) + // Scores are cosine similarities and may be negative, so take the true maximum rather + // than folding from 0.0, which would floor all-negative score sets at zero. + def max(s: Traversable[Double]): Option[Double] = + if (s.isEmpty) None else Some(s.max) + + private def getAuthorScoreId( + userId: UserId, + tweetId: TweetId + ) = { + ScoreId( + algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity, + internalId = ScoreInternalId.SimClustersEmbeddingPairScoreId( + SimClustersEmbeddingPairScoreId( + SimClustersEmbeddingId( + internalId = InternalId.UserId(userId), + modelVersion = ModelVersion.Model20m145k2020, + embeddingType = EmbeddingType.FavBasedProducer + ), + SimClustersEmbeddingId( + internalId = InternalId.TweetId(tweetId), + modelVersion = ModelVersion.Model20m145k2020, + embeddingType = EmbeddingType.LogFavBasedTweet + ) + )) + ) + } + + private def getTweetScoreId( + sourceTweetId: TweetId, + candidateTweetId: TweetId + ) = { + ScoreId( + algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity, + internalId = ScoreInternalId.SimClustersEmbeddingPairScoreId( + SimClustersEmbeddingPairScoreId( + SimClustersEmbeddingId( + internalId = InternalId.TweetId(sourceTweetId), + modelVersion = ModelVersion.Model20m145k2020, + embeddingType = EmbeddingType.LogFavLongestL2EmbeddingTweet + ), + SimClustersEmbeddingId( + internalId = InternalId.TweetId(candidateTweetId), + modelVersion = ModelVersion.Model20m145k2020, + embeddingType = EmbeddingType.LogFavBasedTweet + ) + )) + ) + } +} diff --git a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/UserSignalServiceRecentEngagementsClient.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/UserSignalServiceRecentEngagementsClient.scala new file mode 100644 index 000000000..fb09c1e57 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/UserSignalServiceRecentEngagementsClient.scala @@ -0,0 +1,155 @@ +package com.twitter.representationscorer.twistlyfeatures + +import com.twitter.decider.SimpleRecipient +import com.twitter.finagle.stats.Stat +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.representationscorer.common._
+import com.twitter.representationscorer.twistlyfeatures.Engagements._ +import com.twitter.simclusters_v2.common.SimClustersEmbeddingId.LongInternalId +import com.twitter.stitch.Stitch +import com.twitter.strato.generated.client.recommendations.user_signal_service.SignalsClientColumn +import com.twitter.strato.generated.client.recommendations.user_signal_service.SignalsClientColumn.Value +import com.twitter.usersignalservice.thriftscala.BatchSignalRequest +import com.twitter.usersignalservice.thriftscala.SignalRequest +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Time +import scala.collection.mutable.ArrayBuffer +import com.twitter.usersignalservice.thriftscala.ClientIdentifier + +class UserSignalServiceRecentEngagementsClient( + stratoClient: SignalsClientColumn, + decider: RepresentationScorerDecider, + stats: StatsReceiver) { + + import UserSignalServiceRecentEngagementsClient._ + + private val signalStats = stats.scope("user-signal-service", "signal") + private val signalTypeStats: Map[SignalType, Stat] = + SignalType.list.map(s => (s, signalStats.scope(s.name).stat("size"))).toMap + + def get(userId: UserId): Stitch[Engagements] = { + val request = buildRequest(userId) + stratoClient.fetcher.fetch(request).map(_.v).lowerFromOption().map { response => + val now = Time.now + val sevenDaysAgo = now - SevenDaysSpan + val thirtyDaysAgo = now - ThirtyDaysSpan + + Engagements( + favs7d = getUserSignals(response, SignalType.TweetFavorite, sevenDaysAgo), + retweets7d = getUserSignals(response, SignalType.Retweet, sevenDaysAgo), + follows30d = getUserSignals(response, SignalType.AccountFollowWithDelay, thirtyDaysAgo), + shares7d = getUserSignals(response, SignalType.TweetShareV1, sevenDaysAgo), + replies7d = getUserSignals(response, SignalType.Reply, sevenDaysAgo), + originalTweets7d = getUserSignals(response, SignalType.OriginalTweet, sevenDaysAgo), + videoPlaybacks7d = + getUserSignals(response, SignalType.VideoView90dPlayback50V1, sevenDaysAgo), + block30d = getUserSignals(response, SignalType.AccountBlock, thirtyDaysAgo), + mute30d = getUserSignals(response, SignalType.AccountMute, thirtyDaysAgo), + report30d = getUserSignals(response, SignalType.TweetReport, thirtyDaysAgo), + dontlike30d = getUserSignals(response, SignalType.TweetDontLike, thirtyDaysAgo), + seeFewer30d = getUserSignals(response, SignalType.TweetSeeFewer, thirtyDaysAgo), + ) + } + } + + private def getUserSignals( + response: Value, + signalType: SignalType, + earliestValidTimestamp: Time + ): Seq[UserSignal] = { + val signals = response.signalResponse + .getOrElse(signalType, Seq.empty) + .view + .filter(_.timestamp > earliestValidTimestamp.inMillis) + .map(s => s.targetInternalId.collect { case LongInternalId(id) => (id, s.timestamp) }) + .collect { case Some((id, engagedAt)) => UserSignal(id, engagedAt) } + .take(EngagementsToScore) + .force + + signalTypeStats(signalType).add(signals.size) + signals + } + + private def buildRequest(userId: Long) = { + val recipient = Some(SimpleRecipient(userId)) + + // Signals RSX always fetches + val requestSignals = ArrayBuffer( + SignalRequestFav, + SignalRequestRetweet, + SignalRequestFollow + ) + + // Signals under experimentation. We use individual deciders to disable them if necessary. + // If experiments are successful, they will become permanent. 
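+ // (Deciders are runtime feature switches keyed by the strings defined in the companion object + // below; each isAvailable check consults the decider for this recipient, so an individual + // signal fetch can be disabled for some or all users without a redeploy.)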
+ if (decider.isAvailable(FetchSignalShareDeciderKey, recipient)) + requestSignals.append(SignalRequestShare) + + if (decider.isAvailable(FetchSignalReplyDeciderKey, recipient)) + requestSignals.append(SignalRequestReply) + + if (decider.isAvailable(FetchSignalOriginalTweetDeciderKey, recipient)) + requestSignals.append(SignalRequestOriginalTweet) + + if (decider.isAvailable(FetchSignalVideoPlaybackDeciderKey, recipient)) + requestSignals.append(SignalRequestVideoPlayback) + + if (decider.isAvailable(FetchSignalBlockDeciderKey, recipient)) + requestSignals.append(SignalRequestBlock) + + if (decider.isAvailable(FetchSignalMuteDeciderKey, recipient)) + requestSignals.append(SignalRequestMute) + + if (decider.isAvailable(FetchSignalReportDeciderKey, recipient)) + requestSignals.append(SignalRequestReport) + + if (decider.isAvailable(FetchSignalDontlikeDeciderKey, recipient)) + requestSignals.append(SignalRequestDontlike) + + if (decider.isAvailable(FetchSignalSeeFewerDeciderKey, recipient)) + requestSignals.append(SignalRequestSeeFewer) + + BatchSignalRequest(userId, requestSignals, Some(ClientIdentifier.RepresentationScorerHome)) + } +} + +object UserSignalServiceRecentEngagementsClient { + val FetchSignalShareDeciderKey = "representation_scorer_fetch_signal_share" + val FetchSignalReplyDeciderKey = "representation_scorer_fetch_signal_reply" + val FetchSignalOriginalTweetDeciderKey = "representation_scorer_fetch_signal_original_tweet" + val FetchSignalVideoPlaybackDeciderKey = "representation_scorer_fetch_signal_video_playback" + val FetchSignalBlockDeciderKey = "representation_scorer_fetch_signal_block" + val FetchSignalMuteDeciderKey = "representation_scorer_fetch_signal_mute" + val FetchSignalReportDeciderKey = "representation_scorer_fetch_signal_report" + val FetchSignalDontlikeDeciderKey = "representation_scorer_fetch_signal_dont_like" + val FetchSignalSeeFewerDeciderKey = "representation_scorer_fetch_signal_see_fewer" + + val EngagementsToScore = 10 + private val engagementsToScoreOpt: Option[Long] = Some(EngagementsToScore) + + val SignalRequestFav: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.TweetFavorite) + val SignalRequestRetweet: SignalRequest = SignalRequest(engagementsToScoreOpt, SignalType.Retweet) + val SignalRequestFollow: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.AccountFollowWithDelay) + // New experimental signals + val SignalRequestShare: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.TweetShareV1) + val SignalRequestReply: SignalRequest = SignalRequest(engagementsToScoreOpt, SignalType.Reply) + val SignalRequestOriginalTweet: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.OriginalTweet) + val SignalRequestVideoPlayback: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.VideoView90dPlayback50V1) + + // Negative signals + val SignalRequestBlock: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.AccountBlock) + val SignalRequestMute: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.AccountMute) + val SignalRequestReport: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.TweetReport) + val SignalRequestDontlike: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.TweetDontLike) + val SignalRequestSeeFewer: SignalRequest = + SignalRequest(engagementsToScoreOpt, SignalType.TweetSeeFewer) +} diff --git 
a/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/UserSignalServiceRecentEngagementsClientModule.scala b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/UserSignalServiceRecentEngagementsClientModule.scala new file mode 100644 index 000000000..ee9f61df4 --- /dev/null +++ b/representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures/UserSignalServiceRecentEngagementsClientModule.scala @@ -0,0 +1,57 @@ +package com.twitter.representationscorer.twistlyfeatures + +import com.github.benmanes.caffeine.cache.Caffeine +import com.twitter.stitch.cache.EvictingCache +import com.google.inject.Provides +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.representationscorer.common.RepresentationScorerDecider +import com.twitter.stitch.Stitch +import com.twitter.stitch.cache.ConcurrentMapCache +import com.twitter.stitch.cache.MemoizeQuery +import com.twitter.strato.client.Client +import com.twitter.strato.generated.client.recommendations.user_signal_service.SignalsClientColumn +import java.util.concurrent.ConcurrentMap +import java.util.concurrent.TimeUnit +import javax.inject.Singleton + +object UserSignalServiceRecentEngagementsClientModule extends TwitterModule { + + @Singleton + @Provides + def provide( + client: Client, + decider: RepresentationScorerDecider, + statsReceiver: StatsReceiver + ): Long => Stitch[Engagements] = { + val stratoClient = new SignalsClientColumn(client) + + /* + This cache holds a user's recent engagements for a short period of time, such that batched requests + for multiple (userid, tweetid) pairs don't all need to fetch them. + + [1] Caffeine cache keys/values must be objects, so we cannot use the `Long` primitive directly. + The boxed java.lang.Long works as a key, since it is an object. In most situations the compiler + can see where auto(un)boxing can occur. However, here we seem to need some wrapper functions + with explicit types to allow the boxing to happen.
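+ + For illustration, the shape of the workaround used below (same names as in this file): the + underlying function takes the boxed java.lang.Long, and a second wrapper accepts the Scala + Long so that callers keep a primitive-typed API: + + val f = (l: java.lang.Long) => engagementsClient.get(l) // boxed key type for the cache + (l: Long) => cachedCall(l) // the primitive argument auto-boxes here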
+ */ + val mapCache: ConcurrentMap[java.lang.Long, Stitch[Engagements]] = + Caffeine + .newBuilder() + .expireAfterWrite(5, TimeUnit.SECONDS) + .maximumSize( + 1000 // We estimate 5M unique users in a 5m period - with 2k RSX instances, assume that one will see < 1k in a 5s period + ) + .build[java.lang.Long, Stitch[Engagements]] + .asMap + + statsReceiver.provideGauge("ussRecentEngagementsClient", "cache_size") { mapCache.size.toFloat } + + val engagementsClient = + new UserSignalServiceRecentEngagementsClient(stratoClient, decider, statsReceiver) + + val f = (l: java.lang.Long) => engagementsClient.get(l) // See note [1] above + val cachedCall = MemoizeQuery(f, EvictingCache.lazily(new ConcurrentMapCache(mapCache))) + (l: Long) => cachedCall(l) // see note [1] above + } +} diff --git a/representation-scorer/server/src/main/thrift/BUILD b/representation-scorer/server/src/main/thrift/BUILD new file mode 100644 index 000000000..f7ea37675 --- /dev/null +++ b/representation-scorer/server/src/main/thrift/BUILD @@ -0,0 +1,20 @@ +create_thrift_libraries( + base_name = "thrift", + sources = [ + "com/twitter/representationscorer/service.thrift", + ], + platform = "java8", + tags = [ + "bazel-compatible", + ], + dependency_roots = [ + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift", + ], + generate_languages = [ + "java", + "scala", + "strato", + ], + provides_java_name = "representationscorer-service-thrift-java", + provides_scala_name = "representationscorer-service-thrift-scala", +) diff --git a/representation-scorer/server/src/main/thrift/com/twitter/representationscorer/service.thrift b/representation-scorer/server/src/main/thrift/com/twitter/representationscorer/service.thrift new file mode 100644 index 000000000..0e2f23a31 --- /dev/null +++ b/representation-scorer/server/src/main/thrift/com/twitter/representationscorer/service.thrift @@ -0,0 +1,106 @@ +namespace java com.twitter.representationscorer.thriftjava +#@namespace scala com.twitter.representationscorer.thriftscala +#@namespace strato com.twitter.representationscorer + +include "com/twitter/simclusters_v2/identifier.thrift" +include "com/twitter/simclusters_v2/online_store.thrift" +include "com/twitter/simclusters_v2/score.thrift" + +struct SimClustersRecentEngagementSimilarities { + // All scores computed using cosine similarity + // 1 - 1000 Positive Signals + 1: optional double fav1dLast10Max // max score from last 10 faves in the last 1 day + 2: optional double fav1dLast10Avg // avg score from last 10 faves in the last 1 day + 3: optional double fav7dLast10Max // max score from last 10 faves in the last 7 days + 4: optional double fav7dLast10Avg // avg score from last 10 faves in the last 7 days + 5: optional double retweet1dLast10Max // max score from last 10 retweets in the last 1 day + 6: optional double retweet1dLast10Avg // avg score from last 10 retweets in the last 1 day + 7: optional double retweet7dLast10Max // max score from last 10 retweets in the last 7 days + 8: optional double retweet7dLast10Avg // avg score from last 10 retweets in the last 7 days + 9: optional double follow7dLast10Max // max score from the last 10 follows in the last 7 days + 10: optional double follow7dLast10Avg // avg score from the last 10 follows in the last 7 days + 11: optional double follow30dLast10Max // max score from the last 10 follows in the last 30 days + 12: optional double follow30dLast10Avg // avg score from the last 10 follows in the last 30 days + 13: optional double share1dLast10Max // max score from last 10
shares in the last 1 day + 14: optional double share1dLast10Avg // avg score from last 10 shares in the last 1 day + 15: optional double share7dLast10Max // max score from last 10 shares in the last 7 days + 16: optional double share7dLast10Avg // avg score from last 10 shares in the last 7 days + 17: optional double reply1dLast10Max // max score from last 10 replies in the last 1 day + 18: optional double reply1dLast10Avg // avg score from last 10 replies in the last 1 day + 19: optional double reply7dLast10Max // max score from last 10 replies in the last 7 days + 20: optional double reply7dLast10Avg // avg score from last 10 replies in the last 7 days + 21: optional double originalTweet1dLast10Max // max score from last 10 original tweets in the last 1 day + 22: optional double originalTweet1dLast10Avg // avg score from last 10 original tweets in the last 1 day + 23: optional double originalTweet7dLast10Max // max score from last 10 original tweets in the last 7 days + 24: optional double originalTweet7dLast10Avg // avg score from last 10 original tweets in the last 7 days + 25: optional double videoPlayback1dLast10Max // max score from last 10 video playback50 in the last 1 day + 26: optional double videoPlayback1dLast10Avg // avg score from last 10 video playback50 in the last 1 day + 27: optional double videoPlayback7dLast10Max // max score from last 10 video playback50 in the last 7 days + 28: optional double videoPlayback7dLast10Avg // avg score from last 10 video playback50 in the last 7 days + + // 1001 - 2000 Implicit Signals + + // 2001 - 3000 Negative Signals + // Block Series + 2001: optional double block1dLast10Avg + 2002: optional double block1dLast10Max + 2003: optional double block7dLast10Avg + 2004: optional double block7dLast10Max + 2005: optional double block30dLast10Avg + 2006: optional double block30dLast10Max + // Mute Series + 2101: optional double mute1dLast10Avg + 2102: optional double mute1dLast10Max + 2103: optional double mute7dLast10Avg + 2104: optional double mute7dLast10Max + 2105: optional double mute30dLast10Avg + 2106: optional double mute30dLast10Max + // Report Series + 2201: optional double report1dLast10Avg + 2202: optional double report1dLast10Max + 2203: optional double report7dLast10Avg + 2204: optional double report7dLast10Max + 2205: optional double report30dLast10Avg + 2206: optional double report30dLast10Max + // Dontlike + 2301: optional double dontlike1dLast10Avg + 2302: optional double dontlike1dLast10Max + 2303: optional double dontlike7dLast10Avg + 2304: optional double dontlike7dLast10Max + 2305: optional double dontlike30dLast10Avg + 2306: optional double dontlike30dLast10Max + // SeeFewer + 2401: optional double seeFewer1dLast10Avg + 2402: optional double seeFewer1dLast10Max + 2403: optional double seeFewer7dLast10Avg + 2404: optional double seeFewer7dLast10Max + 2405: optional double seeFewer30dLast10Avg + 2406: optional double seeFewer30dLast10Max +}(persisted='true', hasPersonalData = 'true') + +/* + * List score API + */ +struct ListScoreId { + 1: required score.ScoringAlgorithm algorithm + 2: required online_store.ModelVersion modelVersion + 3: required identifier.EmbeddingType targetEmbeddingType + 4: required identifier.InternalId targetId + 5: required identifier.EmbeddingType candidateEmbeddingType + 6: required list<identifier.InternalId> candidateIds +}(hasPersonalData = 'true') + +struct ScoreResult { + // This API does not communicate why a score is missing.
For example, it may be unavailable + // because the referenced entities do not exist (e.g. the embedding was not found) or because + // timeouts prevented us from calculating it. + 1: optional double score +} + +struct ListScoreResponse { + 1: required list<ScoreResult> scores // Guaranteed to be the same number/order as requested +} + +struct RecentEngagementSimilaritiesResponse { + 1: required list<SimClustersRecentEngagementSimilarities> results // Guaranteed to be the same number/order as requested +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/BCELabelTransformFromUUADataRecord.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/BCELabelTransformFromUUADataRecord.scala new file mode 100644 index 000000000..6adf6eaf8 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/BCELabelTransformFromUUADataRecord.scala @@ -0,0 +1,68 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.ml.api.Feature +import com.twitter.ml.api.FeatureContext +import com.twitter.ml.api.ITransform +import com.twitter.ml.api.constant.SharedFeatures +import java.lang.{Double => JDouble} + +import com.twitter.timelines.prediction.common.adapters.AdapterConsumer +import com.twitter.timelines.prediction.common.adapters.EngagementLabelFeaturesDataRecordUtils +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.RichDataRecord +import com.twitter.timelines.suggests.common.engagement.thriftscala.EngagementType +import com.twitter.timelines.suggests.common.engagement.thriftscala.Engagement +import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures +import com.twitter.timelines.prediction.features.common.CombinedFeatures + +/** + * Transforms UUA data records for BCE events that contain only a continuous dwell time into data records that contain the corresponding binary label features. + * The input UUA data records have USER_ID, SOURCE_TWEET_ID, TIMESTAMP and + * zero or one of the (TWEET_DETAIL_DWELL_TIME_MS, PROFILE_DWELL_TIME_MS, FULLSCREEN_VIDEO_DWELL_TIME_MS) features. + * We use the different dwell time features to differentiate the engagement types, + * and then re-use the function in EngagementTypeConverter to add the binary label to the data record.
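+ * + * For example (values illustrative): an input record carrying PROFILE_DWELL_TIME_MS = 1200.0 and + * TIMESTAMP = t is mapped to Engagement(engagementType = ProfileDwell, timestampMs = t, weight = Some(1200.0)), + * which setDwellTimeFeatures below then converts into the corresponding binary + * ProfileDwellEngagements label features on the record.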
+ **/ + +object BCELabelTransformFromUUADataRecord extends ITransform { + + val dwellTimeFeatureToEngagementMap = Map( + TimelinesSharedFeatures.TWEET_DETAIL_DWELL_TIME_MS -> EngagementType.TweetDetailDwell, + TimelinesSharedFeatures.PROFILE_DWELL_TIME_MS -> EngagementType.ProfileDwell, + TimelinesSharedFeatures.FULLSCREEN_VIDEO_DWELL_TIME_MS -> EngagementType.FullscreenVideoDwell + ) + + def dwellFeatureToEngagement( + rdr: RichDataRecord, + dwellTimeFeature: Feature[JDouble], + engagementType: EngagementType + ): Option[Engagement] = { + if (rdr.hasFeature(dwellTimeFeature)) { + Some( + Engagement( + engagementType = engagementType, + timestampMs = rdr.getFeatureValue(SharedFeatures.TIMESTAMP), + weight = Some(rdr.getFeatureValue(dwellTimeFeature)) + )) + } else { + None + } + } + override def transformContext(featureContext: FeatureContext): FeatureContext = { + featureContext.addFeatures( + (CombinedFeatures.TweetDetailDwellEngagements ++ CombinedFeatures.ProfileDwellEngagements ++ CombinedFeatures.FullscreenVideoDwellEngagements).toSeq: _*) + } + override def transform(record: DataRecord): Unit = { + val rdr = new RichDataRecord(record) + val engagements = dwellTimeFeatureToEngagementMap + .map { + case (dwellTimeFeature, engagementType) => + dwellFeatureToEngagement(rdr, dwellTimeFeature, engagementType) + }.flatten.toSeq + + // Re-use the BCE (behavioral client events) label conversion in EngagementTypeConverter to align with BCE label generation for offline training data + EngagementLabelFeaturesDataRecordUtils.setDwellTimeFeatures( + rdr, + Some(engagements), + AdapterConsumer.Combined) + } +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/BUILD b/src/scala/com/twitter/timelines/prediction/common/aggregates/BUILD new file mode 100644 index 000000000..01c930e8e --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/BUILD @@ -0,0 +1,353 @@ +create_datasets( + base_name = "original_author_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/original_author_aggregates/1556496000000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.OriginalAuthor", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "twitter_wide_user_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/twitter_wide_user_aggregates/1556496000000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.TwitterWideUser", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "twitter_wide_user_author_aggregates", + fallback_path =
"viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/twitter_wide_user_author_aggregates/1556323200000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.TwitterWideUserAuthor", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "user_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_aggregates/1556150400000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.User", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "user_author_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_author_aggregates/1556064000000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserAuthor", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "aggregates_canary", + fallback_path = "gs://user.timelines.dp.gcp.twttr.net//canaries/processed/aggregates_v2/user_aggregates/1622851200000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.User", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "user_engager_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_engager_aggregates/1556496000000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserEngager", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + 
], +) + +create_datasets( + base_name = "user_original_author_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_original_author_aggregates/1556496000000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserOriginalAuthor", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "author_topic_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/author_topic_aggregates/1589932800000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.AuthorTopic", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "user_topic_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_topic_aggregates/1590278400000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserTopic", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "user_inferred_topic_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_inferred_topic_aggregates/1599696000000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserInferredTopic", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "user_mention_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_mention_aggregates/1556582400000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserMention", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = 
"(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "user_request_dow_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_request_dow_aggregates/1556236800000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserRequestDow", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +create_datasets( + base_name = "user_request_hour_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_request_hour_aggregates/1556150400000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserRequestHour", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + + +create_datasets( + base_name = "user_list_aggregates", + fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_list_aggregates/1590624000000", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserList", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + + +create_datasets( + base_name = "user_media_understanding_annotation_aggregates", + key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey", + platform = "java8", + role = "timelines", + scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserMediaUnderstandingAnnotation", + segment_type = "snapshot", + tags = ["bazel-compatible"], + val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)", + scala_dependencies = [ + ":injections", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) + +scala_library( + sources = [ + "BCELabelTransformFromUUADataRecord.scala", + "FeatureSelectorConfig.scala", + "RecapUserFeatureAggregation.scala", + "RectweetUserFeatureAggregation.scala", + "TimelinesAggregationConfig.scala", + "TimelinesAggregationConfigDetails.scala", + "TimelinesAggregationConfigTrait.scala", + "TimelinesAggregationSources.scala", + ], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + ":aggregates_canary-scala", + 
":author_topic_aggregates-scala", + ":original_author_aggregates-scala", + ":twitter_wide_user_aggregates-scala", + ":twitter_wide_user_author_aggregates-scala", + ":user_aggregates-scala", + ":user_author_aggregates-scala", + ":user_engager_aggregates-scala", + ":user_inferred_topic_aggregates-scala", + ":user_list_aggregates-scala", + ":user_media_understanding_annotation_aggregates-scala", + ":user_mention_aggregates-scala", + ":user_original_author_aggregates-scala", + ":user_request_dow_aggregates-scala", + ":user_request_hour_aggregates-scala", + ":user_topic_aggregates-scala", + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/java/com/twitter/ml/api/matcher", + "src/scala/com/twitter/common/text/util", + "src/scala/com/twitter/dal/client/dataset", + "src/scala/com/twitter/frigate/data_pipeline/features_aggregated/core", + "src/scala/com/twitter/scalding_internal/multiformat/format", + "src/scala/com/twitter/timelines/prediction/common/adapters:engagement-converter", + "src/scala/com/twitter/timelines/prediction/features/client_log_event", + "src/scala/com/twitter/timelines/prediction/features/common", + "src/scala/com/twitter/timelines/prediction/features/engagement_features", + "src/scala/com/twitter/timelines/prediction/features/escherbird", + "src/scala/com/twitter/timelines/prediction/features/itl", + "src/scala/com/twitter/timelines/prediction/features/list_features", + "src/scala/com/twitter/timelines/prediction/features/p_home_latest", + "src/scala/com/twitter/timelines/prediction/features/real_graph", + "src/scala/com/twitter/timelines/prediction/features/recap", + "src/scala/com/twitter/timelines/prediction/features/request_context", + "src/scala/com/twitter/timelines/prediction/features/simcluster", + "src/scala/com/twitter/timelines/prediction/features/time_features", + "src/scala/com/twitter/timelines/prediction/transform/filter", + "src/thrift/com/twitter/timelines/suggests/common:engagement-scala", + "timelines/data_processing/ad_hoc/recap/data_record_preparation:recap_data_records_agg_minimal-java", + "util/util-core:scala", + ], +) + +scala_library( + name = "injections", + sources = [ + "FeatureSelectorConfig.scala", + "RecapUserFeatureAggregation.scala", + "RectweetUserFeatureAggregation.scala", + "TimelinesAggregationConfigDetails.scala", + "TimelinesAggregationConfigTrait.scala", + "TimelinesAggregationKeyValInjections.scala", + "TimelinesAggregationSources.scala", + ], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/java/com/twitter/ml/api/matcher", + "src/scala/com/twitter/common/text/util", + "src/scala/com/twitter/dal/client/dataset", + "src/scala/com/twitter/frigate/data_pipeline/features_aggregated/core", + "src/scala/com/twitter/scalding_internal/multiformat/format", + "src/scala/com/twitter/timelines/prediction/features/client_log_event", + "src/scala/com/twitter/timelines/prediction/features/common", + "src/scala/com/twitter/timelines/prediction/features/engagement_features", + "src/scala/com/twitter/timelines/prediction/features/escherbird", + "src/scala/com/twitter/timelines/prediction/features/itl", + "src/scala/com/twitter/timelines/prediction/features/list_features", + "src/scala/com/twitter/timelines/prediction/features/p_home_latest", + "src/scala/com/twitter/timelines/prediction/features/real_graph", + "src/scala/com/twitter/timelines/prediction/features/recap", + 
"src/scala/com/twitter/timelines/prediction/features/request_context", + "src/scala/com/twitter/timelines/prediction/features/semantic_core_features", + "src/scala/com/twitter/timelines/prediction/features/simcluster", + "src/scala/com/twitter/timelines/prediction/features/time_features", + "src/scala/com/twitter/timelines/prediction/transform/filter", + "timelines/data_processing/ad_hoc/recap/data_record_preparation:recap_data_records_agg_minimal-java", + "util/util-core:scala", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/FeatureSelectorConfig.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/FeatureSelectorConfig.scala new file mode 100644 index 000000000..1c91ef16c --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/FeatureSelectorConfig.scala @@ -0,0 +1,121 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.ml.api.matcher.FeatureMatcher +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import scala.collection.JavaConverters._ + +object FeatureSelectorConfig { + val BasePairsToStore = Seq( + ("twitter_wide_user_aggregate.pair", "*"), + ("twitter_wide_user_author_aggregate.pair", "*"), + ("user_aggregate_v5.continuous.pair", "*"), + ("user_aggregate_v7.pair", "*"), + ("user_author_aggregate_v2.pair", "recap.earlybird.*"), + ("user_author_aggregate_v2.pair", "recap.searchfeature.*"), + ("user_author_aggregate_v2.pair", "recap.tweetfeature.embeds*"), + ("user_author_aggregate_v2.pair", "recap.tweetfeature.link_count*"), + ("user_author_aggregate_v2.pair", "engagement_features.in_network.*"), + ("user_author_aggregate_v2.pair", "recap.tweetfeature.is_reply.*"), + ("user_author_aggregate_v2.pair", "recap.tweetfeature.is_retweet.*"), + ("user_author_aggregate_v2.pair", "recap.tweetfeature.num_mentions.*"), + ("user_author_aggregate_v5.pair", "*"), + ("user_author_aggregate_tweetsource_v1.pair", "*"), + ("user_engager_aggregate.pair", "*"), + ("user_mention_aggregate.pair", "*"), + ("user_request_context_aggregate.dow.pair", "*"), + ("user_request_context_aggregate.hour.pair", "*"), + ("user_aggregate_v6.pair", "*"), + ("user_original_author_aggregate_v1.pair", "*"), + ("user_original_author_aggregate_v2.pair", "*"), + ("original_author_aggregate_v1.pair", "*"), + ("original_author_aggregate_v2.pair", "*"), + ("author_topic_aggregate.pair", "*"), + ("user_list_aggregate.pair", "*"), + ("user_topic_aggregate.pair", "*"), + ("user_topic_aggregate_v2.pair", "*"), + ("user_inferred_topic_aggregate.pair", "*"), + ("user_inferred_topic_aggregate_v2.pair", "*"), + ("user_media_annotation_aggregate.pair", "*"), + ("user_media_annotation_aggregate.pair", "*"), + ("user_author_good_click_aggregate.pair", "*"), + ("user_engager_good_click_aggregate.pair", "*") + ) + val PairsToStore = BasePairsToStore ++ Seq( + ("user_aggregate_v2.pair", "*"), + ("user_aggregate_v5.boolean.pair", "*"), + ("user_aggregate_tweetsource_v1.pair", "*"), + ) + + + val LabelsToStore = Seq( + "any_label", + "recap.engagement.is_favorited", + "recap.engagement.is_retweeted", + "recap.engagement.is_replied", + "recap.engagement.is_open_linked", + "recap.engagement.is_profile_clicked", + "recap.engagement.is_clicked", + "recap.engagement.is_photo_expanded", + "recap.engagement.is_video_playback_50", + "recap.engagement.is_video_quality_viewed", + "recap.engagement.is_replied_reply_impressed_by_author", + "recap.engagement.is_replied_reply_favorited_by_author", + 
"recap.engagement.is_replied_reply_replied_by_author", + "recap.engagement.is_report_tweet_clicked", + "recap.engagement.is_block_clicked", + "recap.engagement.is_mute_clicked", + "recap.engagement.is_dont_like", + "recap.engagement.is_good_clicked_convo_desc_favorited_or_replied", + "recap.engagement.is_good_clicked_convo_desc_v2", + "itl.engagement.is_favorited", + "itl.engagement.is_retweeted", + "itl.engagement.is_replied", + "itl.engagement.is_open_linked", + "itl.engagement.is_profile_clicked", + "itl.engagement.is_clicked", + "itl.engagement.is_photo_expanded", + "itl.engagement.is_video_playback_50" + ) + + val PairGlobsToStore = for { + (prefix, suffix) <- PairsToStore + label <- LabelsToStore + } yield FeatureMatcher.glob(prefix + "." + label + "." + suffix) + + val BaseAggregateV2FeatureSelector = FeatureMatcher + .none() + .or( + FeatureMatcher.glob("meta.user_id"), + FeatureMatcher.glob("meta.author_id"), + FeatureMatcher.glob("entities.original_author_id"), + FeatureMatcher.glob("entities.topic_id"), + FeatureMatcher + .glob("entities.inferred_topic_ids" + TypedAggregateGroup.SparseFeatureSuffix), + FeatureMatcher.glob("timelines.meta.list_id"), + FeatureMatcher.glob("list.id"), + FeatureMatcher + .glob("engagement_features.user_ids.public" + TypedAggregateGroup.SparseFeatureSuffix), + FeatureMatcher + .glob("entities.users.mentioned_screen_names" + TypedAggregateGroup.SparseFeatureSuffix), + FeatureMatcher.glob("user_aggregate_v2.pair.recap.engagement.is_dont_like.*"), + FeatureMatcher.glob("user_author_aggregate_v2.pair.any_label.recap.tweetfeature.has_*"), + FeatureMatcher.glob("request_context.country_code"), + FeatureMatcher.glob("request_context.timestamp_gmt_dow"), + FeatureMatcher.glob("request_context.timestamp_gmt_hour"), + FeatureMatcher.glob( + "semantic_core.media_understanding.high_recall.non_sensitive.entity_ids" + TypedAggregateGroup.SparseFeatureSuffix) + ) + + val AggregatesV2ProdFeatureSelector = BaseAggregateV2FeatureSelector + .orList(PairGlobsToStore.asJava) + + val ReducedPairGlobsToStore = (for { + (prefix, suffix) <- BasePairsToStore + label <- LabelsToStore + } yield FeatureMatcher.glob(prefix + "." + label + "." + suffix)) ++ Seq( + FeatureMatcher.glob("user_aggregate_v2.pair.any_label.*"), + FeatureMatcher.glob("user_aggregate_v2.pair.recap.engagement.is_favorited.*"), + FeatureMatcher.glob("user_aggregate_v2.pair.recap.engagement.is_photo_expanded.*"), + FeatureMatcher.glob("user_aggregate_v2.pair.recap.engagement.is_profile_clicked.*") + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/README.md b/src/scala/com/twitter/timelines/prediction/common/aggregates/README.md new file mode 100644 index 000000000..0bae21a14 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/README.md @@ -0,0 +1,6 @@ +## Timelines Aggregation Jobs + +This directory contains the specific definition of aggregate jobs that generate features used by the Heavy Ranker. +The primary files of interest are [`TimelinesAggregationConfigDetails.scala`](TimelinesAggregationConfigDetails.scala), which contains the defintion for the batch aggregate jobs and [`real_time/TimelinesOnlineAggregationConfigBase.scala`](real_time/TimelinesOnlineAggregationConfigBase.scala) which contains the definitions for the real time aggregate jobs. + +The aggregation framework that these jobs are based on is [here](../../../../../../../../timelines/data_processing/ml_util/aggregation_framework). 
\ No newline at end of file diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/RecapUserFeatureAggregation.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/RecapUserFeatureAggregation.scala new file mode 100644 index 000000000..657d5a713 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/RecapUserFeatureAggregation.scala @@ -0,0 +1,415 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.ml.api.Feature +import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures +import com.twitter.timelines.prediction.features.engagement_features.EngagementDataRecordFeatures +import com.twitter.timelines.prediction.features.real_graph.RealGraphDataRecordFeatures +import com.twitter.timelines.prediction.features.recap.RecapFeatures +import com.twitter.timelines.prediction.features.time_features.TimeDataRecordFeatures + +object RecapUserFeatureAggregation { + val RecapFeaturesForAggregation: Set[Feature[_]] = + Set( + RecapFeatures.HAS_IMAGE, + RecapFeatures.HAS_VIDEO, + RecapFeatures.FROM_MUTUAL_FOLLOW, + RecapFeatures.HAS_CARD, + RecapFeatures.HAS_NEWS, + RecapFeatures.REPLY_COUNT, + RecapFeatures.FAV_COUNT, + RecapFeatures.RETWEET_COUNT, + RecapFeatures.BLENDER_SCORE, + RecapFeatures.CONVERSATIONAL_COUNT, + RecapFeatures.IS_BUSINESS_SCORE, + RecapFeatures.CONTAINS_MEDIA, + RecapFeatures.RETWEET_SEARCHER, + RecapFeatures.REPLY_SEARCHER, + RecapFeatures.MENTION_SEARCHER, + RecapFeatures.REPLY_OTHER, + RecapFeatures.RETWEET_OTHER, + RecapFeatures.MATCH_UI_LANG, + RecapFeatures.MATCH_SEARCHER_MAIN_LANG, + RecapFeatures.MATCH_SEARCHER_LANGS, + RecapFeatures.TWEET_COUNT_FROM_USER_IN_SNAPSHOT, + RecapFeatures.TEXT_SCORE, + RealGraphDataRecordFeatures.NUM_RETWEETS_EWMA, + RealGraphDataRecordFeatures.NUM_RETWEETS_NON_ZERO_DAYS, + RealGraphDataRecordFeatures.NUM_RETWEETS_ELAPSED_DAYS, + RealGraphDataRecordFeatures.NUM_RETWEETS_DAYS_SINCE_LAST, + RealGraphDataRecordFeatures.NUM_FAVORITES_EWMA, + RealGraphDataRecordFeatures.NUM_FAVORITES_NON_ZERO_DAYS, + RealGraphDataRecordFeatures.NUM_FAVORITES_ELAPSED_DAYS, + RealGraphDataRecordFeatures.NUM_FAVORITES_DAYS_SINCE_LAST, + RealGraphDataRecordFeatures.NUM_MENTIONS_EWMA, + RealGraphDataRecordFeatures.NUM_MENTIONS_NON_ZERO_DAYS, + RealGraphDataRecordFeatures.NUM_MENTIONS_ELAPSED_DAYS, + RealGraphDataRecordFeatures.NUM_MENTIONS_DAYS_SINCE_LAST, + RealGraphDataRecordFeatures.NUM_TWEET_CLICKS_EWMA, + RealGraphDataRecordFeatures.NUM_TWEET_CLICKS_NON_ZERO_DAYS, + RealGraphDataRecordFeatures.NUM_TWEET_CLICKS_ELAPSED_DAYS, + RealGraphDataRecordFeatures.NUM_TWEET_CLICKS_DAYS_SINCE_LAST, + RealGraphDataRecordFeatures.NUM_PROFILE_VIEWS_EWMA, + RealGraphDataRecordFeatures.NUM_PROFILE_VIEWS_NON_ZERO_DAYS, + RealGraphDataRecordFeatures.NUM_PROFILE_VIEWS_ELAPSED_DAYS, + RealGraphDataRecordFeatures.NUM_PROFILE_VIEWS_DAYS_SINCE_LAST, + RealGraphDataRecordFeatures.TOTAL_DWELL_TIME_EWMA, + RealGraphDataRecordFeatures.TOTAL_DWELL_TIME_NON_ZERO_DAYS, + RealGraphDataRecordFeatures.TOTAL_DWELL_TIME_ELAPSED_DAYS, + RealGraphDataRecordFeatures.TOTAL_DWELL_TIME_DAYS_SINCE_LAST, + RealGraphDataRecordFeatures.NUM_INSPECTED_TWEETS_EWMA, + RealGraphDataRecordFeatures.NUM_INSPECTED_TWEETS_NON_ZERO_DAYS, + RealGraphDataRecordFeatures.NUM_INSPECTED_TWEETS_ELAPSED_DAYS, + RealGraphDataRecordFeatures.NUM_INSPECTED_TWEETS_DAYS_SINCE_LAST + ) + + val RecapLabelsForAggregation: Set[Feature.Binary] = + Set( + RecapFeatures.IS_FAVORITED, + RecapFeatures.IS_RETWEETED, + 
RecapFeatures.IS_CLICKED, + RecapFeatures.IS_PROFILE_CLICKED, + RecapFeatures.IS_OPEN_LINKED + ) + + val DwellDuration: Set[Feature[_]] = + Set( + TimelinesSharedFeatures.DWELL_TIME_MS, + ) + + val UserFeaturesV2: Set[Feature[_]] = RecapFeaturesForAggregation ++ Set( + RecapFeatures.HAS_VINE, + RecapFeatures.HAS_PERISCOPE, + RecapFeatures.HAS_PRO_VIDEO, + RecapFeatures.HAS_VISIBLE_LINK, + RecapFeatures.BIDIRECTIONAL_FAV_COUNT, + RecapFeatures.UNIDIRECTIONAL_FAV_COUNT, + RecapFeatures.BIDIRECTIONAL_REPLY_COUNT, + RecapFeatures.UNIDIRECTIONAL_REPLY_COUNT, + RecapFeatures.BIDIRECTIONAL_RETWEET_COUNT, + RecapFeatures.UNIDIRECTIONAL_RETWEET_COUNT, + RecapFeatures.EMBEDS_URL_COUNT, + RecapFeatures.EMBEDS_IMPRESSION_COUNT, + RecapFeatures.VIDEO_VIEW_COUNT, + RecapFeatures.IS_RETWEET, + RecapFeatures.IS_REPLY, + RecapFeatures.IS_EXTENDED_REPLY, + RecapFeatures.HAS_LINK, + RecapFeatures.HAS_TREND, + RecapFeatures.LINK_LANGUAGE, + RecapFeatures.NUM_HASHTAGS, + RecapFeatures.NUM_MENTIONS, + RecapFeatures.IS_SENSITIVE, + RecapFeatures.HAS_MULTIPLE_MEDIA, + RecapFeatures.USER_REP, + RecapFeatures.FAV_COUNT_V2, + RecapFeatures.RETWEET_COUNT_V2, + RecapFeatures.REPLY_COUNT_V2, + RecapFeatures.LINK_COUNT, + EngagementDataRecordFeatures.InNetworkFavoritesCount, + EngagementDataRecordFeatures.InNetworkRetweetsCount, + EngagementDataRecordFeatures.InNetworkRepliesCount + ) + + val UserAuthorFeaturesV2: Set[Feature[_]] = Set( + RecapFeatures.HAS_IMAGE, + RecapFeatures.HAS_VINE, + RecapFeatures.HAS_PERISCOPE, + RecapFeatures.HAS_PRO_VIDEO, + RecapFeatures.HAS_VIDEO, + RecapFeatures.HAS_CARD, + RecapFeatures.HAS_NEWS, + RecapFeatures.HAS_VISIBLE_LINK, + RecapFeatures.REPLY_COUNT, + RecapFeatures.FAV_COUNT, + RecapFeatures.RETWEET_COUNT, + RecapFeatures.BLENDER_SCORE, + RecapFeatures.CONVERSATIONAL_COUNT, + RecapFeatures.IS_BUSINESS_SCORE, + RecapFeatures.CONTAINS_MEDIA, + RecapFeatures.RETWEET_SEARCHER, + RecapFeatures.REPLY_SEARCHER, + RecapFeatures.MENTION_SEARCHER, + RecapFeatures.REPLY_OTHER, + RecapFeatures.RETWEET_OTHER, + RecapFeatures.MATCH_UI_LANG, + RecapFeatures.MATCH_SEARCHER_MAIN_LANG, + RecapFeatures.MATCH_SEARCHER_LANGS, + RecapFeatures.TWEET_COUNT_FROM_USER_IN_SNAPSHOT, + RecapFeatures.TEXT_SCORE, + RecapFeatures.BIDIRECTIONAL_FAV_COUNT, + RecapFeatures.UNIDIRECTIONAL_FAV_COUNT, + RecapFeatures.BIDIRECTIONAL_REPLY_COUNT, + RecapFeatures.UNIDIRECTIONAL_REPLY_COUNT, + RecapFeatures.BIDIRECTIONAL_RETWEET_COUNT, + RecapFeatures.UNIDIRECTIONAL_RETWEET_COUNT, + RecapFeatures.EMBEDS_URL_COUNT, + RecapFeatures.EMBEDS_IMPRESSION_COUNT, + RecapFeatures.VIDEO_VIEW_COUNT, + RecapFeatures.IS_RETWEET, + RecapFeatures.IS_REPLY, + RecapFeatures.HAS_LINK, + RecapFeatures.HAS_TREND, + RecapFeatures.LINK_LANGUAGE, + RecapFeatures.NUM_HASHTAGS, + RecapFeatures.NUM_MENTIONS, + RecapFeatures.IS_SENSITIVE, + RecapFeatures.HAS_MULTIPLE_MEDIA, + RecapFeatures.FAV_COUNT_V2, + RecapFeatures.RETWEET_COUNT_V2, + RecapFeatures.REPLY_COUNT_V2, + RecapFeatures.LINK_COUNT, + EngagementDataRecordFeatures.InNetworkFavoritesCount, + EngagementDataRecordFeatures.InNetworkRetweetsCount, + EngagementDataRecordFeatures.InNetworkRepliesCount + ) + + val UserAuthorFeaturesV2Count: Set[Feature[_]] = Set( + RecapFeatures.HAS_IMAGE, + RecapFeatures.HAS_VINE, + RecapFeatures.HAS_PERISCOPE, + RecapFeatures.HAS_PRO_VIDEO, + RecapFeatures.HAS_VIDEO, + RecapFeatures.HAS_CARD, + RecapFeatures.HAS_NEWS, + RecapFeatures.HAS_VISIBLE_LINK, + RecapFeatures.FAV_COUNT, + RecapFeatures.CONTAINS_MEDIA, + RecapFeatures.RETWEET_SEARCHER, + 
RecapFeatures.REPLY_SEARCHER, + RecapFeatures.MENTION_SEARCHER, + RecapFeatures.REPLY_OTHER, + RecapFeatures.RETWEET_OTHER, + RecapFeatures.MATCH_UI_LANG, + RecapFeatures.MATCH_SEARCHER_MAIN_LANG, + RecapFeatures.MATCH_SEARCHER_LANGS, + RecapFeatures.IS_RETWEET, + RecapFeatures.IS_REPLY, + RecapFeatures.HAS_LINK, + RecapFeatures.HAS_TREND, + RecapFeatures.IS_SENSITIVE, + RecapFeatures.HAS_MULTIPLE_MEDIA, + EngagementDataRecordFeatures.InNetworkFavoritesCount + ) + + val UserTopicFeaturesV2Count: Set[Feature[_]] = Set( + RecapFeatures.HAS_IMAGE, + RecapFeatures.HAS_VIDEO, + RecapFeatures.HAS_CARD, + RecapFeatures.HAS_NEWS, + RecapFeatures.FAV_COUNT, + RecapFeatures.CONTAINS_MEDIA, + RecapFeatures.RETWEET_SEARCHER, + RecapFeatures.REPLY_SEARCHER, + RecapFeatures.MENTION_SEARCHER, + RecapFeatures.REPLY_OTHER, + RecapFeatures.RETWEET_OTHER, + RecapFeatures.MATCH_UI_LANG, + RecapFeatures.MATCH_SEARCHER_MAIN_LANG, + RecapFeatures.MATCH_SEARCHER_LANGS, + RecapFeatures.IS_RETWEET, + RecapFeatures.IS_REPLY, + RecapFeatures.HAS_LINK, + RecapFeatures.HAS_TREND, + RecapFeatures.IS_SENSITIVE, + EngagementDataRecordFeatures.InNetworkFavoritesCount, + EngagementDataRecordFeatures.InNetworkRetweetsCount, + TimelinesSharedFeatures.NUM_CAPS, + TimelinesSharedFeatures.ASPECT_RATIO_DEN, + TimelinesSharedFeatures.NUM_NEWLINES, + TimelinesSharedFeatures.IS_360, + TimelinesSharedFeatures.IS_MANAGED, + TimelinesSharedFeatures.IS_MONETIZABLE, + TimelinesSharedFeatures.HAS_SELECTED_PREVIEW_IMAGE, + TimelinesSharedFeatures.HAS_TITLE, + TimelinesSharedFeatures.HAS_DESCRIPTION, + TimelinesSharedFeatures.HAS_VISIT_SITE_CALL_TO_ACTION, + TimelinesSharedFeatures.HAS_WATCH_NOW_CALL_TO_ACTION + ) + + val UserFeaturesV5Continuous: Set[Feature[_]] = Set( + TimelinesSharedFeatures.QUOTE_COUNT, + TimelinesSharedFeatures.VISIBLE_TOKEN_RATIO, + TimelinesSharedFeatures.WEIGHTED_FAV_COUNT, + TimelinesSharedFeatures.WEIGHTED_RETWEET_COUNT, + TimelinesSharedFeatures.WEIGHTED_REPLY_COUNT, + TimelinesSharedFeatures.WEIGHTED_QUOTE_COUNT, + TimelinesSharedFeatures.EMBEDS_IMPRESSION_COUNT_V2, + TimelinesSharedFeatures.EMBEDS_URL_COUNT_V2, + TimelinesSharedFeatures.DECAYED_FAVORITE_COUNT, + TimelinesSharedFeatures.DECAYED_RETWEET_COUNT, + TimelinesSharedFeatures.DECAYED_REPLY_COUNT, + TimelinesSharedFeatures.DECAYED_QUOTE_COUNT, + TimelinesSharedFeatures.FAKE_FAVORITE_COUNT, + TimelinesSharedFeatures.FAKE_RETWEET_COUNT, + TimelinesSharedFeatures.FAKE_REPLY_COUNT, + TimelinesSharedFeatures.FAKE_QUOTE_COUNT, + TimeDataRecordFeatures.LAST_FAVORITE_SINCE_CREATION_HRS, + TimeDataRecordFeatures.LAST_RETWEET_SINCE_CREATION_HRS, + TimeDataRecordFeatures.LAST_REPLY_SINCE_CREATION_HRS, + TimeDataRecordFeatures.LAST_QUOTE_SINCE_CREATION_HRS, + TimeDataRecordFeatures.TIME_SINCE_LAST_FAVORITE_HRS, + TimeDataRecordFeatures.TIME_SINCE_LAST_RETWEET_HRS, + TimeDataRecordFeatures.TIME_SINCE_LAST_REPLY_HRS, + TimeDataRecordFeatures.TIME_SINCE_LAST_QUOTE_HRS + ) + + val UserFeaturesV5Boolean: Set[Feature[_]] = Set( + TimelinesSharedFeatures.LABEL_ABUSIVE_FLAG, + TimelinesSharedFeatures.LABEL_ABUSIVE_HI_RCL_FLAG, + TimelinesSharedFeatures.LABEL_DUP_CONTENT_FLAG, + TimelinesSharedFeatures.LABEL_NSFW_HI_PRC_FLAG, + TimelinesSharedFeatures.LABEL_NSFW_HI_RCL_FLAG, + TimelinesSharedFeatures.LABEL_SPAM_FLAG, + TimelinesSharedFeatures.LABEL_SPAM_HI_RCL_FLAG, + TimelinesSharedFeatures.PERISCOPE_EXISTS, + TimelinesSharedFeatures.PERISCOPE_IS_LIVE, + TimelinesSharedFeatures.PERISCOPE_HAS_BEEN_FEATURED, + TimelinesSharedFeatures.PERISCOPE_IS_CURRENTLY_FEATURED, + 
TimelinesSharedFeatures.PERISCOPE_IS_FROM_QUALITY_SOURCE, + TimelinesSharedFeatures.HAS_QUOTE + ) + + val UserAuthorFeaturesV5: Set[Feature[_]] = Set( + TimelinesSharedFeatures.HAS_QUOTE, + TimelinesSharedFeatures.LABEL_ABUSIVE_FLAG, + TimelinesSharedFeatures.LABEL_ABUSIVE_HI_RCL_FLAG, + TimelinesSharedFeatures.LABEL_DUP_CONTENT_FLAG, + TimelinesSharedFeatures.LABEL_NSFW_HI_PRC_FLAG, + TimelinesSharedFeatures.LABEL_NSFW_HI_RCL_FLAG, + TimelinesSharedFeatures.LABEL_SPAM_FLAG, + TimelinesSharedFeatures.LABEL_SPAM_HI_RCL_FLAG + ) + + val UserTweetSourceFeaturesV1Continuous: Set[Feature[_]] = Set( + TimelinesSharedFeatures.NUM_CAPS, + TimelinesSharedFeatures.NUM_WHITESPACES, + TimelinesSharedFeatures.TWEET_LENGTH, + TimelinesSharedFeatures.ASPECT_RATIO_DEN, + TimelinesSharedFeatures.ASPECT_RATIO_NUM, + TimelinesSharedFeatures.BIT_RATE, + TimelinesSharedFeatures.HEIGHT_1, + TimelinesSharedFeatures.HEIGHT_2, + TimelinesSharedFeatures.HEIGHT_3, + TimelinesSharedFeatures.HEIGHT_4, + TimelinesSharedFeatures.VIDEO_DURATION, + TimelinesSharedFeatures.WIDTH_1, + TimelinesSharedFeatures.WIDTH_2, + TimelinesSharedFeatures.WIDTH_3, + TimelinesSharedFeatures.WIDTH_4, + TimelinesSharedFeatures.NUM_MEDIA_TAGS + ) + + val UserTweetSourceFeaturesV1Boolean: Set[Feature[_]] = Set( + TimelinesSharedFeatures.HAS_QUESTION, + TimelinesSharedFeatures.RESIZE_METHOD_1, + TimelinesSharedFeatures.RESIZE_METHOD_2, + TimelinesSharedFeatures.RESIZE_METHOD_3, + TimelinesSharedFeatures.RESIZE_METHOD_4 + ) + + val UserTweetSourceFeaturesV2Continuous: Set[Feature[_]] = Set( + TimelinesSharedFeatures.NUM_EMOJIS, + TimelinesSharedFeatures.NUM_EMOTICONS, + TimelinesSharedFeatures.NUM_NEWLINES, + TimelinesSharedFeatures.NUM_STICKERS, + TimelinesSharedFeatures.NUM_FACES, + TimelinesSharedFeatures.NUM_COLOR_PALLETTE_ITEMS, + TimelinesSharedFeatures.VIEW_COUNT, + TimelinesSharedFeatures.TWEET_LENGTH_TYPE + ) + + val UserTweetSourceFeaturesV2Boolean: Set[Feature[_]] = Set( + TimelinesSharedFeatures.IS_360, + TimelinesSharedFeatures.IS_MANAGED, + TimelinesSharedFeatures.IS_MONETIZABLE, + TimelinesSharedFeatures.IS_EMBEDDABLE, + TimelinesSharedFeatures.HAS_SELECTED_PREVIEW_IMAGE, + TimelinesSharedFeatures.HAS_TITLE, + TimelinesSharedFeatures.HAS_DESCRIPTION, + TimelinesSharedFeatures.HAS_VISIT_SITE_CALL_TO_ACTION, + TimelinesSharedFeatures.HAS_WATCH_NOW_CALL_TO_ACTION + ) + + val UserAuthorTweetSourceFeaturesV1: Set[Feature[_]] = Set( + TimelinesSharedFeatures.HAS_QUESTION, + TimelinesSharedFeatures.TWEET_LENGTH, + TimelinesSharedFeatures.VIDEO_DURATION, + TimelinesSharedFeatures.NUM_MEDIA_TAGS + ) + + val UserAuthorTweetSourceFeaturesV2: Set[Feature[_]] = Set( + TimelinesSharedFeatures.NUM_CAPS, + TimelinesSharedFeatures.NUM_WHITESPACES, + TimelinesSharedFeatures.ASPECT_RATIO_DEN, + TimelinesSharedFeatures.ASPECT_RATIO_NUM, + TimelinesSharedFeatures.BIT_RATE, + TimelinesSharedFeatures.TWEET_LENGTH_TYPE, + TimelinesSharedFeatures.NUM_EMOJIS, + TimelinesSharedFeatures.NUM_EMOTICONS, + TimelinesSharedFeatures.NUM_NEWLINES, + TimelinesSharedFeatures.NUM_STICKERS, + TimelinesSharedFeatures.NUM_FACES, + TimelinesSharedFeatures.IS_360, + TimelinesSharedFeatures.IS_MANAGED, + TimelinesSharedFeatures.IS_MONETIZABLE, + TimelinesSharedFeatures.HAS_SELECTED_PREVIEW_IMAGE, + TimelinesSharedFeatures.HAS_TITLE, + TimelinesSharedFeatures.HAS_DESCRIPTION, + TimelinesSharedFeatures.HAS_VISIT_SITE_CALL_TO_ACTION, + TimelinesSharedFeatures.HAS_WATCH_NOW_CALL_TO_ACTION + ) + + val UserAuthorTweetSourceFeaturesV2Count: Set[Feature[_]] = Set( + 
TimelinesSharedFeatures.NUM_CAPS, + TimelinesSharedFeatures.ASPECT_RATIO_DEN, + TimelinesSharedFeatures.NUM_NEWLINES, + TimelinesSharedFeatures.IS_360, + TimelinesSharedFeatures.IS_MANAGED, + TimelinesSharedFeatures.IS_MONETIZABLE, + TimelinesSharedFeatures.HAS_SELECTED_PREVIEW_IMAGE, + TimelinesSharedFeatures.HAS_TITLE, + TimelinesSharedFeatures.HAS_DESCRIPTION, + TimelinesSharedFeatures.HAS_VISIT_SITE_CALL_TO_ACTION, + TimelinesSharedFeatures.HAS_WATCH_NOW_CALL_TO_ACTION + ) + + val LabelsV2: Set[Feature.Binary] = RecapLabelsForAggregation ++ Set( + RecapFeatures.IS_REPLIED, + RecapFeatures.IS_PHOTO_EXPANDED, + RecapFeatures.IS_VIDEO_PLAYBACK_50 + ) + + val TwitterWideFeatures: Set[Feature[_]] = Set( + RecapFeatures.IS_REPLY, + TimelinesSharedFeatures.HAS_QUOTE, + RecapFeatures.HAS_MENTION, + RecapFeatures.HAS_HASHTAG, + RecapFeatures.HAS_LINK, + RecapFeatures.HAS_CARD, + RecapFeatures.CONTAINS_MEDIA + ) + + val TwitterWideLabels: Set[Feature.Binary] = Set( + RecapFeatures.IS_FAVORITED, + RecapFeatures.IS_RETWEETED, + RecapFeatures.IS_REPLIED + ) + + val ReciprocalLabels: Set[Feature.Binary] = Set( + RecapFeatures.IS_REPLIED_REPLY_IMPRESSED_BY_AUTHOR, + RecapFeatures.IS_REPLIED_REPLY_REPLIED_BY_AUTHOR, + RecapFeatures.IS_REPLIED_REPLY_FAVORITED_BY_AUTHOR + ) + + val NegativeEngagementLabels: Set[Feature.Binary] = Set( + RecapFeatures.IS_REPORT_TWEET_CLICKED, + RecapFeatures.IS_BLOCK_CLICKED, + RecapFeatures.IS_MUTE_CLICKED, + RecapFeatures.IS_DONT_LIKE + ) + + val GoodClickLabels: Set[Feature.Binary] = Set( + RecapFeatures.IS_GOOD_CLICKED_CONVO_DESC_V1, + RecapFeatures.IS_GOOD_CLICKED_CONVO_DESC_V2, + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/RectweetUserFeatureAggregation.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/RectweetUserFeatureAggregation.scala new file mode 100644 index 000000000..12835ef1f --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/RectweetUserFeatureAggregation.scala @@ -0,0 +1,51 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.ml.api.Feature +import com.twitter.timelines.prediction.features.engagement_features.EngagementDataRecordFeatures +import com.twitter.timelines.prediction.features.itl.ITLFeatures + +object RectweetUserFeatureAggregation { + val RectweetLabelsForAggregation: Set[Feature.Binary] = + Set( + ITLFeatures.IS_FAVORITED, + ITLFeatures.IS_RETWEETED, + ITLFeatures.IS_REPLIED, + ITLFeatures.IS_CLICKED, + ITLFeatures.IS_PROFILE_CLICKED, + ITLFeatures.IS_OPEN_LINKED, + ITLFeatures.IS_PHOTO_EXPANDED, + ITLFeatures.IS_VIDEO_PLAYBACK_50 + ) + + val TweetFeatures: Set[Feature[_]] = Set( + ITLFeatures.HAS_IMAGE, + ITLFeatures.HAS_CARD, + ITLFeatures.HAS_NEWS, + ITLFeatures.REPLY_COUNT, + ITLFeatures.FAV_COUNT, + ITLFeatures.RETWEET_COUNT, + ITLFeatures.MATCHES_UI_LANG, + ITLFeatures.MATCHES_SEARCHER_MAIN_LANG, + ITLFeatures.MATCHES_SEARCHER_LANGS, + ITLFeatures.TEXT_SCORE, + ITLFeatures.LINK_LANGUAGE, + ITLFeatures.NUM_HASHTAGS, + ITLFeatures.NUM_MENTIONS, + ITLFeatures.IS_SENSITIVE, + ITLFeatures.HAS_VIDEO, + ITLFeatures.HAS_LINK, + ITLFeatures.HAS_VISIBLE_LINK, + EngagementDataRecordFeatures.InNetworkFavoritesCount + // nice to have, but currently not hydrated in the RecommendedTweet payload + //EngagementDataRecordFeatures.InNetworkRetweetsCount, + //EngagementDataRecordFeatures.InNetworkRepliesCount + ) + + val ReciprocalLabels: Set[Feature.Binary] = Set( +
ITLFeatures.IS_REPLIED_REPLY_IMPRESSED_BY_AUTHOR, + ITLFeatures.IS_REPLIED_REPLY_REPLIED_BY_AUTHOR, + ITLFeatures.IS_REPLIED_REPLY_FAVORITED_BY_AUTHOR, + ITLFeatures.IS_REPLIED_REPLY_RETWEETED_BY_AUTHOR, + ITLFeatures.IS_REPLIED_REPLY_QUOTED_BY_AUTHOR + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfig.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfig.scala new file mode 100644 index 000000000..e6581e32e --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfig.scala @@ -0,0 +1,80 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.dal.client.dataset.KeyValDALDataset +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.FeatureContext +import com.twitter.scalding_internal.multiformat.format.keyval +import com.twitter.summingbird.batch.BatchID +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion.CombineCountsPolicy +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateStore +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.OfflineAggregateDataRecordStore +import scala.collection.JavaConverters._ + +object TimelinesAggregationConfig extends TimelinesAggregationConfigTrait { + override def outputHdfsPath: String = "/user/timelines/processed/aggregates_v2" + + def storeToDatasetMap: Map[String, KeyValDALDataset[ + keyval.KeyVal[AggregationKey, (BatchID, DataRecord)] + ]] = Map( + AuthorTopicAggregateStore -> AuthorTopicAggregatesScalaDataset, + UserTopicAggregateStore -> UserTopicAggregatesScalaDataset, + UserInferredTopicAggregateStore -> UserInferredTopicAggregatesScalaDataset, + UserAggregateStore -> UserAggregatesScalaDataset, + UserAuthorAggregateStore -> UserAuthorAggregatesScalaDataset, + UserOriginalAuthorAggregateStore -> UserOriginalAuthorAggregatesScalaDataset, + OriginalAuthorAggregateStore -> OriginalAuthorAggregatesScalaDataset, + UserEngagerAggregateStore -> UserEngagerAggregatesScalaDataset, + UserMentionAggregateStore -> UserMentionAggregatesScalaDataset, + TwitterWideUserAggregateStore -> TwitterWideUserAggregatesScalaDataset, + TwitterWideUserAuthorAggregateStore -> TwitterWideUserAuthorAggregatesScalaDataset, + UserRequestHourAggregateStore -> UserRequestHourAggregatesScalaDataset, + UserRequestDowAggregateStore -> UserRequestDowAggregatesScalaDataset, + UserListAggregateStore -> UserListAggregatesScalaDataset, + UserMediaUnderstandingAnnotationAggregateStore -> UserMediaUnderstandingAnnotationAggregatesScalaDataset, + ) + + override def mkPhysicalStore(store: AggregateStore): AggregateStore = store match { + case s: OfflineAggregateDataRecordStore => + s.toOfflineAggregateDataRecordStoreWithDAL(storeToDatasetMap(s.name)) + case _ => throw new IllegalArgumentException("Unsupported logical dataset type.") + } + + object CombineCountPolicies { + val EngagerCountsPolicy: CombineCountsPolicy = mkCountsPolicy("user_engager_aggregate") + val EngagerGoodClickCountsPolicy: CombineCountsPolicy = mkCountsPolicy( + "user_engager_good_click_aggregate") + val RectweetEngagerCountsPolicy: CombineCountsPolicy = + mkCountsPolicy("rectweet_user_engager_aggregate") + val MentionCountsPolicy: CombineCountsPolicy = mkCountsPolicy("user_mention_aggregate") + val RectweetSimclustersTweetCountsPolicy: CombineCountsPolicy = + 
mkCountsPolicy("rectweet_user_simcluster_tweet_aggregate") + val UserInferredTopicCountsPolicy: CombineCountsPolicy = + mkCountsPolicy("user_inferred_topic_aggregate") + val UserInferredTopicV2CountsPolicy: CombineCountsPolicy = + mkCountsPolicy("user_inferred_topic_aggregate_v2") + val UserMediaUnderstandingAnnotationCountsPolicy: CombineCountsPolicy = + mkCountsPolicy("user_media_annotation_aggregate") + + private[this] def mkCountsPolicy(prefix: String): CombineCountsPolicy = { + val features = TimelinesAggregationConfig.aggregatesToCompute + .filter(_.aggregatePrefix == prefix) + .flatMap(_.allOutputFeatures) + CombineCountsPolicy( + topK = 2, + aggregateContextToPrecompute = new FeatureContext(features.asJava), + hardLimit = Some(20) + ) + } + } +} + +object TimelinesAggregationCanaryConfig extends TimelinesAggregationConfigTrait { + override def outputHdfsPath: String = "/user/timelines/canaries/processed/aggregates_v2" + + override def mkPhysicalStore(store: AggregateStore): AggregateStore = store match { + case s: OfflineAggregateDataRecordStore => + s.toOfflineAggregateDataRecordStoreWithDAL(dalDataset = AggregatesCanaryScalaDataset) + case _ => throw new IllegalArgumentException("Unsupported logical dataset type.") + } +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfigDetails.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfigDetails.scala new file mode 100644 index 000000000..aa439deda --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfigDetails.scala @@ -0,0 +1,579 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.conversions.DurationOps._ +import com.twitter.ml.api.constant.SharedFeatures.AUTHOR_ID +import com.twitter.ml.api.constant.SharedFeatures.USER_ID +import com.twitter.timelines.data_processing.ml_util.aggregation_framework._ +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics._ +import com.twitter.timelines.data_processing.ml_util.transforms.DownsampleTransform +import com.twitter.timelines.data_processing.ml_util.transforms.RichRemoveAuthorIdZero +import com.twitter.timelines.data_processing.ml_util.transforms.RichRemoveUserIdZero +import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures +import com.twitter.timelines.prediction.features.engagement_features.EngagementDataRecordFeatures +import com.twitter.timelines.prediction.features.engagement_features.EngagementDataRecordFeatures.RichUnifyPublicEngagersTransform +import com.twitter.timelines.prediction.features.list_features.ListFeatures +import com.twitter.timelines.prediction.features.recap.RecapFeatures +import com.twitter.timelines.prediction.features.request_context.RequestContextFeatures +import com.twitter.timelines.prediction.features.semantic_core_features.SemanticCoreFeatures +import com.twitter.timelines.prediction.transform.filter.FilterInNetworkTransform +import com.twitter.timelines.prediction.transform.filter.FilterImageTweetTransform +import com.twitter.timelines.prediction.transform.filter.FilterVideoTweetTransform +import com.twitter.timelines.prediction.transform.filter.FilterOutImageVideoTweetTransform +import com.twitter.util.Duration + +trait TimelinesAggregationConfigDetails extends Serializable { + + import TimelinesAggregationSources._ + + def outputHdfsPath: String + + /** + * Converts the given logical store to a physical store. 
The reason we do not specify the + * physical store directly with the [[AggregateGroup]] is because of a cyclic dependency when + * creating physical stores that are DalDatasets with PersonalDataType annotations derived from + * the [[AggregateGroup]]. + * + */ + def mkPhysicalStore(store: AggregateStore): AggregateStore + + def defaultMaxKvSourceFailures: Int = 100 + + val timelinesOfflineAggregateSink = new OfflineStoreCommonConfig { + override def apply(startDate: String) = OfflineAggregateStoreCommonConfig( + outputHdfsPathPrefix = outputHdfsPath, + dummyAppId = "timelines_aggregates_v2_ro", + dummyDatasetPrefix = "timelines_aggregates_v2_ro", + startDate = startDate + ) + } + + val UserAggregateStore = "user_aggregates" + val UserAuthorAggregateStore = "user_author_aggregates" + val UserOriginalAuthorAggregateStore = "user_original_author_aggregates" + val OriginalAuthorAggregateStore = "original_author_aggregates" + val UserEngagerAggregateStore = "user_engager_aggregates" + val UserMentionAggregateStore = "user_mention_aggregates" + val TwitterWideUserAggregateStore = "twitter_wide_user_aggregates" + val TwitterWideUserAuthorAggregateStore = "twitter_wide_user_author_aggregates" + val UserRequestHourAggregateStore = "user_request_hour_aggregates" + val UserRequestDowAggregateStore = "user_request_dow_aggregates" + val UserListAggregateStore = "user_list_aggregates" + val AuthorTopicAggregateStore = "author_topic_aggregates" + val UserTopicAggregateStore = "user_topic_aggregates" + val UserInferredTopicAggregateStore = "user_inferred_topic_aggregates" + val UserMediaUnderstandingAnnotationAggregateStore = + "user_media_understanding_annotation_aggregates" + val AuthorCountryCodeAggregateStore = "author_country_code_aggregates" + val OriginalAuthorCountryCodeAggregateStore = "original_author_country_code_aggregates" + + /** + * Step 3: Configure all aggregates to compute. + * Note that different subsets of aggregates in this list + * can be launched by different summingbird job instances. + * Any given job can be responsible for a set of AggregateGroup + * configs whose outputStores share the exact same startDate. + * AggregateGroups that do not share the same inputSource, + * outputStore or startDate MUST be launched using different + * summingbird jobs and passed a different --start-time argument. + * See science/scalding/mesos/timelines/prod.yaml for an example + * of how to configure your own job.
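+   *
+   * As an illustrative sketch only (assuming the concrete config object, and that
+   * startDate is a public field of the output store), a launcher could verify that
+   * the groups it is about to run share a single startDate:
+   * {{{
+   * val starts = TimelinesAggregationConfig
+   *   .filterAggregatesGroups(Set(UserAggregateStore))
+   *   .map(_.outputStore)
+   *   .collect { case s: OfflineAggregateDataRecordStore => s.startDate }
+   * require(starts.size <= 1, "co-launched AggregateGroups must share a startDate")
+   * }}}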
+ */ + val negativeDownsampleTransform = + DownsampleTransform( + negativeSamplingRate = 0.03, + keepLabels = RecapUserFeatureAggregation.LabelsV2) + val negativeRecTweetDownsampleTransform = DownsampleTransform( + negativeSamplingRate = 0.03, + keepLabels = RectweetUserFeatureAggregation.RectweetLabelsForAggregation + ) + + val userAggregatesV2: AggregateGroup = + AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_aggregate_v2", + preTransforms = Seq(RichRemoveUserIdZero), /* Eliminates reducer skew */ + keys = Set(USER_ID), + features = RecapUserFeatureAggregation.UserFeaturesV2, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric, SumMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserAggregateStore, + startDate = "2016-07-15 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userAuthorAggregatesV2: Set[AggregateGroup] = { + + /** + * NOTE: We need to remove records from out-of-network authors from the recap input + * records (which now include out-of-network records as well after merging recap and + * rectweet models) that are used to compute user-author aggregates. This is necessary + * to limit the growth rate of user-author aggregates. + */ + val allFeatureAggregates = Set( + AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_author_aggregate_v2", + preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero), + keys = Set(USER_ID, AUTHOR_ID), + features = RecapUserFeatureAggregation.UserAuthorFeaturesV2, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(SumMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserAuthorAggregateStore, + startDate = "2016-07-15 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + ) + + val countAggregates: Set[AggregateGroup] = Set( + AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_author_aggregate_v2", + preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero), + keys = Set(USER_ID, AUTHOR_ID), + features = RecapUserFeatureAggregation.UserAuthorFeaturesV2Count, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserAuthorAggregateStore, + startDate = "2016-07-15 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + ) + + allFeatureAggregates ++ countAggregates + } + + val userAggregatesV5Continuous: AggregateGroup = + AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_aggregate_v5.continuous", + preTransforms = Seq(RichRemoveUserIdZero), + keys = Set(USER_ID), + features = RecapUserFeatureAggregation.UserFeaturesV5Continuous, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric, SumMetric, SumSqMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserAggregateStore, + startDate = "2016-07-15 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userAuthorAggregatesV5: AggregateGroup = + AggregateGroup( + inputSource = 
timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_author_aggregate_v5", + preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero), + keys = Set(USER_ID, AUTHOR_ID), + features = RecapUserFeatureAggregation.UserAuthorFeaturesV5, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserAuthorAggregateStore, + startDate = "2016-07-15 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val tweetSourceUserAuthorAggregatesV1: AggregateGroup = + AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_author_aggregate_tweetsource_v1", + preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero), + keys = Set(USER_ID, AUTHOR_ID), + features = RecapUserFeatureAggregation.UserAuthorTweetSourceFeaturesV1, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric, SumMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserAuthorAggregateStore, + startDate = "2016-07-15 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userEngagerAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_engager_aggregate", + keys = Set(USER_ID, EngagementDataRecordFeatures.PublicEngagementUserIds), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserEngagerAggregateStore, + startDate = "2016-09-02 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + preTransforms = Seq( + RichRemoveUserIdZero, + RichUnifyPublicEngagersTransform + ) + ) + + val userMentionAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + preTransforms = Seq(RichRemoveUserIdZero), /* Eliminates reducer skew */ + aggregatePrefix = "user_mention_aggregate", + keys = Set(USER_ID, RecapFeatures.MENTIONED_SCREEN_NAMES), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserMentionAggregateStore, + startDate = "2017-03-01 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + includeAnyLabel = false + ) + + val twitterWideUserAggregates = AggregateGroup( + inputSource = timelinesDailyTwitterWideSource, + preTransforms = Seq(RichRemoveUserIdZero), /* Eliminates reducer skew */ + aggregatePrefix = "twitter_wide_user_aggregate", + keys = Set(USER_ID), + features = RecapUserFeatureAggregation.TwitterWideFeatures, + labels = RecapUserFeatureAggregation.TwitterWideLabels, + metrics = Set(CountMetric, SumMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = TwitterWideUserAggregateStore, + startDate = "2016-12-28 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val twitterWideUserAuthorAggregates = AggregateGroup( + inputSource = timelinesDailyTwitterWideSource, + preTransforms = Seq(RichRemoveUserIdZero), /* 
Eliminates reducer skew */ + aggregatePrefix = "twitter_wide_user_author_aggregate", + keys = Set(USER_ID, AUTHOR_ID), + features = RecapUserFeatureAggregation.TwitterWideFeatures, + labels = RecapUserFeatureAggregation.TwitterWideLabels, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = TwitterWideUserAuthorAggregateStore, + startDate = "2016-12-28 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + includeAnyLabel = false + ) + + /** + * User-HourOfDay and User-DayOfWeek aggregations, both for recap and rectweet + */ + val userRequestHourAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_request_context_aggregate.hour", + preTransforms = Seq(RichRemoveUserIdZero, negativeDownsampleTransform), + keys = Set(USER_ID, RequestContextFeatures.TIMESTAMP_GMT_HOUR), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserRequestHourAggregateStore, + startDate = "2017-08-01 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userRequestDowAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_request_context_aggregate.dow", + preTransforms = Seq(RichRemoveUserIdZero, negativeDownsampleTransform), + keys = Set(USER_ID, RequestContextFeatures.TIMESTAMP_GMT_DOW), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserRequestDowAggregateStore, + startDate = "2017-08-01 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val authorTopicAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "author_topic_aggregate", + preTransforms = Seq(RichRemoveUserIdZero), + keys = Set(AUTHOR_ID, TimelinesSharedFeatures.TOPIC_ID), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = AuthorTopicAggregateStore, + startDate = "2020-05-19 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userTopicAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_topic_aggregate", + preTransforms = Seq(RichRemoveUserIdZero), + keys = Set(USER_ID, TimelinesSharedFeatures.TOPIC_ID), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserTopicAggregateStore, + startDate = "2020-05-23 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userTopicAggregatesV2 = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_topic_aggregate_v2", + preTransforms = Seq(RichRemoveUserIdZero), + keys = Set(USER_ID, TimelinesSharedFeatures.TOPIC_ID), + features = 
RecapUserFeatureAggregation.UserTopicFeaturesV2Count, + labels = RecapUserFeatureAggregation.LabelsV2, + includeAnyFeature = false, + includeAnyLabel = false, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserTopicAggregateStore, + startDate = "2020-05-23 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userInferredTopicAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_inferred_topic_aggregate", + preTransforms = Seq(RichRemoveUserIdZero), + keys = Set(USER_ID, TimelinesSharedFeatures.INFERRED_TOPIC_IDS), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserInferredTopicAggregateStore, + startDate = "2020-09-09 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userInferredTopicAggregatesV2 = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_inferred_topic_aggregate_v2", + preTransforms = Seq(RichRemoveUserIdZero), + keys = Set(USER_ID, TimelinesSharedFeatures.INFERRED_TOPIC_IDS), + features = RecapUserFeatureAggregation.UserTopicFeaturesV2Count, + labels = RecapUserFeatureAggregation.LabelsV2, + includeAnyFeature = false, + includeAnyLabel = false, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserInferredTopicAggregateStore, + startDate = "2020-09-09 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userReciprocalEngagementAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_aggregate_v6", + preTransforms = Seq(RichRemoveUserIdZero), + keys = Set(USER_ID), + features = Set.empty, + labels = RecapUserFeatureAggregation.ReciprocalLabels, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserAggregateStore, + startDate = "2016-07-15 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + includeAnyLabel = false + ) + + val userOriginalAuthorReciprocalEngagementAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_original_author_aggregate_v1", + preTransforms = Seq(RichRemoveUserIdZero, RichRemoveAuthorIdZero), + keys = Set(USER_ID, TimelinesSharedFeatures.ORIGINAL_AUTHOR_ID), + features = Set.empty, + labels = RecapUserFeatureAggregation.ReciprocalLabels, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserOriginalAuthorAggregateStore, + startDate = "2018-12-26 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + includeAnyLabel = false + ) + + val originalAuthorReciprocalEngagementAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "original_author_aggregate_v1", + preTransforms = Seq(RichRemoveUserIdZero, RichRemoveAuthorIdZero), + keys = Set(TimelinesSharedFeatures.ORIGINAL_AUTHOR_ID), + features = Set.empty, 
+ labels = RecapUserFeatureAggregation.ReciprocalLabels, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = OriginalAuthorAggregateStore, + startDate = "2023-02-25 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + includeAnyLabel = false + ) + + val originalAuthorNegativeEngagementAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "original_author_aggregate_v2", + preTransforms = Seq(RichRemoveUserIdZero, RichRemoveAuthorIdZero), + keys = Set(TimelinesSharedFeatures.ORIGINAL_AUTHOR_ID), + features = Set.empty, + labels = RecapUserFeatureAggregation.NegativeEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = OriginalAuthorAggregateStore, + startDate = "2023-02-25 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + includeAnyLabel = false + ) + + val userListAggregates: AggregateGroup = + AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_list_aggregate", + keys = Set(USER_ID, ListFeatures.LIST_ID), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserListAggregateStore, + startDate = "2020-05-28 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + preTransforms = Seq(RichRemoveUserIdZero) + ) + + val userMediaUnderstandingAnnotationAggregates: AggregateGroup = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_media_annotation_aggregate", + preTransforms = Seq(RichRemoveUserIdZero), + keys = + Set(USER_ID, SemanticCoreFeatures.mediaUnderstandingHighRecallNonSensitiveEntityIdsFeature), + features = Set.empty, + labels = RecapUserFeatureAggregation.LabelsV2, + metrics = Set(CountMetric), + halfLives = Set(50.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserMediaUnderstandingAnnotationAggregateStore, + startDate = "2021-03-20 00:00", + commonConfig = timelinesOfflineAggregateSink + )) + ) + + val userAuthorGoodClickAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_author_good_click_aggregate", + preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero), + keys = Set(USER_ID, AUTHOR_ID), + features = RecapUserFeatureAggregation.UserAuthorFeaturesV2, + labels = RecapUserFeatureAggregation.GoodClickLabels, + metrics = Set(SumMetric), + halfLives = Set(14.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + name = UserAuthorAggregateStore, + startDate = "2016-07-15 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )) + ) + + val userEngagerGoodClickAggregates = AggregateGroup( + inputSource = timelinesDailyRecapMinimalSource, + aggregatePrefix = "user_engager_good_click_aggregate", + keys = Set(USER_ID, EngagementDataRecordFeatures.PublicEngagementUserIds), + features = Set.empty, + labels = RecapUserFeatureAggregation.GoodClickLabels, + metrics = Set(CountMetric), + halfLives = Set(14.days), + outputStore = mkPhysicalStore( + OfflineAggregateDataRecordStore( + 
name = UserEngagerAggregateStore, + startDate = "2016-09-02 00:00", + commonConfig = timelinesOfflineAggregateSink, + maxKvSourceFailures = defaultMaxKvSourceFailures + )), + preTransforms = Seq( + RichRemoveUserIdZero, + RichUnifyPublicEngagersTransform + ) + ) + +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfigTrait.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfigTrait.scala new file mode 100644 index 000000000..6fb2e07b7 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationConfigTrait.scala @@ -0,0 +1,50 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationConfig +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateGroup +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup + +trait TimelinesAggregationConfigTrait + extends TimelinesAggregationConfigDetails + with AggregationConfig { + private val aggregateGroups = Set( + authorTopicAggregates, + userTopicAggregates, + userTopicAggregatesV2, + userInferredTopicAggregates, + userInferredTopicAggregatesV2, + userAggregatesV2, + userAggregatesV5Continuous, + userReciprocalEngagementAggregates, + userAuthorAggregatesV5, + userOriginalAuthorReciprocalEngagementAggregates, + originalAuthorReciprocalEngagementAggregates, + tweetSourceUserAuthorAggregatesV1, + userEngagerAggregates, + userMentionAggregates, + twitterWideUserAggregates, + twitterWideUserAuthorAggregates, + userRequestHourAggregates, + userRequestDowAggregates, + userListAggregates, + userMediaUnderstandingAnnotationAggregates, + ) ++ userAuthorAggregatesV2 + + val aggregatesToComputeList: Set[List[TypedAggregateGroup[_]]] = + aggregateGroups.map(_.buildTypedAggregateGroups()) + + override val aggregatesToCompute: Set[TypedAggregateGroup[_]] = aggregatesToComputeList.flatten + + /* + * Feature selection config to save storage space and Manhattan query bandwidth. + * Only the most important features found using offline RCE simulations are used + * when actually training and serving. This selector is used by + * [[com.twitter.timelines.data_processing.jobs.timeline_ranking_user_features.TimelineRankingAggregatesV2FeaturesProdJob]] + * but defined here to keep it in sync with the config that computes the aggregates.
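+   *
+   * In the same spirit, downstream jobs that materialize only a subset of stores can
+   * stay in sync with this config via [[filterAggregatesGroups]] below, e.g. (an
+   * illustrative sketch; "user_aggregates" is the UserAggregateStore name defined in
+   * [[TimelinesAggregationConfigDetails]]):
+   * {{{
+   * val userGroups: Set[AggregateGroup] = filterAggregatesGroups(Set("user_aggregates"))
+   * }}}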
+ */ + val AggregatesV2FeatureSelector = FeatureSelectorConfig.AggregatesV2ProdFeatureSelector + + def filterAggregatesGroups(storeNames: Set[String]): Set[AggregateGroup] = { + aggregateGroups.filter(aggregateGroup => storeNames.contains(aggregateGroup.outputStore.name)) + } +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationKeyValInjections.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationKeyValInjections.scala new file mode 100644 index 000000000..1f2433b53 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationKeyValInjections.scala @@ -0,0 +1,48 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.ml.api.DataRecord +import com.twitter.scalding_internal.multiformat.format.keyval.KeyValInjection +import com.twitter.summingbird.batch.BatchID +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.{ + AggregateStore, + AggregationKey, + OfflineAggregateInjections, + TypedAggregateGroup +} + +object TimelinesAggregationKeyValInjections extends TimelinesAggregationConfigTrait { + + import OfflineAggregateInjections.getInjection + + type KVInjection = KeyValInjection[AggregationKey, (BatchID, DataRecord)] + + val AuthorTopic: KVInjection = getInjection(filter(AuthorTopicAggregateStore)) + val UserTopic: KVInjection = getInjection(filter(UserTopicAggregateStore)) + val UserInferredTopic: KVInjection = getInjection(filter(UserInferredTopicAggregateStore)) + val User: KVInjection = getInjection(filter(UserAggregateStore)) + val UserAuthor: KVInjection = getInjection(filter(UserAuthorAggregateStore)) + val UserOriginalAuthor: KVInjection = getInjection(filter(UserOriginalAuthorAggregateStore)) + val OriginalAuthor: KVInjection = getInjection(filter(OriginalAuthorAggregateStore)) + val UserEngager: KVInjection = getInjection(filter(UserEngagerAggregateStore)) + val UserMention: KVInjection = getInjection(filter(UserMentionAggregateStore)) + val TwitterWideUser: KVInjection = getInjection(filter(TwitterWideUserAggregateStore)) + val TwitterWideUserAuthor: KVInjection = getInjection(filter(TwitterWideUserAuthorAggregateStore)) + val UserRequestHour: KVInjection = getInjection(filter(UserRequestHourAggregateStore)) + val UserRequestDow: KVInjection = getInjection(filter(UserRequestDowAggregateStore)) + val UserList: KVInjection = getInjection(filter(UserListAggregateStore)) + val UserMediaUnderstandingAnnotation: KVInjection = getInjection( + filter(UserMediaUnderstandingAnnotationAggregateStore)) + + private def filter(storeName: String): Set[TypedAggregateGroup[_]] = { + val groups = aggregatesToCompute.filter(_.outputStore.name == storeName) + require(groups.nonEmpty, s"no aggregate groups configured for store: $storeName") + groups + } + + override def outputHdfsPath: String = "/user/timelines/processed/aggregates_v2" + + // Since this object is not used to execute any online or offline aggregates job, but is meant + // to store all PDT-enabled KeyValInjections, we do not need to construct a physical store. + // We use the identity operation as a default.
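+  // For example (illustrative usage, not part of this file), a consumer that needs to
+  // decode persisted user aggregates can reference the injection defined above directly,
+  // with no physical store involved:
+  //   val userInjection: TimelinesAggregationKeyValInjections.KVInjection =
+  //     TimelinesAggregationKeyValInjections.User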
+ override def mkPhysicalStore(store: AggregateStore): AggregateStore = store +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationSources.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationSources.scala new file mode 100644 index 000000000..c799f22fa --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/TimelinesAggregationSources.scala @@ -0,0 +1,45 @@ +package com.twitter.timelines.prediction.common.aggregates + +import com.twitter.ml.api.constant.SharedFeatures.TIMESTAMP +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.OfflineAggregateSource +import com.twitter.timelines.prediction.features.p_home_latest.HomeLatestUserAggregatesFeatures +import timelines.data_processing.ad_hoc.recap.data_record_preparation.RecapDataRecordsAggMinimalJavaDataset + +/** + * Any update here should be in sync with [[TimelinesFeatureGroups]] and [[AggMinimalDataRecordGeneratorJob]]. + */ +object TimelinesAggregationSources { + + /** + * This is the recap data records after post-processing in [[GenerateRecapAggMinimalDataRecordsJob]] + */ + val timelinesDailyRecapMinimalSource = OfflineAggregateSource( + name = "timelines_daily_recap", + timestampFeature = TIMESTAMP, + dalDataSet = Some(RecapDataRecordsAggMinimalJavaDataset), + scaldingSuffixType = Some("dal"), + withValidation = true + ) + val timelinesDailyTwitterWideSource = OfflineAggregateSource( + name = "timelines_daily_twitter_wide", + timestampFeature = TIMESTAMP, + scaldingHdfsPath = Some("/user/timelines/processed/suggests/recap/twitter_wide_data_records"), + scaldingSuffixType = Some("daily"), + withValidation = true + ) + + val timelinesDailyListTimelineSource = OfflineAggregateSource( + name = "timelines_daily_list_timeline", + timestampFeature = TIMESTAMP, + scaldingHdfsPath = Some("/user/timelines/processed/suggests/recap/all_features/list"), + scaldingSuffixType = Some("hourly"), + withValidation = true + ) + + val timelinesDailyHomeLatestSource = OfflineAggregateSource( + name = "timelines_daily_home_latest", + timestampFeature = HomeLatestUserAggregatesFeatures.AGGREGATE_TIMESTAMP_MS, + scaldingHdfsPath = Some("/user/timelines/processed/p_home_latest/user_aggregates"), + scaldingSuffixType = Some("daily") + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/AuthorFeaturesAdapter.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/AuthorFeaturesAdapter.scala new file mode 100644 index 000000000..7cefc67b9 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/AuthorFeaturesAdapter.scala @@ -0,0 +1,70 @@ +package com.twitter.timelines.prediction.common.aggregates.real_time + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType.UserState +import com.twitter.ml.api.Feature.Binary +import com.twitter.ml.api.{DataRecord, Feature, FeatureContext, RichDataRecord} +import com.twitter.ml.featurestore.catalog.entities.core.Author +import com.twitter.ml.featurestore.catalog.features.magicrecs.UserActivity +import com.twitter.ml.featurestore.lib.data.PredictionRecord +import com.twitter.ml.featurestore.lib.feature.{BoundFeature, BoundFeatureSet} +import com.twitter.ml.featurestore.lib.{UserId, Discrete => FSDiscrete} +import com.twitter.timelines.prediction.common.adapters.TimelinesAdapterBase +import java.lang.{Boolean => JBoolean} +import java.util +import scala.collection.JavaConverters._ + +object 
AuthorFeaturesAdapter extends TimelinesAdapterBase[PredictionRecord] { + val UserStateBoundFeature: BoundFeature[UserId, FSDiscrete] = UserActivity.UserState.bind(Author) + val UserFeaturesSet: BoundFeatureSet = BoundFeatureSet(UserStateBoundFeature) + + /** + * Boolean features about the author's user state. + * enum UserState { + * NEW = 0, + * NEAR_ZERO = 1, + * VERY_LIGHT = 2, + * LIGHT = 3, + * MEDIUM_TWEETER = 4, + * MEDIUM_NON_TWEETER = 5, + * HEAVY_NON_TWEETER = 6, + * HEAVY_TWEETER = 7 + * }(persisted='true') + */ + val IS_USER_NEW = new Binary("timelines.author.user_state.is_user_new", Set(UserState).asJava) + val IS_USER_LIGHT = new Binary("timelines.author.user_state.is_user_light", Set(UserState).asJava) + val IS_USER_MEDIUM_TWEETER = + new Binary("timelines.author.user_state.is_user_medium_tweeter", Set(UserState).asJava) + val IS_USER_MEDIUM_NON_TWEETER = + new Binary("timelines.author.user_state.is_user_medium_non_tweeter", Set(UserState).asJava) + val IS_USER_HEAVY_NON_TWEETER = + new Binary("timelines.author.user_state.is_user_heavy_non_tweeter", Set(UserState).asJava) + val IS_USER_HEAVY_TWEETER = + new Binary("timelines.author.user_state.is_user_heavy_tweeter", Set(UserState).asJava) + val userStateToFeatureMap: Map[Long, Binary] = Map( + 0L -> IS_USER_NEW, + 1L -> IS_USER_LIGHT, + 2L -> IS_USER_LIGHT, + 3L -> IS_USER_LIGHT, + 4L -> IS_USER_MEDIUM_TWEETER, + 5L -> IS_USER_MEDIUM_NON_TWEETER, + 6L -> IS_USER_HEAVY_NON_TWEETER, + 7L -> IS_USER_HEAVY_TWEETER + ) + + val UserStateBooleanFeatures: Set[Feature[_]] = userStateToFeatureMap.values.toSet + + private val allFeatures: Seq[Feature[_]] = UserStateBooleanFeatures.toSeq + override def getFeatureContext: FeatureContext = new FeatureContext(allFeatures: _*) + override def commonFeatures: Set[Feature[_]] = Set.empty + + override def adaptToDataRecords(record: PredictionRecord): util.List[DataRecord] = { + val newRecord = new RichDataRecord(new DataRecord) + record + .getFeatureValue(UserStateBoundFeature) + .flatMap { userState => userStateToFeatureMap.get(userState.value) }.foreach { + booleanFeature => newRecord.setFeatureValue[JBoolean](booleanFeature, true) + } + + List(newRecord.getRecord).asJava + } +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/BUILD b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/BUILD new file mode 100644 index 000000000..93f39405d --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/BUILD @@ -0,0 +1,199 @@ +heron_binary( + name = "heron-without-jass", + main = "com.twitter.timelines.prediction.common.aggregates.real_time.TypeSafeRunner", + oss = True, + platform = "java8", + runtime_platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + ":real_time", + "3rdparty/jvm/org/slf4j:slf4j-jdk14", + ], +) + +jvm_app( + name = "rta_heron", + binary = ":heron-without-jass", + bundles = [ + bundle( + fileset = ["resources/jaas.conf"], + ), + ], + tags = [ + "bazel-compatible", + "bazel-only", + ], +) + +scala_library( + sources = ["*.scala"], + platform = "java8", + strict_deps = False, + tags = ["bazel-compatible"], + dependencies = [ + ":online-configs", + "3rdparty/src/jvm/com/twitter/summingbird:storm", + "src/java/com/twitter/heron/util", + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/scala/com/twitter/frigate/data_pipeline/features_aggregated/core:core-features", + "src/scala/com/twitter/ml/api/util", +
"src/scala/com/twitter/storehaus_internal/memcache", + "src/scala/com/twitter/storehaus_internal/util", + "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits", + "src/scala/com/twitter/summingbird_internal/runner/store_config", + "src/scala/com/twitter/summingbird_internal/runner/storm", + "src/scala/com/twitter/summingbird_internal/sources/storm/remote:ClientEventSourceScrooge2", + "src/scala/com/twitter/timelines/prediction/adapters/client_log_event", + "src/scala/com/twitter/timelines/prediction/adapters/client_log_event_mr", + "src/scala/com/twitter/timelines/prediction/features/client_log_event", + "src/scala/com/twitter/timelines/prediction/features/common", + "src/scala/com/twitter/timelines/prediction/features/list_features", + "src/scala/com/twitter/timelines/prediction/features/recap", + "src/scala/com/twitter/timelines/prediction/features/user_health", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/timelines/suggests/common:record-scala", + "timelinemixer/common/src/main/scala/com/twitter/timelinemixer/clients/served_features_cache", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/aggregation_framework/heron", + "timelines/data_processing/ml_util/aggregation_framework/job", + "timelines/data_processing/ml_util/aggregation_framework/metrics", + "timelines/data_processing/ml_util/transforms", + "timelines/src/main/scala/com/twitter/timelines/clients/memcache_common", + "util/util-core:scala", + ], +) + +scala_library( + name = "online-configs", + sources = [ + "AuthorFeaturesAdapter.scala", + "Event.scala", + "FeatureStoreUtils.scala", + "StormAggregateSourceUtils.scala", + "TimelinesOnlineAggregationConfig.scala", + "TimelinesOnlineAggregationConfigBase.scala", + "TimelinesOnlineAggregationSources.scala", + "TimelinesStormAggregateSource.scala", + "TweetFeaturesReadableStore.scala", + "UserFeaturesAdapter.scala", + "UserFeaturesReadableStore.scala", + ], + platform = "java8", + strict_deps = True, + tags = ["bazel-compatible"], + dependencies = [ + ":base-config", + "3rdparty/src/jvm/com/twitter/scalding:db", + "3rdparty/src/jvm/com/twitter/storehaus:core", + "3rdparty/src/jvm/com/twitter/summingbird:core", + "3rdparty/src/jvm/com/twitter/summingbird:online", + "3rdparty/src/jvm/com/twitter/summingbird:storm", + "abuse/detection/src/main/thrift/com/twitter/abuse/detection/mention_interactions:thrift-scala", + "snowflake/src/main/scala/com/twitter/snowflake/id", + "snowflake/src/main/thrift:thrift-scala", + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/scala/com/twitter/frigate/data_pipeline/features_aggregated/core:core-features", + "src/scala/com/twitter/ml/api/util:datarecord", + "src/scala/com/twitter/ml/featurestore/catalog/datasets/geo:geo-user-location", + "src/scala/com/twitter/ml/featurestore/catalog/datasets/magicrecs:user-features", + "src/scala/com/twitter/ml/featurestore/catalog/entities/core", + "src/scala/com/twitter/ml/featurestore/catalog/features/core:user", + "src/scala/com/twitter/ml/featurestore/catalog/features/geo", + "src/scala/com/twitter/ml/featurestore/catalog/features/magicrecs:user-activity", + "src/scala/com/twitter/ml/featurestore/catalog/features/magicrecs:user-info", + "src/scala/com/twitter/ml/featurestore/catalog/features/trends:tweet_trends_scores", + "src/scala/com/twitter/ml/featurestore/lib/data", + "src/scala/com/twitter/ml/featurestore/lib/dataset/offline", + 
"src/scala/com/twitter/ml/featurestore/lib/export/strato:app-names", + "src/scala/com/twitter/ml/featurestore/lib/feature", + "src/scala/com/twitter/ml/featurestore/lib/online", + "src/scala/com/twitter/ml/featurestore/lib/params", + "src/scala/com/twitter/storehaus_internal/util", + "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits", + "src/scala/com/twitter/summingbird_internal/runner/store_config", + "src/scala/com/twitter/summingbird_internal/runner/storm", + "src/scala/com/twitter/summingbird_internal/sources/common", + "src/scala/com/twitter/summingbird_internal/sources/common/remote:ClientEventSourceScrooge", + "src/scala/com/twitter/summingbird_internal/sources/storm/remote:ClientEventSourceScrooge2", + "src/scala/com/twitter/timelines/prediction/adapters/client_log_event", + "src/scala/com/twitter/timelines/prediction/adapters/client_log_event_mr", + "src/scala/com/twitter/timelines/prediction/common/adapters:base", + "src/scala/com/twitter/timelines/prediction/common/adapters:engagement-converter", + "src/scala/com/twitter/timelines/prediction/common/aggregates", + "src/scala/com/twitter/timelines/prediction/features/client_log_event", + "src/scala/com/twitter/timelines/prediction/features/common", + "src/scala/com/twitter/timelines/prediction/features/list_features", + "src/scala/com/twitter/timelines/prediction/features/recap", + "src/scala/com/twitter/timelines/prediction/features/user_health", + "src/thrift/com/twitter/clientapp/gen:clientapp-scala", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/timelines/suggests/common:engagement-java", + "src/thrift/com/twitter/timelines/suggests/common:engagement-scala", + "src/thrift/com/twitter/timelines/suggests/common:record-scala", + "src/thrift/com/twitter/timelineservice/injection:thrift-scala", + "src/thrift/com/twitter/timelineservice/server/suggests/logging:thrift-scala", + "strato/src/main/scala/com/twitter/strato/client", + "timelinemixer/common/src/main/scala/com/twitter/timelinemixer/clients/served_features_cache", + "timelines/data_processing/ad_hoc/suggests/common:raw_training_data_creator", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/aggregation_framework/heron:configs", + "timelines/data_processing/ml_util/aggregation_framework/metrics", + "timelines/data_processing/ml_util/transforms", + "timelines/data_processing/util:rich-request", + "tweetsource/common/src/main/thrift:thrift-scala", + "twitter-server-internal/src/main/scala", + "unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config", + "unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/summingbird", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "base-config", + sources = [ + "AuthorFeaturesAdapter.scala", + "TimelinesOnlineAggregationConfigBase.scala", + "TweetFeaturesAdapter.scala", + "UserFeaturesAdapter.scala", + ], + platform = "java8", + strict_deps = True, + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/resources/com/twitter/timelines/prediction/common/aggregates/real_time", + "src/scala/com/twitter/ml/api/util:datarecord", + 
"src/scala/com/twitter/ml/featurestore/catalog/datasets/magicrecs:user-features", + "src/scala/com/twitter/ml/featurestore/catalog/entities/core", + "src/scala/com/twitter/ml/featurestore/catalog/features/core:user", + "src/scala/com/twitter/ml/featurestore/catalog/features/geo", + "src/scala/com/twitter/ml/featurestore/catalog/features/magicrecs:user-activity", + "src/scala/com/twitter/ml/featurestore/catalog/features/magicrecs:user-info", + "src/scala/com/twitter/ml/featurestore/catalog/features/trends:tweet_trends_scores", + "src/scala/com/twitter/ml/featurestore/lib/data", + "src/scala/com/twitter/ml/featurestore/lib/feature", + "src/scala/com/twitter/timelines/prediction/common/adapters:base", + "src/scala/com/twitter/timelines/prediction/common/adapters:engagement-converter", + "src/scala/com/twitter/timelines/prediction/common/aggregates", + "src/scala/com/twitter/timelines/prediction/features/client_log_event", + "src/scala/com/twitter/timelines/prediction/features/common", + "src/scala/com/twitter/timelines/prediction/features/list_features", + "src/scala/com/twitter/timelines/prediction/features/recap", + "src/scala/com/twitter/timelines/prediction/features/user_health", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/ml/api:feature_context-java", + "src/thrift/com/twitter/timelines/suggests/common:engagement-scala", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/aggregation_framework/heron:base-config", + "timelines/data_processing/ml_util/aggregation_framework/metrics", + "timelines/data_processing/ml_util/transforms", + "util/util-core:scala", + "util/util-core:util-core-util", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/Event.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/Event.scala new file mode 100644 index 000000000..1bd697d0d --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/Event.scala @@ -0,0 +1,11 @@ +package com.twitter.timelines.prediction.common.aggregates.real_time + +private[real_time] sealed trait Event[T] { def event: T } + +private[real_time] case class HomeEvent[T](override val event: T) extends Event[T] + +private[real_time] case class ProfileEvent[T](override val event: T) extends Event[T] + +private[real_time] case class SearchEvent[T](override val event: T) extends Event[T] + +private[real_time] case class UuaEvent[T](override val event: T) extends Event[T] diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/FeatureStoreUtils.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/FeatureStoreUtils.scala new file mode 100644 index 000000000..156d9d35f --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/FeatureStoreUtils.scala @@ -0,0 +1,53 @@ +package com.twitter.timelines.prediction.common.aggregates.real_time + +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.ml.featurestore.catalog.datasets.magicrecs.UserFeaturesDataset +import com.twitter.ml.featurestore.catalog.datasets.geo.GeoUserLocationDataset +import com.twitter.ml.featurestore.lib.dataset.DatasetParams +import com.twitter.ml.featurestore.lib.export.strato.FeatureStoreAppNames +import com.twitter.ml.featurestore.lib.online.FeatureStoreClient +import 
com.twitter.ml.featurestore.lib.params.FeatureStoreParams
+import com.twitter.strato.client.{Client, Strato}
+import com.twitter.strato.opcontext.Attribution.ManhattanAppId
+import com.twitter.util.Duration
+
+private[real_time] object FeatureStoreUtils {
+  private def mkStratoClient(serviceIdentifier: ServiceIdentifier): Client =
+    Strato.client
+      .withMutualTls(serviceIdentifier)
+      .withRequestTimeout(Duration.fromMilliseconds(50))
+      .build()
+
+  private val featureStoreParams: FeatureStoreParams =
+    FeatureStoreParams(
+      perDataset = Map(
+        UserFeaturesDataset.id ->
+          DatasetParams(
+            stratoSuffix = Some(FeatureStoreAppNames.Timelines),
+            attributions = Seq(ManhattanAppId("athena", "timelines_aggregates_v2_features_by_user"))
+          ),
+        GeoUserLocationDataset.id ->
+          DatasetParams(
+            attributions = Seq(ManhattanAppId("starbuck", "timelines_geo_features_by_user"))
+          )
+      )
+    )
+
+  def mkFeatureStoreClient(
+    serviceIdentifier: ServiceIdentifier,
+    statsReceiver: StatsReceiver
+  ): FeatureStoreClient = {
+    com.twitter.server.Init() // necessary in order to use the WilyNS path
+
+    val stratoClient: Client = mkStratoClient(serviceIdentifier)
+    val featureStoreClient: FeatureStoreClient = FeatureStoreClient(
+      featureSet =
+        UserFeaturesAdapter.UserFeaturesSet ++ AuthorFeaturesAdapter.UserFeaturesSet ++ TweetFeaturesAdapter.TweetFeaturesSet,
+      client = stratoClient,
+      statsReceiver = statsReceiver,
+      featureStoreParams = featureStoreParams
+    )
+    featureStoreClient
+  }
+}
diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/LocallyReplicatedStore.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/LocallyReplicatedStore.scala
new file mode 100644
index 000000000..42f86fa4f
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/LocallyReplicatedStore.scala
@@ -0,0 +1,79 @@
+package com.twitter.timelines.prediction.common.aggregates.real_time
+
+import com.twitter.conversions.DurationOps._
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.storehaus.ReplicatedReadableStore
+import com.twitter.storehaus.Store
+import com.twitter.timelines.clients.memcache_common._
+import com.twitter.timelines.util.FailOpenHandler
+import com.twitter.util.Future
+
+object ServedFeaturesMemcacheConfigBuilder {
+  def getTwCacheDestination(cluster: String, isProd: Boolean = false): String =
+    if (!isProd) {
+      s"/srv#/test/$cluster/cache//twemcache_timelines_served_features_cache"
+    } else {
+      s"/srv#/prod/$cluster/cache/timelines_served_features"
+    }
+
+  /**
+   * @param cluster The DC of the cache that this client will send requests to. This
+   *                can be different from the DC where the summingbird job is running.
+   * @param isProd  Defines whether this client is part of a production summingbird job,
+   *                as different access points will need to be chosen.
+   */
+  def build(cluster: String, isProd: Boolean = false): StorehausMemcacheConfig =
+    StorehausMemcacheConfig(
+      destName = getTwCacheDestination(cluster, isProd),
+      keyPrefix = "",
+      requestTimeout = 200.milliseconds,
+      numTries = 2,
+      globalTimeout = 400.milliseconds,
+      tcpConnectTimeout = 200.milliseconds,
+      connectionAcquisitionTimeout = 200.milliseconds,
+      numPendingRequests = 1000,
+      isReadOnly = false
+    )
+}
+
+/**
+ * If the lookup key does not exist locally, make a call to the replicated store(s).
+ * If the value exists remotely, write the first returned value to the local store
+ * and return it. Map any exceptions to None so that subsequent operations
+ * may proceed.
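+ *
+ * A minimal usage sketch (illustrative only; the store instances and type parameters
+ * below are hypothetical, not defined in this patch):
+ *
+ * {{{
+ * val served: Store[String, Long] = new LocallyReplicatedStore(
+ *   localStore = inDcMemcacheStore,        // assumed Store[String, Long] in the local DC
+ *   remoteStore = crossDcReplicatedStore,  // assumed ReplicatedReadableStore[String, Long]
+ *   scopedStatsReceiver = statsReceiver.scope("locallyReplicated")
+ * )
+ * // A local miss falls back to the replicas and backfills the local store.
+ * val value: Future[Option[Long]] = served.get("aggregate-key")
+ * }}}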
+ */
+class LocallyReplicatedStore[-K, V](
+  localStore: Store[K, V],
+  remoteStore: ReplicatedReadableStore[K, V],
+  scopedStatsReceiver: StatsReceiver)
+    extends Store[K, V] {
+  private[this] val failOpenHandler = new FailOpenHandler(scopedStatsReceiver.scope("failOpen"))
+  private[this] val localFailsCounter = scopedStatsReceiver.counter("localFails")
+  private[this] val localWritesCounter = scopedStatsReceiver.counter("localWrites")
+  private[this] val remoteFailsCounter = scopedStatsReceiver.counter("remoteFails")
+
+  override def get(k: K): Future[Option[V]] =
+    failOpenHandler {
+      localStore
+        .get(k)
+        .flatMap {
+          case Some(v) => Future.value(Some(v))
+          case _ => {
+            localFailsCounter.incr()
+            val replicatedOptFu = remoteStore.get(k)
+            // async write if result is not empty
+            replicatedOptFu.onSuccess {
+              case Some(v) => {
+                localWritesCounter.incr()
+                localStore.put((k, Some(v)))
+              }
+              case _ => {
+                remoteFailsCounter.incr()
+                ()
+              }
+            }
+            replicatedOptFu
+          }
+        }
+    } { _: Throwable => Future.None }
+}
diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/StormAggregateSourceUtils.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/StormAggregateSourceUtils.scala
new file mode 100644
index 000000000..e72d3392b
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/StormAggregateSourceUtils.scala
@@ -0,0 +1,254 @@
+package com.twitter.timelines.prediction.common.aggregates.real_time
+
+import com.twitter.finagle.stats.Counter
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.ml.api.constant.SharedFeatures
+import com.twitter.ml.api.DataRecord
+import com.twitter.ml.api.DataRecordMerger
+import com.twitter.ml.api.Feature
+import com.twitter.ml.api.RichDataRecord
+import com.twitter.ml.featurestore.catalog.entities.core.Author
+import com.twitter.ml.featurestore.catalog.entities.core.Tweet
+import com.twitter.ml.featurestore.catalog.entities.core.User
+import com.twitter.ml.featurestore.lib.online.FeatureStoreClient
+import com.twitter.summingbird.Producer
+import com.twitter.summingbird.storm.Storm
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron.RealTimeAggregatesJobConfig
+import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures
+import java.lang.{Long => JLong}
+
+import com.twitter.unified_user_actions.thriftscala.ActionType
+import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
+
+private[real_time] object StormAggregateSourceUtils {
+  type UserId = Long
+  type AuthorId = Long
+  type TweetId = Long
+
+  /**
+   * Attaches a [[FeatureStoreClient]] to the underlying [[Producer]]. The FeatureStoreClient
+   * hydrates additional user features.
+   *
+   * @param underlyingProducer converts a stream of [[com.twitter.clientapp.thriftscala.LogEvent]]
+   *                           to a stream of [[DataRecord]].
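+   *
+   * A rough sketch of the resulting shape (illustrative; `homeEventProducer` and the
+   * `jobConfig` value are assumed here, not defined in this patch):
+   *
+   * {{{
+   * val hydrated: Producer[Storm, Event[DataRecord]] =
+   *   StormAggregateSourceUtils.wrapByFeatureStoreClient(
+   *     underlyingProducer = homeEventProducer, // Producer[Storm, Event[DataRecord]]
+   *     jobConfig = jobConfig,
+   *     scopedStatsReceiver = statsReceiver.scope("featureStore")
+   *   )
+   * }}}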
+ */ + def wrapByFeatureStoreClient( + underlyingProducer: Producer[Storm, Event[DataRecord]], + jobConfig: RealTimeAggregatesJobConfig, + scopedStatsReceiver: StatsReceiver + ): Producer[Storm, Event[DataRecord]] = { + lazy val keyDataRecordCounter = scopedStatsReceiver.counter("keyDataRecord") + lazy val keyFeatureCounter = scopedStatsReceiver.counter("keyFeature") + lazy val leftDataRecordCounter = scopedStatsReceiver.counter("leftDataRecord") + lazy val rightDataRecordCounter = scopedStatsReceiver.counter("rightDataRecord") + lazy val mergeNumFeaturesCounter = scopedStatsReceiver.counter("mergeNumFeatures") + lazy val authorKeyDataRecordCounter = scopedStatsReceiver.counter("authorKeyDataRecord") + lazy val authorKeyFeatureCounter = scopedStatsReceiver.counter("authorKeyFeature") + lazy val authorLeftDataRecordCounter = scopedStatsReceiver.counter("authorLeftDataRecord") + lazy val authorRightDataRecordCounter = scopedStatsReceiver.counter("authorRightDataRecord") + lazy val authorMergeNumFeaturesCounter = scopedStatsReceiver.counter("authorMergeNumFeatures") + lazy val tweetKeyDataRecordCounter = + scopedStatsReceiver.counter("tweetKeyDataRecord") + lazy val tweetKeyFeatureCounter = scopedStatsReceiver.counter("tweetKeyFeature") + lazy val tweetLeftDataRecordCounter = + scopedStatsReceiver.counter("tweetLeftDataRecord") + lazy val tweetRightDataRecordCounter = + scopedStatsReceiver.counter("tweetRightDataRecord") + lazy val tweetMergeNumFeaturesCounter = + scopedStatsReceiver.counter("tweetMergeNumFeatures") + + @transient lazy val featureStoreClient: FeatureStoreClient = + FeatureStoreUtils.mkFeatureStoreClient( + serviceIdentifier = jobConfig.serviceIdentifier, + statsReceiver = scopedStatsReceiver + ) + + lazy val joinUserFeaturesDataRecordProducer = + if (jobConfig.keyedByUserEnabled) { + lazy val keyedByUserFeaturesStormService: Storm#Service[Set[UserId], DataRecord] = + Storm.service( + new UserFeaturesReadableStore( + featureStoreClient = featureStoreClient, + userEntity = User, + userFeaturesAdapter = UserFeaturesAdapter + ) + ) + + leftJoinDataRecordProducer( + keyFeature = SharedFeatures.USER_ID, + leftDataRecordProducer = underlyingProducer, + rightStormService = keyedByUserFeaturesStormService, + keyDataRecordCounter = keyDataRecordCounter, + keyFeatureCounter = keyFeatureCounter, + leftDataRecordCounter = leftDataRecordCounter, + rightDataRecordCounter = rightDataRecordCounter, + mergeNumFeaturesCounter = mergeNumFeaturesCounter + ) + } else { + underlyingProducer + } + + lazy val joinAuthorFeaturesDataRecordProducer = + if (jobConfig.keyedByAuthorEnabled) { + lazy val keyedByAuthorFeaturesStormService: Storm#Service[Set[AuthorId], DataRecord] = + Storm.service( + new UserFeaturesReadableStore( + featureStoreClient = featureStoreClient, + userEntity = Author, + userFeaturesAdapter = AuthorFeaturesAdapter + ) + ) + + leftJoinDataRecordProducer( + keyFeature = TimelinesSharedFeatures.SOURCE_AUTHOR_ID, + leftDataRecordProducer = joinUserFeaturesDataRecordProducer, + rightStormService = keyedByAuthorFeaturesStormService, + keyDataRecordCounter = authorKeyDataRecordCounter, + keyFeatureCounter = authorKeyFeatureCounter, + leftDataRecordCounter = authorLeftDataRecordCounter, + rightDataRecordCounter = authorRightDataRecordCounter, + mergeNumFeaturesCounter = authorMergeNumFeaturesCounter + ) + } else { + joinUserFeaturesDataRecordProducer + } + + lazy val joinTweetFeaturesDataRecordProducer = { + if (jobConfig.keyedByTweetEnabled) { + lazy val keyedByTweetFeaturesStormService: 
Storm#Service[Set[TweetId], DataRecord] =
+          Storm.service(
+            new TweetFeaturesReadableStore(
+              featureStoreClient = featureStoreClient,
+              tweetEntity = Tweet,
+              tweetFeaturesAdapter = TweetFeaturesAdapter
+            )
+          )
+
+        leftJoinDataRecordProducer(
+          keyFeature = TimelinesSharedFeatures.SOURCE_TWEET_ID,
+          leftDataRecordProducer = joinAuthorFeaturesDataRecordProducer,
+          rightStormService = keyedByTweetFeaturesStormService,
+          keyDataRecordCounter = tweetKeyDataRecordCounter,
+          keyFeatureCounter = tweetKeyFeatureCounter,
+          leftDataRecordCounter = tweetLeftDataRecordCounter,
+          rightDataRecordCounter = tweetRightDataRecordCounter,
+          mergeNumFeaturesCounter = tweetMergeNumFeaturesCounter
+        )
+      } else {
+        joinAuthorFeaturesDataRecordProducer
+      }
+    }
+
+    joinTweetFeaturesDataRecordProducer
+  }
+
+  private[this] lazy val DataRecordMerger = new DataRecordMerger
+
+  /**
+   * Makes the join key from the client event data record.
+   * @param keyFeature Feature to extract the join key value from: USER_ID, SOURCE_TWEET_ID, etc.
+   * @param record DataRecord containing client engagement and basic tweet-side features
+   * @return A single-element set holding the key value if the record has the key feature, or an
+   *         empty set otherwise. The caller pairs this key set with the original data record for
+   *         the subsequent leftJoin operation.
+   */
+  private[this] def mkKey(
+    keyFeature: Feature[JLong],
+    record: DataRecord,
+    keyDataRecordCounter: Counter,
+    keyFeatureCounter: Counter
+  ): Set[Long] = {
+    keyDataRecordCounter.incr()
+    val richRecord = new RichDataRecord(record)
+    if (richRecord.hasFeature(keyFeature)) {
+      keyFeatureCounter.incr()
+      val key: Long = richRecord.getFeatureValue(keyFeature).toLong
+      Set(key)
+    } else {
+      Set.empty[Long]
+    }
+  }
+
+  /**
+   * After the leftJoin, merge the client event data record and the joined data record
+   * into a single data record used for further aggregation.
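+   *
+   * Note that the merge mutates and returns the left record: if the right side contributed a
+   * hydrated DataRecord, its features are folded into the left record and
+   * mergeNumFeaturesCounter is incremented by the merged record's total feature count.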
+   */
+  private[this] def mergeDataRecord(
+    leftRecord: Event[DataRecord],
+    rightRecordOpt: Option[DataRecord],
+    leftDataRecordCounter: Counter,
+    rightDataRecordCounter: Counter,
+    mergeNumFeaturesCounter: Counter
+  ): Event[DataRecord] = {
+    leftDataRecordCounter.incr()
+    rightRecordOpt.foreach { rightRecord =>
+      rightDataRecordCounter.incr()
+      DataRecordMerger.merge(leftRecord.event, rightRecord)
+      mergeNumFeaturesCounter.incr(new RichDataRecord(leftRecord.event).numFeatures())
+    }
+    leftRecord
+  }
+
+  private[this] def leftJoinDataRecordProducer(
+    keyFeature: Feature[JLong],
+    leftDataRecordProducer: Producer[Storm, Event[DataRecord]],
+    rightStormService: Storm#Service[Set[Long], DataRecord],
+    keyDataRecordCounter: => Counter,
+    keyFeatureCounter: => Counter,
+    leftDataRecordCounter: => Counter,
+    rightDataRecordCounter: => Counter,
+    mergeNumFeaturesCounter: => Counter
+  ): Producer[Storm, Event[DataRecord]] = {
+    val keyedLeftDataRecordProducer: Producer[Storm, (Set[Long], Event[DataRecord])] =
+      leftDataRecordProducer.map {
+        case dataRecord: HomeEvent[DataRecord] =>
+          val key = mkKey(
+            keyFeature = keyFeature,
+            record = dataRecord.event,
+            keyDataRecordCounter = keyDataRecordCounter,
+            keyFeatureCounter = keyFeatureCounter
+          )
+          (key, dataRecord)
+        case dataRecord: ProfileEvent[DataRecord] =>
+          val key = Set.empty[Long]
+          (key, dataRecord)
+        case dataRecord: SearchEvent[DataRecord] =>
+          val key = Set.empty[Long]
+          (key, dataRecord)
+        case dataRecord: UuaEvent[DataRecord] =>
+          val key = Set.empty[Long]
+          (key, dataRecord)
+      }
+
+    keyedLeftDataRecordProducer
+      .leftJoin(rightStormService)
+      .map {
+        case (_, (leftRecord, rightRecordOpt)) =>
+          mergeDataRecord(
+            leftRecord = leftRecord,
+            rightRecordOpt = rightRecordOpt,
+            leftDataRecordCounter = leftDataRecordCounter,
+            rightDataRecordCounter = rightDataRecordCounter,
+            mergeNumFeaturesCounter = mergeNumFeaturesCounter
+          )
+      }
+  }
+
+  /**
+   * Filters Unified User Actions events to include only actions that had a home timeline visit
+   * prior to landing on the page.
+   */
+  def isUuaBCEEventsFromHome(event: UnifiedUserAction): Boolean = {
+    def breadcrumbViewsContain(view: String): Boolean =
+      event.eventMetadata.breadcrumbViews.map(_.contains(view)).getOrElse(false)
+
+    event.actionType match {
+      case ActionType.ClientTweetV2Impression if breadcrumbViewsContain("home") =>
+        true
+      case ActionType.ClientTweetVideoFullscreenV2Impression
+          if (breadcrumbViewsContain("home") && breadcrumbViewsContain("video")) =>
+        true
+      case ActionType.ClientProfileV2Impression if breadcrumbViewsContain("home") =>
+        true
+      case _ => false
+    }
+  }
+}
diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationConfig.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationConfig.scala
new file mode 100644
index 000000000..8d7a41d21
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationConfig.scala
@@ -0,0 +1,34 @@
+package com.twitter.timelines.prediction.common.aggregates.real_time
+
+import com.twitter.conversions.DurationOps._
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron.{
+  OnlineAggregationStoresTrait,
+  RealTimeAggregateStore
+}
+
+object TimelinesOnlineAggregationConfig
+    extends TimelinesOnlineAggregationDefinitionsTrait
+    with OnlineAggregationStoresTrait {
+
+  import TimelinesOnlineAggregationSources._
+
+  override lazy val ProductionStore =
RealTimeAggregateStore(
+    memcacheDataSet = "timelines_real_time_aggregates",
+    isProd = true,
+    cacheTTL = 5.days
+  )
+
+  override lazy val StagingStore = RealTimeAggregateStore(
+    memcacheDataSet = "twemcache_timelines_real_time_aggregates",
+    isProd = false,
+    cacheTTL = 5.days
+  )
+
+  override lazy val inputSource = timelinesOnlineAggregateSource
+
+  /**
+   * AggregatesToCompute: This defines the complete set of aggregates to be
+   * computed by the aggregation job and to be stored in memcache.
+   */
+  override lazy val AggregatesToCompute = ProdAggregates ++ StagingAggregates
+}
diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationConfigBase.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationConfigBase.scala
new file mode 100644
index 000000000..0d7c072e2
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationConfigBase.scala
@@ -0,0 +1,1112 @@
+package com.twitter.timelines.prediction.common.aggregates.real_time
+
+import com.twitter.conversions.DurationOps._
+import com.twitter.ml.api.Feature
+import com.twitter.ml.api.constant.SharedFeatures
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateGroup
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateSource
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateStore
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron.OnlineAggregationConfigTrait
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.CountMetric
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.SumMetric
+import com.twitter.timelines.data_processing.ml_util.transforms.BinaryUnion
+import com.twitter.timelines.data_processing.ml_util.transforms.DownsampleTransform
+import com.twitter.timelines.data_processing.ml_util.transforms.IsNewUserTransform
+import com.twitter.timelines.data_processing.ml_util.transforms.IsPositionTransform
+import com.twitter.timelines.data_processing.ml_util.transforms.LogTransform
+import com.twitter.timelines.data_processing.ml_util.transforms.PositionCase
+import com.twitter.timelines.data_processing.ml_util.transforms.RichITransform
+import com.twitter.timelines.data_processing.ml_util.transforms.RichRemoveUnverifiedUserTransform
+import com.twitter.timelines.prediction.features.client_log_event.ClientLogEventDataRecordFeatures
+import com.twitter.timelines.prediction.features.common.CombinedFeatures
+import com.twitter.timelines.prediction.features.common.CombinedFeatures._
+import com.twitter.timelines.prediction.features.common.ProfileLabelFeatures
+import com.twitter.timelines.prediction.features.common.SearchLabelFeatures
+import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures
+import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures.IS_TOP_FIVE
+import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures.IS_TOP_ONE
+import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures.IS_TOP_TEN
+import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures.LOG_POSITION
+import com.twitter.timelines.prediction.features.list_features.ListFeatures
+import com.twitter.timelines.prediction.features.recap.RecapFeatures
+import com.twitter.util.Duration
+import java.lang.{Boolean => JBoolean}
+import java.lang.{Long => JLong} +import scala.io.Source + +object TimelinesOnlineAggregationUtils { + val TweetLabels: Set[Feature[JBoolean]] = CombinedFeatures.EngagementsRealTime + val TweetCoreLabels: Set[Feature[JBoolean]] = CombinedFeatures.CoreEngagements + val TweetDwellLabels: Set[Feature[JBoolean]] = CombinedFeatures.DwellEngagements + val TweetCoreAndDwellLabels: Set[Feature[JBoolean]] = TweetCoreLabels ++ TweetDwellLabels + val PrivateEngagementLabelsV2: Set[Feature[JBoolean]] = CombinedFeatures.PrivateEngagementsV2 + val ProfileCoreLabels: Set[Feature[JBoolean]] = ProfileLabelFeatures.CoreEngagements + val ProfileNegativeEngagementLabels: Set[Feature[JBoolean]] = + ProfileLabelFeatures.NegativeEngagements + val ProfileNegativeEngagementUnionLabels: Set[Feature[JBoolean]] = Set( + ProfileLabelFeatures.IS_NEGATIVE_FEEDBACK_UNION) + val SearchCoreLabels: Set[Feature[JBoolean]] = SearchLabelFeatures.CoreEngagements + val TweetNegativeEngagementLabels: Set[Feature[JBoolean]] = + CombinedFeatures.NegativeEngagementsRealTime + val TweetNegativeEngagementDontLikeLabels: Set[Feature[JBoolean]] = + CombinedFeatures.NegativeEngagementsRealTimeDontLike + val TweetNegativeEngagementSecondaryLabels: Set[Feature[JBoolean]] = + CombinedFeatures.NegativeEngagementsSecondary + val AllTweetNegativeEngagementLabels: Set[Feature[JBoolean]] = + TweetNegativeEngagementLabels ++ TweetNegativeEngagementDontLikeLabels ++ TweetNegativeEngagementSecondaryLabels + val UserAuthorEngagementLabels: Set[Feature[JBoolean]] = CombinedFeatures.UserAuthorEngagements + val ShareEngagementLabels: Set[Feature[JBoolean]] = CombinedFeatures.ShareEngagements + val BookmarkEngagementLabels: Set[Feature[JBoolean]] = CombinedFeatures.BookmarkEngagements + val AllBCEDwellLabels: Set[Feature[JBoolean]] = + CombinedFeatures.TweetDetailDwellEngagements ++ CombinedFeatures.ProfileDwellEngagements ++ CombinedFeatures.FullscreenVideoDwellEngagements + val AllTweetUnionLabels: Set[Feature[JBoolean]] = Set( + CombinedFeatures.IS_IMPLICIT_POSITIVE_FEEDBACK_UNION, + CombinedFeatures.IS_EXPLICIT_POSITIVE_FEEDBACK_UNION, + CombinedFeatures.IS_ALL_NEGATIVE_FEEDBACK_UNION + ) + val AllTweetLabels: Set[Feature[JBoolean]] = + TweetLabels ++ TweetCoreAndDwellLabels ++ AllTweetNegativeEngagementLabels ++ ProfileCoreLabels ++ ProfileNegativeEngagementLabels ++ ProfileNegativeEngagementUnionLabels ++ UserAuthorEngagementLabels ++ SearchCoreLabels ++ ShareEngagementLabels ++ BookmarkEngagementLabels ++ PrivateEngagementLabelsV2 ++ AllBCEDwellLabels ++ AllTweetUnionLabels + + def addFeatureFilterFromResource( + prodGroup: AggregateGroup, + aggRemovalPath: String + ): AggregateGroup = { + val resource = Some(Source.fromResource(aggRemovalPath)) + val lines = resource.map(_.getLines.toSeq) + lines match { + case Some(value) => prodGroup.copy(aggExclusionRegex = value) + case _ => prodGroup + } + } +} + +trait TimelinesOnlineAggregationDefinitionsTrait extends OnlineAggregationConfigTrait { + import TimelinesOnlineAggregationUtils._ + + def inputSource: AggregateSource + def ProductionStore: AggregateStore + def StagingStore: AggregateStore + + val TweetFeatures: Set[Feature[_]] = Set( + ClientLogEventDataRecordFeatures.HasConsumerVideo, + ClientLogEventDataRecordFeatures.PhotoCount + ) + val CandidateTweetSourceFeatures: Set[Feature[_]] = Set( + ClientLogEventDataRecordFeatures.FromRecap, + ClientLogEventDataRecordFeatures.FromRecycled, + ClientLogEventDataRecordFeatures.FromActivity, + ClientLogEventDataRecordFeatures.FromSimcluster, + 
ClientLogEventDataRecordFeatures.FromErg, + ClientLogEventDataRecordFeatures.FromCroon, + ClientLogEventDataRecordFeatures.FromList, + ClientLogEventDataRecordFeatures.FromRecTopic + ) + + def createStagingGroup(prodGroup: AggregateGroup): AggregateGroup = + prodGroup.copy( + outputStore = StagingStore + ) + + // Aggregate user engagements/features by tweet Id. + val tweetEngagement30MinuteCountsProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v1", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = TweetLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Aggregate user engagements/features by tweet Id. + val tweetVerifiedDontLikeEngagementRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v6", + preTransforms = Seq(RichRemoveUnverifiedUserTransform), + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val tweetNegativeEngagement6HourCounts = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v2", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = TweetNegativeEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val tweetVerifiedNegativeEngagementCounts = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v7", + preTransforms = Seq(RichRemoveUnverifiedUserTransform), + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = TweetNegativeEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val promotedTweetEngagementRealTimeCounts = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v3.is_promoted", + preTransforms = Seq( + DownsampleTransform( + negativeSamplingRate = 0.0, + keepLabels = Set(ClientLogEventDataRecordFeatures.IsPromoted))), + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = TweetCoreAndDwellLabels, + metrics = Set(CountMetric), + halfLives = Set(2.hours, 24.hours), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate total engagement counts by tweet Id for non-public + * engagements. Similar to EB's public engagement counts. 
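+   *
+   * A sketch of the half-life semantics (assuming the standard exponential decay used by
+   * half-life counters; the framework's exact formula is not shown in this patch): a decayed
+   * count evolves as c(t + dt) = c(t) * 2^(-dt / h) for half-life h, so Duration.Top
+   * effectively disables decay and yields lifetime totals, while the 30-minute groups above
+   * emphasize recent engagement.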
+ */ + val tweetEngagementTotalCountsProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v1", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = TweetLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(Duration.Top), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val tweetNegativeEngagementTotalCounts = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v2", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = TweetNegativeEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(Duration.Top), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate tweet features grouped by viewer's user id. + */ + val userEngagementRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_aggregates_v1", + keys = Set(SharedFeatures.USER_ID), + features = TweetFeatures, + labels = TweetLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate tweet features grouped by viewer's user id. + */ + val userEngagementRealTimeAggregatesV2 = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_aggregates_v2", + keys = Set(SharedFeatures.USER_ID), + features = ClientLogEventDataRecordFeatures.TweetFeaturesV2, + labels = TweetCoreAndDwellLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate author's user state features grouped by viewer's user id. + */ + val userEngagementAuthorUserStateRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_aggregates_v3", + preTransforms = Seq.empty, + keys = Set(SharedFeatures.USER_ID), + features = AuthorFeaturesAdapter.UserStateBooleanFeatures, + labels = TweetCoreAndDwellLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate author's user state features grouped by viewer's user id. + */ + val userNegativeEngagementAuthorUserStateRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_aggregates_v4", + preTransforms = Seq.empty, + keys = Set(SharedFeatures.USER_ID), + features = AuthorFeaturesAdapter.UserStateBooleanFeatures, + labels = TweetNegativeEngagementLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate tweet features grouped by viewer's user id, with 48 hour halfLife. 
+   */
+  val userEngagement48HourRealTimeAggregatesProd =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_user_aggregates_v5",
+      keys = Set(SharedFeatures.USER_ID),
+      features = TweetFeatures,
+      labels = TweetLabels ++ TweetNegativeEngagementDontLikeLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(48.hours),
+      outputStore = ProductionStore,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  /**
+   * Aggregate author's user state features grouped by viewer's user id.
+   */
+  val userNegativeEngagementAuthorUserState72HourRealTimeAggregates =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_user_aggregates_v6",
+      preTransforms = Seq.empty,
+      keys = Set(SharedFeatures.USER_ID),
+      features = AuthorFeaturesAdapter.UserStateBooleanFeatures,
+      labels = TweetNegativeEngagementLabels ++ TweetNegativeEngagementDontLikeLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(72.hours),
+      outputStore = ProductionStore,
+      includeAnyFeature = false,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  /**
+   * Aggregate features grouped by source author id: for each author, aggregate features are
+   * created to quantify engagements (fav, reply, etc.) which the author's tweets have received.
+   */
+  val authorEngagementRealTimeAggregatesProd =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_author_aggregates_v1",
+      keys = Set(TimelinesSharedFeatures.SOURCE_AUTHOR_ID),
+      features = Set.empty,
+      labels = TweetLabels ++ TweetNegativeEngagementDontLikeLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, Duration.Top),
+      outputStore = ProductionStore,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  /**
+   * Aggregate features grouped by source author id: for each author, aggregate features are
+   * created to quantify negative engagements (mute, block, etc.) which the author's tweets
+   * have received.
+   *
+   * This aggregate group is not used in Home, but it is used in the Follow Recommendation
+   * Service, so we need to keep it for now.
+   */
+  val authorNegativeEngagementRealTimeAggregatesProd =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_author_aggregates_v2",
+      keys = Set(TimelinesSharedFeatures.SOURCE_AUTHOR_ID),
+      features = Set.empty,
+      labels = TweetNegativeEngagementLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, Duration.Top),
+      outputStore = ProductionStore,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  /**
+   * Aggregate features grouped by source author id: for each author, aggregate features are
+   * created to quantify negative engagements (don't like) which the author's tweets have
+   * received from verified users.
+   */
+  val authorVerifiedNegativeEngagementRealTimeAggregatesProd =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_author_aggregates_v3",
+      preTransforms = Seq(RichRemoveUnverifiedUserTransform),
+      keys = Set(TimelinesSharedFeatures.SOURCE_AUTHOR_ID),
+      features = Set.empty,
+      labels = TweetNegativeEngagementDontLikeLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes),
+      outputStore = ProductionStore,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  /**
+   * Aggregate tweet features grouped by topic id.
+ */ + val topicEngagementRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_topic_aggregates_v1", + keys = Set(TimelinesSharedFeatures.TOPIC_ID), + features = Set.empty, + labels = TweetLabels ++ AllTweetNegativeEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate user engagements / user state by topic id. + */ + val topicEngagementUserStateRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_topic_aggregates_v2", + keys = Set(TimelinesSharedFeatures.TOPIC_ID), + features = UserFeaturesAdapter.UserStateBooleanFeatures, + labels = TweetCoreAndDwellLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate user negative engagements / user state by topic id. + */ + val topicNegativeEngagementUserStateRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_topic_aggregates_v3", + keys = Set(TimelinesSharedFeatures.TOPIC_ID), + features = UserFeaturesAdapter.UserStateBooleanFeatures, + labels = TweetNegativeEngagementLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate tweet features grouped by topic id like real_time_topic_aggregates_v1 but 24hour halfLife + */ + val topicEngagement24HourRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_topic_aggregates_v4", + keys = Set(TimelinesSharedFeatures.TOPIC_ID), + features = Set.empty, + labels = TweetLabels ++ AllTweetNegativeEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(24.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Aggregate user engagements / user state by tweet Id. + val tweetEngagementUserStateRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v3", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = UserFeaturesAdapter.UserStateBooleanFeatures, + labels = TweetCoreAndDwellLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Aggregate user engagements / user gender by tweet Id. + val tweetEngagementGenderRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v4", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = UserFeaturesAdapter.GenderBooleanFeatures, + labels = + TweetCoreAndDwellLabels ++ TweetNegativeEngagementLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Aggregate user negative engagements / user state by tweet Id. 
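+  // (real_time_tweet_aggregates_v8 below applies the same aggregation restricted to
+  // verified engaging users via the RichRemoveUnverifiedUserTransform pre-transform.)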
+ val tweetNegativeEngagementUserStateRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v5", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = UserFeaturesAdapter.UserStateBooleanFeatures, + labels = TweetNegativeEngagementLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Aggregate user negative engagements / user state by tweet Id. + val tweetVerifiedNegativeEngagementUserStateRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_aggregates_v8", + preTransforms = Seq(RichRemoveUnverifiedUserTransform), + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = UserFeaturesAdapter.UserStateBooleanFeatures, + labels = TweetNegativeEngagementLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate tweet engagement labels and candidate tweet source features grouped by user id. + */ + val userCandidateTweetSourceEngagementRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_candidate_tweet_source_aggregates_v1", + keys = Set(SharedFeatures.USER_ID), + features = CandidateTweetSourceFeatures, + labels = TweetCoreAndDwellLabels ++ NegativeEngagementsRealTimeDontLike, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate tweet engagement labels and candidate tweet source features grouped by user id. + */ + val userCandidateTweetSourceEngagement48HourRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_candidate_tweet_source_aggregates_v2", + keys = Set(SharedFeatures.USER_ID), + features = CandidateTweetSourceFeatures, + labels = TweetCoreAndDwellLabels ++ NegativeEngagementsRealTimeDontLike, + metrics = Set(CountMetric), + halfLives = Set(48.hours), + outputStore = ProductionStore, + includeAnyFeature = false, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate tweet features grouped by viewer's user id on Profile engagements + */ + val userProfileEngagementRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "profile_real_time_user_aggregates_v1", + preTransforms = Seq(IsNewUserTransform), + keys = Set(SharedFeatures.USER_ID), + features = TweetFeatures, + labels = ProfileCoreLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyFeature = true, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val NegativeEngagementsUnionTransform = RichITransform( + BinaryUnion( + featuresToUnify = ProfileNegativeEngagementLabels, + outputFeature = ProfileLabelFeatures.IS_NEGATIVE_FEEDBACK_UNION + )) + + /** + * Aggregate tweet features grouped by viewer's user id on Profile negative engagements. 
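+   * The [[NegativeEngagementsUnionTransform]] pre-transform additionally sets
+   * [[ProfileLabelFeatures.IS_NEGATIVE_FEEDBACK_UNION]] whenever any of the individual
+   * negative engagement labels fires, so the union label is aggregated alongside the raw labels.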
+   */
+  val userProfileNegativeEngagementRealTimeAggregates =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "profile_negative_engagement_real_time_user_aggregates_v1",
+      preTransforms = Seq(NegativeEngagementsUnionTransform),
+      keys = Set(SharedFeatures.USER_ID),
+      features = Set.empty,
+      labels = ProfileNegativeEngagementLabels ++ ProfileNegativeEngagementUnionLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, 72.hours, 14.day),
+      outputStore = ProductionStore,
+      includeAnyFeature = true,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  /**
+   * Aggregate tweet features grouped by viewer's and author's user ids, on Profile engagements.
+   */
+  val userAuthorProfileEngagementRealTimeAggregates =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "user_author_profile_real_time_aggregates_v1",
+      keys = Set(SharedFeatures.USER_ID, TimelinesSharedFeatures.SOURCE_AUTHOR_ID),
+      features = Set.empty,
+      labels = ProfileCoreLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, 24.hours, 72.hours),
+      outputStore = ProductionStore,
+      includeAnyFeature = true,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  /**
+   * Aggregate tweet features grouped by viewer's and author's user ids, on negative Profile
+   * engagements.
+   */
+  val userAuthorProfileNegativeEngagementRealTimeAggregates =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "user_author_profile_negative_engagement_real_time_aggregates_v1",
+      preTransforms = Seq(NegativeEngagementsUnionTransform),
+      keys = Set(SharedFeatures.USER_ID, TimelinesSharedFeatures.SOURCE_AUTHOR_ID),
+      features = Set.empty,
+      labels = ProfileNegativeEngagementUnionLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, 72.hours, 14.day),
+      outputStore = ProductionStore,
+      includeAnyFeature = true,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  val newUserAuthorEngagementRealTimeAggregatesProd =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_new_user_author_aggregates_v1",
+      preTransforms = Seq(IsNewUserTransform),
+      keys = Set(SharedFeatures.USER_ID, TimelinesSharedFeatures.SOURCE_AUTHOR_ID),
+      features = Set.empty,
+      labels = TweetCoreAndDwellLabels ++ Set(
+        IS_CLICKED,
+        IS_PROFILE_CLICKED,
+        IS_PHOTO_EXPANDED
+      ),
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, Duration.Top),
+      outputStore = ProductionStore,
+      includeAnyFeature = true,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  val userAuthorEngagementRealTimeAggregatesProd = {
+    // Computing user-author real-time aggregates is very expensive, so we
+    // take the union of all major negative feedback engagements to create
+    // a single negative label for aggregation. We also include a number of
+    // core positive engagements.
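+    // For example (illustrative; the actual label names live in CombinedFeatures): a record
+    // where any one of the unified negative labels is true gets IS_NEGATIVE_FEEDBACK_UNION
+    // set to true, since BinaryUnion ORs featuresToUnify into the single output feature.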
+ val BinaryUnionNegativeEngagements = + BinaryUnion( + featuresToUnify = AllTweetNegativeEngagementLabels, + outputFeature = IS_NEGATIVE_FEEDBACK_UNION + ) + val BinaryUnionNegativeEngagementsTransform = RichITransform(BinaryUnionNegativeEngagements) + + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_author_aggregates_v1", + preTransforms = Seq(BinaryUnionNegativeEngagementsTransform), + keys = Set(SharedFeatures.USER_ID, TimelinesSharedFeatures.SOURCE_AUTHOR_ID), + features = Set.empty, + labels = UserAuthorEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 1.day), + outputStore = ProductionStore, + includeAnyFeature = true, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + } + + /** + * Aggregate tweet features grouped by list id. + */ + val listEngagementRealTimeAggregatesProd = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_list_aggregates_v1", + keys = Set(ListFeatures.LIST_ID), + features = Set.empty, + labels = + TweetCoreAndDwellLabels ++ TweetNegativeEngagementLabels ++ TweetNegativeEngagementDontLikeLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Aggregate features grouped by topic of tweet and country from user's location + val topicCountryRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_topic_country_aggregates_v1", + keys = Set(TimelinesSharedFeatures.TOPIC_ID, UserFeaturesAdapter.USER_COUNTRY_ID), + features = Set.empty, + labels = + TweetCoreAndDwellLabels ++ AllTweetNegativeEngagementLabels ++ PrivateEngagementLabelsV2 ++ ShareEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 72.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Aggregate features grouped by TweetId_Country from user's location + val tweetCountryRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_country_aggregates_v1", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID, UserFeaturesAdapter.USER_COUNTRY_ID), + features = Set.empty, + labels = TweetCoreAndDwellLabels ++ AllTweetNegativeEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, Duration.Top), + outputStore = ProductionStore, + includeAnyLabel = true, + includeTimestampFeature = false, + ) + + // Additional aggregate features grouped by TweetId_Country from user's location + val tweetCountryPrivateEngagementsRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_country_aggregates_v2", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID, UserFeaturesAdapter.USER_COUNTRY_ID), + features = Set.empty, + labels = PrivateEngagementLabelsV2 ++ ShareEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 72.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Aggregate features grouped by TweetId_Country from user's location + val tweetCountryVerifiedNegativeEngagementsRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_country_aggregates_v3", + preTransforms = Seq(RichRemoveUnverifiedUserTransform), + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID, UserFeaturesAdapter.USER_COUNTRY_ID), + features = 
Set.empty,
+      labels = AllTweetNegativeEngagementLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, Duration.Top),
+      outputStore = ProductionStore,
+      includeAnyLabel = true,
+      includeTimestampFeature = false,
+    )
+
+  object positionTransforms extends IsPositionTransform {
+    override val isInPositionRangeFeature: Seq[PositionCase] =
+      Seq(PositionCase(1, IS_TOP_ONE), PositionCase(5, IS_TOP_FIVE), PositionCase(10, IS_TOP_TEN))
+    override val decodedPositionFeature: Feature.Discrete =
+      ClientLogEventDataRecordFeatures.InjectedPosition
+  }
+
+  val userPositionEngagementsCountsProd =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_position_based_user_aggregates_v1",
+      keys = Set(SharedFeatures.USER_ID),
+      features = Set(IS_TOP_ONE, IS_TOP_FIVE, IS_TOP_TEN),
+      labels = TweetCoreAndDwellLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, 24.hours),
+      outputStore = ProductionStore,
+      preTransforms = Seq(positionTransforms),
+      includeAnyLabel = false,
+      includeAnyFeature = false,
+      includeTimestampFeature = false,
+    )
+
+  val userPositionEngagementsSumProd =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_position_based_user_sum_aggregates_v2",
+      keys = Set(SharedFeatures.USER_ID),
+      features = Set(LOG_POSITION),
+      labels = TweetCoreAndDwellLabels,
+      metrics = Set(SumMetric),
+      halfLives = Set(30.minutes, 24.hours),
+      outputStore = ProductionStore,
+      preTransforms =
+        Seq(new LogTransform(ClientLogEventDataRecordFeatures.InjectedPosition, LOG_POSITION)),
+      includeAnyLabel = false,
+      includeAnyFeature = false,
+      includeTimestampFeature = false,
+    )
+
+  // Aggregates for share engagements
+  val tweetShareEngagementsRealTimeAggregates =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_tweet_share_aggregates_v1",
+      keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID),
+      features = Set.empty,
+      labels = ShareEngagementLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, 24.hours),
+      outputStore = ProductionStore,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  val userShareEngagementsRealTimeAggregates =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_user_share_aggregates_v1",
+      keys = Set(SharedFeatures.USER_ID),
+      features = Set.empty,
+      labels = ShareEngagementLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, 24.hours),
+      outputStore = ProductionStore,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  val userAuthorShareEngagementsRealTimeAggregates =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_user_author_share_aggregates_v1",
+      keys = Set(SharedFeatures.USER_ID, TimelinesSharedFeatures.SOURCE_AUTHOR_ID),
+      features = Set.empty,
+      labels = ShareEngagementLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, 24.hours),
+      outputStore = ProductionStore,
+      includeAnyFeature = true,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  val topicShareEngagementsRealTimeAggregates =
+    AggregateGroup(
+      inputSource = inputSource,
+      aggregatePrefix = "real_time_topic_share_aggregates_v1",
+      keys = Set(TimelinesSharedFeatures.TOPIC_ID),
+      features = Set.empty,
+      labels = ShareEngagementLabels,
+      metrics = Set(CountMetric),
+      halfLives = Set(30.minutes, 24.hours),
+      outputStore = ProductionStore,
+      includeAnyLabel = false,
+      includeTimestampFeature = false,
+    )
+
+  val authorShareEngagementsRealTimeAggregates =
AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_author_share_aggregates_v1", + keys = Set(TimelinesSharedFeatures.SOURCE_AUTHOR_ID), + features = Set.empty, + labels = ShareEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 24.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + // Bookmark RTAs + val tweetBookmarkEngagementsRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_bookmark_aggregates_v1", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = BookmarkEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 24.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val userBookmarkEngagementsRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_bookmark_aggregates_v1", + keys = Set(SharedFeatures.USER_ID), + features = Set.empty, + labels = BookmarkEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 24.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val userAuthorBookmarkEngagementsRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_author_bookmark_aggregates_v1", + keys = Set(SharedFeatures.USER_ID, TimelinesSharedFeatures.SOURCE_AUTHOR_ID), + features = Set.empty, + labels = BookmarkEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 24.hours), + outputStore = ProductionStore, + includeAnyFeature = true, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val authorBookmarkEngagementsRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_author_bookmark_aggregates_v1", + keys = Set(TimelinesSharedFeatures.SOURCE_AUTHOR_ID), + features = Set.empty, + labels = BookmarkEngagementLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 24.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate on user level dwell labels from BCE + */ + val userBCEDwellEngagementsRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_user_bce_dwell_aggregates", + keys = Set(SharedFeatures.USER_ID), + features = Set.empty, + labels = AllBCEDwellLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 24.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + /** + * Aggregate on tweet level dwell labels from BCE + */ + val tweetBCEDwellEngagementsRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_tweet_bce_dwell_aggregates", + keys = Set(TimelinesSharedFeatures.SOURCE_TWEET_ID), + features = Set.empty, + labels = AllBCEDwellLabels, + metrics = Set(CountMetric), + halfLives = Set(30.minutes, 24.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeTimestampFeature = false, + ) + + val ImplicitPositiveEngagementsUnionTransform = RichITransform( + BinaryUnion( + featuresToUnify = CombinedFeatures.ImplicitPositiveEngagements, + outputFeature = CombinedFeatures.IS_IMPLICIT_POSITIVE_FEEDBACK_UNION + ) + ) + + val ExplicitPositiveEngagementsUnionTransform = RichITransform( + BinaryUnion( + 
featuresToUnify = CombinedFeatures.ExplicitPositiveEngagements, + outputFeature = CombinedFeatures.IS_EXPLICIT_POSITIVE_FEEDBACK_UNION + ) + ) + + val AllNegativeEngagementsUnionTransform = RichITransform( + BinaryUnion( + featuresToUnify = CombinedFeatures.AllNegativeEngagements, + outputFeature = CombinedFeatures.IS_ALL_NEGATIVE_FEEDBACK_UNION + ) + ) + + /** + * Aggregate features for author content preference + */ + val authorContentPreferenceRealTimeAggregates = + AggregateGroup( + inputSource = inputSource, + aggregatePrefix = "real_time_author_content_preference_aggregates", + preTransforms = Seq( + ImplicitPositiveEngagementsUnionTransform, + ExplicitPositiveEngagementsUnionTransform, + AllNegativeEngagementsUnionTransform), + keys = Set(TimelinesSharedFeatures.SOURCE_AUTHOR_ID), + features = + ClientLogEventDataRecordFeatures.AuthorContentPreferenceTweetTypeFeatures ++ AuthorFeaturesAdapter.UserStateBooleanFeatures, + labels = AllTweetUnionLabels, + metrics = Set(CountMetric), + halfLives = Set(24.hours), + outputStore = ProductionStore, + includeAnyLabel = false, + includeAnyFeature = false, + ) + + val FeaturesGeneratedByPreTransforms = Set(LOG_POSITION, IS_TOP_TEN, IS_TOP_FIVE, IS_TOP_ONE) + + val ProdAggregateGroups = Set( + tweetEngagement30MinuteCountsProd, + tweetEngagementTotalCountsProd, + tweetNegativeEngagement6HourCounts, + tweetNegativeEngagementTotalCounts, + userEngagementRealTimeAggregatesProd, + userEngagement48HourRealTimeAggregatesProd, + userNegativeEngagementAuthorUserStateRealTimeAggregates, + userNegativeEngagementAuthorUserState72HourRealTimeAggregates, + authorEngagementRealTimeAggregatesProd, + topicEngagementRealTimeAggregatesProd, + topicEngagement24HourRealTimeAggregatesProd, + tweetEngagementUserStateRealTimeAggregatesProd, + tweetNegativeEngagementUserStateRealTimeAggregates, + userProfileEngagementRealTimeAggregates, + newUserAuthorEngagementRealTimeAggregatesProd, + userAuthorEngagementRealTimeAggregatesProd, + listEngagementRealTimeAggregatesProd, + tweetCountryRealTimeAggregates, + tweetShareEngagementsRealTimeAggregates, + userShareEngagementsRealTimeAggregates, + userAuthorShareEngagementsRealTimeAggregates, + topicShareEngagementsRealTimeAggregates, + authorShareEngagementsRealTimeAggregates, + tweetBookmarkEngagementsRealTimeAggregates, + userBookmarkEngagementsRealTimeAggregates, + userAuthorBookmarkEngagementsRealTimeAggregates, + authorBookmarkEngagementsRealTimeAggregates, + topicCountryRealTimeAggregates, + tweetCountryPrivateEngagementsRealTimeAggregates, + userBCEDwellEngagementsRealTimeAggregates, + tweetBCEDwellEngagementsRealTimeAggregates, + authorContentPreferenceRealTimeAggregates, + authorVerifiedNegativeEngagementRealTimeAggregatesProd, + tweetVerifiedDontLikeEngagementRealTimeAggregatesProd, + tweetVerifiedNegativeEngagementCounts, + tweetVerifiedNegativeEngagementUserStateRealTimeAggregates, + tweetCountryVerifiedNegativeEngagementsRealTimeAggregates + ).map( + addFeatureFilterFromResource( + _, + "com/twitter/timelines/prediction/common/aggregates/real_time/aggregates_to_drop.txt")) + + val StagingAggregateGroups = ProdAggregateGroups.map(createStagingGroup) + + /** + * Contains the fully typed aggregate groups from which important + * values can be derived e.g. the features to be computed, halflives etc. 
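+   * (Each AggregateGroup above expands, via buildTypedAggregateGroups(), into typed aggregates
+   * over roughly the cross product of its keys, features, labels, metrics and halfLives; this
+   * description is inferred from the aggregation framework rather than defined in this file.)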
+ */ + override val ProdAggregates = ProdAggregateGroups.flatMap(_.buildTypedAggregateGroups()) + + override val StagingAggregates = StagingAggregateGroups.flatMap(_.buildTypedAggregateGroups()) + + + override val ProdCommonAggregates = ProdAggregates + .filter(_.keysToAggregate == Set(SharedFeatures.USER_ID)) + + /** + * This defines the set of selected features from a candidate + * that we'd like to send to the served features cache by TLM. + * These should include interesting and necessary features that + * cannot be extracted from LogEvents only by the real-time aggregates + * job. If you are adding new AggregateGroups requiring TLM-side + * candidate features, make sure to add them here. + */ + val candidateFeaturesToCache: Set[Feature[_]] = Set( + TimelinesSharedFeatures.SOURCE_AUTHOR_ID, + RecapFeatures.HASHTAGS, + RecapFeatures.MENTIONED_SCREEN_NAMES, + RecapFeatures.URL_DOMAINS + ) +} + +/** + * This config should only be used to access the aggregate features constructed by the + * aggregation config, and not for implementing an online real-time aggregates job. + */ +object TimelinesOnlineAggregationFeaturesOnlyConfig + extends TimelinesOnlineAggregationDefinitionsTrait { + + private[real_time] case class DummyAggregateSource(name: String, timestampFeature: Feature[JLong]) + extends AggregateSource + + private[real_time] case class DummyAggregateStore(name: String) extends AggregateStore + + override lazy val inputSource = DummyAggregateSource( + name = "timelines_rta", + timestampFeature = SharedFeatures.TIMESTAMP + ) + override lazy val ProductionStore = DummyAggregateStore("timelines_rta") + override lazy val StagingStore = DummyAggregateStore("timelines_rta") + + override lazy val AggregatesToCompute = ProdAggregates ++ StagingAggregates +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationSources.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationSources.scala new file mode 100644 index 000000000..71e97a1b1 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesOnlineAggregationSources.scala @@ -0,0 +1,5 @@ +package com.twitter.timelines.prediction.common.aggregates.real_time + +object TimelinesOnlineAggregationSources { + val timelinesOnlineAggregateSource = new TimelinesStormAggregateSource +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesRealTimeAggregatesJob.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesRealTimeAggregatesJob.scala new file mode 100644 index 000000000..e386d4da1 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesRealTimeAggregatesJob.scala @@ -0,0 +1,182 @@ +package com.twitter.timelines.prediction.common.aggregates.real_time + +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.stats.DefaultStatsReceiver +import com.twitter.summingbird.Options +import com.twitter.summingbird.online.option.FlatMapParallelism +import com.twitter.summingbird.online.option.SourceParallelism +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron._ +import com.twitter.timelines.data_processing.ml_util.transforms.DownsampleTransform +import com.twitter.timelines.data_processing.ml_util.transforms.RichITransform +import com.twitter.timelines.data_processing.ml_util.transforms.UserDownsampleTransform + +import 
com.twitter.timelines.prediction.common.aggregates.BCELabelTransformFromUUADataRecord
+
+/**
+ * Sets up relevant topology parameters. Our primary goal is to handle the
+ * LogEvent stream and aggregate (sum) on the parsed DataRecords without falling
+ * behind. Our constraint is the resulting write (and read) QPS to the backing
+ * memcache store.
+ *
+ * If the job is falling behind, add more flatMappers and/or Summers after
+ * inspecting the viz panels for the respective job (go/heron-ui). An increase in
+ * Summers (and/or aggregation keys and features in the config) results in an
+ * increase in memcache QPS (go/cb and search for our cache). Adjust with CacheSize
+ * settings until QPS is well-controlled.
+ *
+ */
+object TimelinesRealTimeAggregatesJobConfigs extends RealTimeAggregatesJobConfigs {
+  import TimelinesOnlineAggregationUtils._
+
+  /**
+   * We remove input records that do not contain a label/engagement as defined in AllTweetLabels,
+   * which includes explicit user engagements across public, private and impression events. By
+   * avoiding ingesting records without engagements, we guarantee that no distribution shifts occur
+   * in computed aggregate features when we add a new spout to input aggregate sources.
+   * Counterfactual signal is still available since we aggregate on explicit dwell engagements.
+   */
+  val NegativeDownsampleTransform =
+    DownsampleTransform(
+      negativeSamplingRate = 0.0,
+      keepLabels = AllTweetLabels,
+      positiveSamplingRate = 1.0)
+
+  /**
+   * We downsample positive engagements for the devel topology to reduce traffic, aiming for the
+   * equivalent of 10% of prod traffic. We first apply consistent downsampling to 10% of users, and
+   * then apply downsampling to remove records without explicit labels. We apply user-consistent
+   * sampling to more closely approximate prod query patterns.
+   */
+  val StagingUserBasedDownsampleTransform =
+    UserDownsampleTransform(
+      availability = 1000,
+      featureName = "rta_devel"
+    )
+
+  override val Prod = RealTimeAggregatesJobConfig(
+    appId = "summingbird_timelines_rta",
+    topologyWorkers = 1450,
+    sourceCount = 120,
+    flatMapCount = 1800,
+    summerCount = 3850,
+    cacheSize = 200,
+    containerRamGigaBytes = 54,
+    name = "timelines_real_time_aggregates",
+    teamName = "timelines",
+    teamEmail = "",
+    // If a component is hitting its GC limit in prod, tune componentToMetaSpaceSizeMap,
+    // except for Source bolts: tune componentToRamGigaBytesMap for those instead.
+    componentToMetaSpaceSizeMap = Map(
+      "Tail-FlatMap" -> "-XX:MaxMetaspaceSize=1024M -XX:MetaspaceSize=1024M",
+      "Tail" -> "-XX:MaxMetaspaceSize=2560M -XX:MetaspaceSize=2560M"
+    ),
+    // If either component is hitting its memory limit in prod, its memory needs to increase:
+    // either increase the total memory of the container (containerRamGigaBytes),
+    // or allocate more memory to that component while keeping total memory unchanged.
+    componentToRamGigaBytesMap = Map(
+      "Tail-FlatMap-Source" -> 3, // Home source
+      "Tail-FlatMap-Source.2" -> 3, // Profile source
+      "Tail-FlatMap-Source.3" -> 3, // Search source
+      "Tail-FlatMap-Source.4" -> 3, // UUA source
+      "Tail-FlatMap" -> 8
+      // Tail will use the leftover memory in the container.
+      // Make sure to tune topologyWorkers and containerRamGigaBytes such that this is greater than 10 GB.
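+      // Illustrative back-of-the-envelope check (not read by the job): the explicit
+      // allocations above sum to 3 + 3 + 3 + 3 + 8 = 20 GB, so with
+      // containerRamGigaBytes = 54 the Tail component is left with roughly
+      // 54 - 20 = 34 GB, comfortably above the 10 GB floor.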
+    ),
+    topologyNamedOptions = Map(
+      "TL_EVENTS_SOURCE" -> Options()
+        .set(SourceParallelism(120)),
+      "PROFILE_EVENTS_SOURCE" -> Options()
+        .set(SourceParallelism(30)),
+      "SEARCH_EVENTS_SOURCE" -> Options()
+        .set(SourceParallelism(10)),
+      "UUA_EVENTS_SOURCE" -> Options()
+        .set(SourceParallelism(10)),
+      "COMBINED_PRODUCER" -> Options()
+        .set(FlatMapParallelism(1800))
+    ),
+    // Incoming UUA DataRecords for BCE events do not have binary labels populated.
+    // BCELabelTransformFromUUADataRecord sets binary BCE dwell label features on the DataRecord
+    // based on the corresponding dwell_time_ms.
+    // It's important to apply BCELabelTransformFromUUADataRecord before NegativeDownsampleTransform,
+    // because NegativeDownsampleTransform removes DataRecords that contain no features from AllTweetLabels.
+    onlinePreTransforms =
+      Seq(RichITransform(BCELabelTransformFromUUADataRecord), NegativeDownsampleTransform)
+  )
+
+  /**
+   * We downsample the devel topology to roughly 10% of prod computation, using
+   * [[StagingUserBasedDownsampleTransform]]. To better test the scalability of the topology, we
+   * reduce the computing resources of the "Tail-FlatMap" and "Tail" components relative to prod
+   * but keep the computing resources of the "Tail-FlatMap-Source" component unchanged; see
+   * sourceCount, flatMapCount, summerCount and topologyWorkers below.
+   */
+  override val Devel = RealTimeAggregatesJobConfig(
+    appId = "summingbird_timelines_rta_devel",
+    topologyWorkers = 120,
+    sourceCount = 120,
+    flatMapCount = 150,
+    summerCount = 300,
+    cacheSize = 200,
+    containerRamGigaBytes = 54,
+    name = "timelines_real_time_aggregates_devel",
+    teamName = "timelines",
+    teamEmail = "",
+    // If a component is hitting its GC limit in prod, tune componentToMetaSpaceSizeMap,
+    // except for Source bolts: tune componentToRamGigaBytesMap for those instead.
+    componentToMetaSpaceSizeMap = Map(
+      "Tail-FlatMap" -> "-XX:MaxMetaspaceSize=1024M -XX:MetaspaceSize=1024M",
+      "Tail" -> "-XX:MaxMetaspaceSize=2560M -XX:MetaspaceSize=2560M"
+    ),
+    // If either component is hitting its memory limit in prod, its memory needs to increase:
+    // either increase the total memory of the container (containerRamGigaBytes),
+    // or allocate more memory to that component while keeping total memory unchanged.
+    componentToRamGigaBytesMap = Map(
+      "Tail-FlatMap-Source" -> 3, // Home source
+      "Tail-FlatMap-Source.2" -> 3, // Profile source
+      "Tail-FlatMap-Source.3" -> 3, // Search source
+      "Tail-FlatMap-Source.4" -> 3, // UUA source
+      "Tail-FlatMap" -> 8
+      // Tail will use the leftover memory in the container.
+      // Make sure to tune topologyWorkers and containerRamGigaBytes such that this is greater than 10 GB.
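+      // Note: these per-component allocations intentionally mirror prod; the devel
+      // scale-down is expressed through topologyWorkers, flatMapCount and summerCount
+      // above, while sources are deliberately kept at prod capacity (see the scaladoc
+      // on Devel).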
+    ),
+    topologyNamedOptions = Map(
+      "TL_EVENTS_SOURCE" -> Options()
+        .set(SourceParallelism(120)),
+      "PROFILE_EVENTS_SOURCE" -> Options()
+        .set(SourceParallelism(30)),
+      "SEARCH_EVENTS_SOURCE" -> Options()
+        .set(SourceParallelism(10)),
+      "UUA_EVENTS_SOURCE" -> Options()
+        .set(SourceParallelism(10)),
+      "COMBINED_PRODUCER" -> Options()
+        .set(FlatMapParallelism(150))
+    ),
+    // It's important to apply BCELabelTransformFromUUADataRecord before NegativeDownsampleTransform.
+    onlinePreTransforms = Seq(
+      StagingUserBasedDownsampleTransform,
+      RichITransform(BCELabelTransformFromUUADataRecord),
+      NegativeDownsampleTransform),
+    enableUserReindexingNighthawkBtreeStore = true,
+    enableUserReindexingNighthawkHashStore = true,
+    userReindexingNighthawkBtreeStoreConfig = NighthawkUnderlyingStoreConfig(
+      serversetPath =
+        "/twitter/service/cache-user/test/nighthawk_timelines_real_time_aggregates_btree_test_api",
+      // NOTE: table names are prefixed to every pkey so keep it short
+      tableName = "u_r_v1", // (u)ser_(r)eindexing_v1
+      // keep ttl <= 1 day because it's keyed on user, and we will have limited hit rates beyond 1 day
+      cacheTTL = 1.day
+    ),
+    userReindexingNighthawkHashStoreConfig = NighthawkUnderlyingStoreConfig(
+      // For prod: "/s/cache-user/nighthawk_timelines_real_time_aggregates_hash_api",
+      serversetPath =
+        "/twitter/service/cache-user/test/nighthawk_timelines_real_time_aggregates_hash_test_api",
+      // NOTE: table names are prefixed to every pkey so keep it short
+      tableName = "u_r_v1", // (u)ser_(r)eindexing_v1
+      // keep ttl <= 1 day because it's keyed on user, and we will have limited hit rates beyond 1 day
+      cacheTTL = 1.day
+    )
+  )
+}
+
+object TimelinesRealTimeAggregatesJob extends RealTimeAggregatesJobBase {
+  override lazy val statsReceiver = DefaultStatsReceiver.scope("timelines_real_time_aggregates")
+  override lazy val jobConfigs = TimelinesRealTimeAggregatesJobConfigs
+  override lazy val aggregatesToCompute = TimelinesOnlineAggregationConfig.AggregatesToCompute
+}
diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesStormAggregateSource.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesStormAggregateSource.scala
new file mode 100644
index 000000000..2e096dc07
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TimelinesStormAggregateSource.scala
@@ -0,0 +1,185 @@
+package com.twitter.timelines.prediction.common.aggregates.real_time
+
+import com.twitter.clientapp.thriftscala.LogEvent
+import com.twitter.conversions.DurationOps._
+import com.twitter.finagle.stats.Counter
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.ml.api.DataRecord
+import com.twitter.ml.api.constant.SharedFeatures
+import com.twitter.snowflake.id.SnowflakeId
+import com.twitter.summingbird._
+import com.twitter.summingbird.storm.Storm
+import com.twitter.summingbird_internal.sources.AppId
+import com.twitter.summingbird_internal.sources.storm.remote.ClientEventSourceScrooge2
+import com.twitter.timelines.data_processing.ad_hoc.suggests.common.AllScribeProcessor
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron.RealTimeAggregatesJobConfig
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron.StormAggregateSource
+import com.twitter.timelines.prediction.adapters.client_log_event.ClientLogEventAdapter
+import com.twitter.timelines.prediction.adapters.client_log_event.ProfileClientLogEventAdapter
+import 
com.twitter.timelines.prediction.adapters.client_log_event.SearchClientLogEventAdapter +import com.twitter.timelines.prediction.adapters.client_log_event.UuaEventAdapter +import com.twitter.unified_user_actions.client.config.KafkaConfigs +import com.twitter.unified_user_actions.client.summingbird.UnifiedUserActionsSourceScrooge +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import scala.collection.JavaConverters._ + +/** + * Storm Producer for client events generated on Home, Profile, and Search + */ +class TimelinesStormAggregateSource extends StormAggregateSource { + + override val name = "timelines_rta" + override val timestampFeature = SharedFeatures.TIMESTAMP + + private lazy val TimelinesClientEventSourceName = "TL_EVENTS_SOURCE" + private lazy val ProfileClientEventSourceName = "PROFILE_EVENTS_SOURCE" + private lazy val SearchClientEventSourceName = "SEARCH_EVENTS_SOURCE" + private lazy val UuaEventSourceName = "UUA_EVENTS_SOURCE" + private lazy val CombinedProducerName = "COMBINED_PRODUCER" + private lazy val FeatureStoreProducerName = "FEATURE_STORE_PRODUCER" + + private def isNewUserEvent(event: LogEvent): Boolean = { + event.logBase.flatMap(_.userId).flatMap(SnowflakeId.timeFromIdOpt).exists(_.untilNow < 30.days) + } + + private def mkDataRecords(event: LogEvent, dataRecordCounter: Counter): Seq[DataRecord] = { + val dataRecords: Seq[DataRecord] = + if (AllScribeProcessor.isValidSuggestTweetEvent(event)) { + ClientLogEventAdapter.adaptToDataRecords(event).asScala + } else { + Seq.empty[DataRecord] + } + dataRecordCounter.incr(dataRecords.size) + dataRecords + } + + private def mkProfileDataRecords( + event: LogEvent, + dataRecordCounter: Counter + ): Seq[DataRecord] = { + val dataRecords: Seq[DataRecord] = + ProfileClientLogEventAdapter.adaptToDataRecords(event).asScala + dataRecordCounter.incr(dataRecords.size) + dataRecords + } + + private def mkSearchDataRecords( + event: LogEvent, + dataRecordCounter: Counter + ): Seq[DataRecord] = { + val dataRecords: Seq[DataRecord] = + SearchClientLogEventAdapter.adaptToDataRecords(event).asScala + dataRecordCounter.incr(dataRecords.size) + dataRecords + } + + private def mkUuaDataRecords( + event: UnifiedUserAction, + dataRecordCounter: Counter + ): Seq[DataRecord] = { + val dataRecords: Seq[DataRecord] = + UuaEventAdapter.adaptToDataRecords(event).asScala + dataRecordCounter.incr(dataRecords.size) + dataRecords + } + + override def build( + statsReceiver: StatsReceiver, + jobConfig: RealTimeAggregatesJobConfig + ): Producer[Storm, DataRecord] = { + lazy val scopedStatsReceiver = statsReceiver.scope(getClass.getSimpleName) + lazy val dataRecordCounter = scopedStatsReceiver.counter("dataRecord") + + // Home Timeline Engagements + // Step 1: => LogEvent + lazy val clientEventProducer: Producer[Storm, HomeEvent[LogEvent]] = + ClientEventSourceScrooge2( + appId = AppId(jobConfig.appId), + topic = "julep_client_event_suggests", + resumeAtLastReadOffset = false, + enableTls = true + ).source.map(HomeEvent[LogEvent]).name(TimelinesClientEventSourceName) + + // Profile Engagements + // Step 1: => LogEvent + lazy val profileClientEventProducer: Producer[Storm, ProfileEvent[LogEvent]] = + ClientEventSourceScrooge2( + appId = AppId(jobConfig.appId), + topic = "julep_client_event_profile_real_time_engagement_metrics", + resumeAtLastReadOffset = false, + enableTls = true + ).source + .map(ProfileEvent[LogEvent]) + .name(ProfileClientEventSourceName) + + // Search Engagements + // Step 1: => LogEvent + // Only process events 
from the search engagement-metrics topic to save resources
+    lazy val searchClientEventProducer: Producer[Storm, SearchEvent[LogEvent]] =
+      ClientEventSourceScrooge2(
+        appId = AppId(jobConfig.appId),
+        topic = "julep_client_event_search_real_time_engagement_metrics",
+        resumeAtLastReadOffset = false,
+        enableTls = true
+      ).source
+        .map(SearchEvent[LogEvent])
+        .name(SearchClientEventSourceName)
+
+    // Unified User Actions (includes Home and other product surfaces)
+    lazy val uuaEventProducer: Producer[Storm, UuaEvent[UnifiedUserAction]] =
+      UnifiedUserActionsSourceScrooge(
+        appId = AppId(jobConfig.appId),
+        parallelism = 10,
+        kafkaConfig = KafkaConfigs.ProdUnifiedUserActionsEngagementOnly
+      ).source
+        .filter(StormAggregateSourceUtils.isUuaBCEEventsFromHome(_))
+        .map(UuaEvent[UnifiedUserAction])
+        .name(UuaEventSourceName)
+
+    // Combined
+    // Step 2:
+    // (a) Combine
+    // (b) Transform LogEvent => Seq[DataRecord]
+    // (c) Apply sampler
+    lazy val combinedClientEventDataRecordProducer: Producer[Storm, Event[DataRecord]] =
+      profileClientEventProducer // This becomes the bottom branch
+        .merge(clientEventProducer) // This becomes the middle branch
+        .merge(searchClientEventProducer)
+        .merge(uuaEventProducer) // This becomes the top branch
+        .flatMap { // LogEvent => Seq[DataRecord]
+          case e: HomeEvent[LogEvent] =>
+            mkDataRecords(e.event, dataRecordCounter).map(HomeEvent[DataRecord])
+          case e: ProfileEvent[LogEvent] =>
+            mkProfileDataRecords(e.event, dataRecordCounter).map(ProfileEvent[DataRecord])
+          case e: SearchEvent[LogEvent] =>
+            mkSearchDataRecords(e.event, dataRecordCounter).map(SearchEvent[DataRecord])
+          case e: UuaEvent[UnifiedUserAction] =>
+            mkUuaDataRecords(
+              e.event,
+              dataRecordCounter
+            ).map(UuaEvent[DataRecord])
+        }
+        .flatMap { // Apply sampler
+          case e: HomeEvent[DataRecord] =>
+            jobConfig.sequentiallyTransform(e.event).map(HomeEvent[DataRecord])
+          case e: ProfileEvent[DataRecord] =>
+            jobConfig.sequentiallyTransform(e.event).map(ProfileEvent[DataRecord])
+          case e: SearchEvent[DataRecord] =>
+            jobConfig.sequentiallyTransform(e.event).map(SearchEvent[DataRecord])
+          case e: UuaEvent[DataRecord] =>
+            jobConfig.sequentiallyTransform(e.event).map(UuaEvent[DataRecord])
+        }
+        .name(CombinedProducerName)
+
+    // Step 3: Join with Feature Store features
+    lazy val featureStoreDataRecordProducer: Producer[Storm, DataRecord] =
+      StormAggregateSourceUtils
+        .wrapByFeatureStoreClient(
+          underlyingProducer = combinedClientEventDataRecordProducer,
+          jobConfig = jobConfig,
+          scopedStatsReceiver = scopedStatsReceiver
+        ).map(_.event).name(FeatureStoreProducerName)
+
+    featureStoreDataRecordProducer
+  }
+}
diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TweetFeaturesAdapter.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TweetFeaturesAdapter.scala
new file mode 100644
index 000000000..0d5c06d7c
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TweetFeaturesAdapter.scala
@@ -0,0 +1,35 @@
+package com.twitter.timelines.prediction.common.aggregates.real_time
+
+import com.twitter.ml.api.DataRecord
+import com.twitter.ml.api.Feature
+import com.twitter.ml.api.FeatureContext
+import com.twitter.ml.featurestore.catalog.entities.core.Tweet
+import com.twitter.ml.featurestore.catalog.features.trends.TweetTrendsScores
+import com.twitter.ml.featurestore.lib.TweetId
+import com.twitter.ml.featurestore.lib.data.PredictionRecord
+import com.twitter.ml.featurestore.lib.data.PredictionRecordAdapter
+import 
com.twitter.ml.featurestore.lib.feature.BoundFeature +import com.twitter.ml.featurestore.lib.feature.BoundFeatureSet +import com.twitter.timelines.prediction.common.adapters.TimelinesAdapterBase +import java.util +import scala.collection.JavaConverters._ + +object TweetFeaturesAdapter extends TimelinesAdapterBase[PredictionRecord] { + + private val ContinuousFeatureMap: Map[BoundFeature[TweetId, Double], Feature.Continuous] = Map() + + val TweetFeaturesSet: BoundFeatureSet = new BoundFeatureSet(ContinuousFeatureMap.keys.toSet) + + val AllFeatures: Seq[Feature[_]] = + ContinuousFeatureMap.values.toSeq + + private val adapter = PredictionRecordAdapter.oneToOne(TweetFeaturesSet) + + override def getFeatureContext: FeatureContext = new FeatureContext(AllFeatures: _*) + + override def commonFeatures: Set[Feature[_]] = Set.empty + + override def adaptToDataRecords(record: PredictionRecord): util.List[DataRecord] = { + List(adapter.adaptToDataRecord(record)).asJava + } +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TweetFeaturesReadableStore.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TweetFeaturesReadableStore.scala new file mode 100644 index 000000000..b461e179a --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TweetFeaturesReadableStore.scala @@ -0,0 +1,53 @@ +package com.twitter.timelines.prediction.common.aggregates.real_time + +import com.twitter.ml.api.DataRecord +import com.twitter.ml.featurestore.lib.TweetId +import com.twitter.ml.featurestore.lib.data.PredictionRecord +import com.twitter.ml.featurestore.lib.entity.Entity +import com.twitter.ml.featurestore.lib.online.{FeatureStoreClient, FeatureStoreRequest} +import com.twitter.storehaus.ReadableStore +import com.twitter.timelines.prediction.common.adapters.TimelinesAdapterBase +import com.twitter.util.Future +import scala.collection.JavaConverters._ + +class TweetFeaturesReadableStore( + featureStoreClient: FeatureStoreClient, + tweetEntity: Entity[TweetId], + tweetFeaturesAdapter: TimelinesAdapterBase[PredictionRecord]) + extends ReadableStore[Set[Long], DataRecord] { + + override def multiGet[K <: Set[Long]](keys: Set[K]): Map[K, Future[Option[DataRecord]]] = { + val orderedKeys: Seq[K] = keys.toSeq + val featureStoreRequests: Seq[FeatureStoreRequest] = getFeatureStoreRequests(orderedKeys) + val predictionRecordsFut: Future[Seq[PredictionRecord]] = featureStoreClient( + featureStoreRequests) + + getDataRecordMap(orderedKeys, predictionRecordsFut) + } + + private def getFeatureStoreRequests[K <: Set[Long]]( + orderedKeys: Seq[K] + ): Seq[FeatureStoreRequest] = { + orderedKeys.map { key: Set[Long] => + FeatureStoreRequest( + entityIds = key.map { tweetId => tweetEntity.withId(TweetId(tweetId)) }.toSeq + ) + } + } + + private def getDataRecordMap[K <: Set[Long]]( + orderedKeys: Seq[K], + predictionRecordsFut: Future[Seq[PredictionRecord]] + ): Map[K, Future[Option[DataRecord]]] = { + orderedKeys.zipWithIndex.map { + case (tweetIdSet, index) => + val dataRecordFutOpt: Future[Option[DataRecord]] = predictionRecordsFut.map { + predictionRecords => + predictionRecords.lift(index).flatMap { predictionRecordAtIndex: PredictionRecord => + tweetFeaturesAdapter.adaptToDataRecords(predictionRecordAtIndex).asScala.headOption + } + } + (tweetIdSet, dataRecordFutOpt) + }.toMap + } +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TypeSafeRunner.scala 
b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TypeSafeRunner.scala new file mode 100644 index 000000000..92b6618e4 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/TypeSafeRunner.scala @@ -0,0 +1,7 @@ +package com.twitter.timelines.prediction.common.aggregates.real_time + +import com.twitter.summingbird_internal.runner.storm.GenericRunner + +object TypeSafeRunner { + def main(args: Array[String]): Unit = GenericRunner(args, TimelinesRealTimeAggregatesJob(_)) +} diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/UserFeaturesAdapter.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/UserFeaturesAdapter.scala new file mode 100644 index 000000000..8ff39938c --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/UserFeaturesAdapter.scala @@ -0,0 +1,108 @@ +package com.twitter.timelines.prediction.common.aggregates.real_time + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType.InferredGender +import com.twitter.dal.personal_data.thriftjava.PersonalDataType.UserState +import com.twitter.ml.api.Feature.Binary +import com.twitter.ml.api.Feature.Text +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.Feature +import com.twitter.ml.api.FeatureContext +import com.twitter.ml.api.RichDataRecord +import com.twitter.ml.featurestore.catalog.entities.core.User +import com.twitter.ml.featurestore.catalog.features.core.UserAccount +import com.twitter.ml.featurestore.catalog.features.geo.UserLocation +import com.twitter.ml.featurestore.catalog.features.magicrecs.UserActivity +import com.twitter.ml.featurestore.lib.EntityId +import com.twitter.ml.featurestore.lib.data.PredictionRecord +import com.twitter.ml.featurestore.lib.feature.BoundFeature +import com.twitter.ml.featurestore.lib.feature.BoundFeatureSet +import com.twitter.ml.featurestore.lib.UserId +import com.twitter.ml.featurestore.lib.{Discrete => FSDiscrete} +import com.twitter.timelines.prediction.common.adapters.TimelinesAdapterBase +import com.twitter.timelines.prediction.features.user_health.UserHealthFeatures +import java.lang.{Boolean => JBoolean} +import java.lang.{String => JString} +import java.util +import scala.collection.JavaConverters._ + +object UserFeaturesAdapter extends TimelinesAdapterBase[PredictionRecord] { + val UserStateBoundFeature: BoundFeature[UserId, FSDiscrete] = UserActivity.UserState.bind(User) + + /** + * Boolean features about viewer's user state. 
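+   * The discrete user-state value from the feature store is collapsed into coarser
+   * boolean buckets; in particular, NEAR_ZERO, VERY_LIGHT and LIGHT all map to
+   * IS_USER_LIGHT (see userStateToFeatureMap below).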
+   * enum UserState {
+   *   NEW = 0,
+   *   NEAR_ZERO = 1,
+   *   VERY_LIGHT = 2,
+   *   LIGHT = 3,
+   *   MEDIUM_TWEETER = 4,
+   *   MEDIUM_NON_TWEETER = 5,
+   *   HEAVY_NON_TWEETER = 6,
+   *   HEAVY_TWEETER = 7
+   * }(persisted='true')
+   */
+  val IS_USER_NEW = new Binary("timelines.user_state.is_user_new", Set(UserState).asJava)
+  val IS_USER_LIGHT = new Binary("timelines.user_state.is_user_light", Set(UserState).asJava)
+  val IS_USER_MEDIUM_TWEETER =
+    new Binary("timelines.user_state.is_user_medium_tweeter", Set(UserState).asJava)
+  val IS_USER_MEDIUM_NON_TWEETER =
+    new Binary("timelines.user_state.is_user_medium_non_tweeter", Set(UserState).asJava)
+  val IS_USER_HEAVY_NON_TWEETER =
+    new Binary("timelines.user_state.is_user_heavy_non_tweeter", Set(UserState).asJava)
+  val IS_USER_HEAVY_TWEETER =
+    new Binary("timelines.user_state.is_user_heavy_tweeter", Set(UserState).asJava)
+  val userStateToFeatureMap: Map[Long, Binary] = Map(
+    0L -> IS_USER_NEW,
+    1L -> IS_USER_LIGHT,
+    2L -> IS_USER_LIGHT,
+    3L -> IS_USER_LIGHT,
+    4L -> IS_USER_MEDIUM_TWEETER,
+    5L -> IS_USER_MEDIUM_NON_TWEETER,
+    6L -> IS_USER_HEAVY_NON_TWEETER,
+    7L -> IS_USER_HEAVY_TWEETER
+  )
+
+  val UserStateBooleanFeatures: Set[Feature[_]] = userStateToFeatureMap.values.toSet
+
+  val USER_COUNTRY_ID = new Text("geo.user_location.country_code")
+  val UserCountryCodeFeature: BoundFeature[UserId, String] =
+    UserLocation.CountryCodeAlpha2.bind(User)
+  val UserLocationFeatures: Set[Feature[_]] = Set(USER_COUNTRY_ID)
+
+  private val UserVerifiedFeaturesSet = Set(
+    UserAccount.IsUserVerified.bind(User),
+    UserAccount.IsUserBlueVerified.bind(User),
+    UserAccount.IsUserGoldVerified.bind(User),
+    UserAccount.IsUserGrayVerified.bind(User)
+  )
+
+  val UserFeaturesSet: BoundFeatureSet =
+    BoundFeatureSet(UserStateBoundFeature, UserCountryCodeFeature) ++
+      BoundFeatureSet(UserVerifiedFeaturesSet.asInstanceOf[Set[BoundFeature[_ <: EntityId, _]]])
+
+  // No gender-based boolean features are defined in this adapter, so allFeatures
+  // covers only the user-state, location and verified-union features.
+  private val allFeatures: Seq[Feature[_]] =
+    UserStateBooleanFeatures.toSeq ++
+      UserLocationFeatures.toSeq ++ Seq(UserHealthFeatures.IsUserVerifiedUnion)
+
+  override def getFeatureContext: FeatureContext = new FeatureContext(allFeatures: _*)
+  override def commonFeatures: Set[Feature[_]] = Set.empty
+
+  override def adaptToDataRecords(record: PredictionRecord): util.List[DataRecord] = {
+    val newRecord = new RichDataRecord(new DataRecord)
+    record
+      .getFeatureValue(UserStateBoundFeature)
+      .flatMap { userState => userStateToFeatureMap.get(userState.value) }.foreach {
+        booleanFeature => newRecord.setFeatureValue[JBoolean](booleanFeature, true)
+      }
+    record.getFeatureValue(UserCountryCodeFeature).foreach { countryCodeFeatureValue =>
+      newRecord.setFeatureValue[JString](USER_COUNTRY_ID, countryCodeFeatureValue)
+    }
+
+    val isUserVerifiedUnion =
+      UserVerifiedFeaturesSet.exists(feature => record.getFeatureValue(feature).getOrElse(false))
+    newRecord.setFeatureValue[JBoolean](UserHealthFeatures.IsUserVerifiedUnion, isUserVerifiedUnion)
+
+    List(newRecord.getRecord).asJava
+  }
+}
diff --git a/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/UserFeaturesReadableStore.scala b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/UserFeaturesReadableStore.scala
new file mode 100644
index 000000000..c1931c32b
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/UserFeaturesReadableStore.scala
@@ -0,0 +1,37 @@
+package com.twitter.timelines.prediction.common.aggregates.real_time
+
+import 
com.twitter.ml.api.DataRecord
+import com.twitter.ml.featurestore.lib.UserId
+import com.twitter.ml.featurestore.lib.data.PredictionRecord
+import com.twitter.ml.featurestore.lib.entity.Entity
+import com.twitter.ml.featurestore.lib.online.{FeatureStoreClient, FeatureStoreRequest}
+import com.twitter.storehaus.ReadableStore
+import com.twitter.timelines.prediction.common.adapters.TimelinesAdapterBase
+import com.twitter.util.Future
+import scala.collection.JavaConverters._
+
+class UserFeaturesReadableStore(
+  featureStoreClient: FeatureStoreClient,
+  userEntity: Entity[UserId],
+  userFeaturesAdapter: TimelinesAdapterBase[PredictionRecord])
+    extends ReadableStore[Set[Long], DataRecord] {
+
+  override def multiGet[K <: Set[Long]](keys: Set[K]): Map[K, Future[Option[DataRecord]]] = {
+    val orderedKeys = keys.toSeq
+    val featureStoreRequests: Seq[FeatureStoreRequest] = orderedKeys.map { key: Set[Long] =>
+      FeatureStoreRequest(
+        entityIds = key.map(userId => userEntity.withId(UserId(userId))).toSeq
+      )
+    }
+    val predictionRecordsFut: Future[Seq[PredictionRecord]] = featureStoreClient(
+      featureStoreRequests)
+
+    orderedKeys.zipWithIndex.map {
+      case (userId, index) =>
+        val dataRecordFutOpt = predictionRecordsFut.map { predictionRecords =>
+          userFeaturesAdapter.adaptToDataRecords(predictionRecords(index)).asScala.headOption
+        }
+        (userId, dataRecordFutOpt)
+    }.toMap
+  }
+}
diff --git a/src/scala/com/twitter/timelines/prediction/features/README.md b/src/scala/com/twitter/timelines/prediction/features/README.md
new file mode 100644
index 000000000..d42639a77
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/features/README.md
@@ -0,0 +1,6 @@
+## Prediction Features
+
+This directory contains a collection of `Features` (`com.twitter.ml.api.Feature`): definitions of feature names and datatypes that allow features to be efficiently processed and passed to the different ranking models.
+Because each feature is predefined with its name and datatype, a feature can be identified by only a hash of its name when it is generated, scribed, or used for scoring.
+
+Not all of these features are used in the model; many are experimental or deprecated.
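+
+For example, a binary engagement feature is declared with its name and the personal-data types it covers; this snippet is taken from `common/CombinedFeatures.scala` below:
+
+```scala
+val IS_CLICKED =
+  new Binary("timelines.engagement.is_clicked", Set(TweetsClicked, EngagementsPrivate).asJava)
+```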
\ No newline at end of file diff --git a/src/scala/com/twitter/timelines/prediction/features/client_log_event/BUILD b/src/scala/com/twitter/timelines/prediction/features/client_log_event/BUILD new file mode 100644 index 000000000..3d3c34092 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/client_log_event/BUILD @@ -0,0 +1,11 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/scala/com/twitter/suggests/controller_data", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/timelineservice/server/suggests/logging:thrift-scala", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/client_log_event/ClientLogEventDataRecordFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/client_log_event/ClientLogEventDataRecordFeatures.scala new file mode 100644 index 000000000..cccb99998 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/client_log_event/ClientLogEventDataRecordFeatures.scala @@ -0,0 +1,169 @@ +package com.twitter.timelines.prediction.features.client_log_event + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.ml.api.Feature +import com.twitter.ml.api.Feature.Binary +import com.twitter.ml.api.Feature.Continuous +import com.twitter.ml.api.Feature.Discrete +import scala.collection.JavaConverters._ +import com.twitter.timelineservice.suggests.logging.candidate_tweet_source_id.thriftscala.CandidateTweetSourceId + +object ClientLogEventDataRecordFeatures { + val HasConsumerVideo = new Binary( + "client_log_event.tweet.has_consumer_video", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val PhotoCount = new Continuous( + "client_log_event.tweet.photo_count", + Set(CountOfPrivateTweetEntitiesAndMetadata, CountOfPublicTweetEntitiesAndMetadata).asJava) + val HasImage = new Binary( + "client_log_event.tweet.has_image", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val IsReply = + new Binary("client_log_event.tweet.is_reply", Set(PublicReplies, PrivateReplies).asJava) + val IsRetweet = + new Binary("client_log_event.tweet.is_retweet", Set(PublicRetweets, PrivateRetweets).asJava) + val IsPromoted = + new Binary( + "client_log_event.tweet.is_promoted", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HasVisibleLink = new Binary( + "client_log_event.tweet.has_visible_link", + Set(UrlFoundFlag, PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HasHashtag = new Binary( + "client_log_event.tweet.has_hashtag", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val FromMutualFollow = new Binary("client_log_event.tweet.from_mutual_follow") + val IsInNetwork = new Binary("client_log_event.tweet.is_in_network") + val IsNotInNetwork = new Binary("client_log_event.tweet.is_not_in_network") + val FromRecap = new Binary("client_log_event.tweet.from_recap") + val FromRecycled = new Binary("client_log_event.tweet.from_recycled") + val FromActivity = new Binary("client_log_event.tweet.from_activity") + val FromSimcluster = new Binary("client_log_event.tweet.from_simcluster") + val FromErg = new Binary("client_log_event.tweet.from_erg") + val FromCroon = new Binary("client_log_event.tweet.from_croon") + val FromList = new Binary("client_log_event.tweet.from_list") + val FromRecTopic = new 
Binary("client_log_event.tweet.from_rec_topic") + val InjectedPosition = new Discrete("client_log_event.tweet.injectedPosition") + val TextOnly = new Binary("client_log_event.tweet.text_only") + val HasLikedBySocialContext = new Binary("client_log_event.tweet.has_liked_by_social_context") + val HasFollowedBySocialContext = new Binary( + "client_log_event.tweet.has_followed_by_social_context") + val HasTopicSocialContext = new Binary("client_log_event.tweet.has_topic_social_context") + val IsFollowedTopicTweet = new Binary("client_log_event.tweet.is_followed_topic_tweet") + val IsRecommendedTopicTweet = new Binary("client_log_event.tweet.is_recommended_topic_tweet") + val IsTweetAgeLessThan15Seconds = new Binary( + "client_log_event.tweet.tweet_age_less_than_15_seconds") + val IsTweetAgeLessThanOrEqualTo30Minutes = new Binary( + "client_log_event.tweet.tweet_age_lte_30_minutes") + val IsTweetAgeLessThanOrEqualTo1Hour = new Binary("client_log_event.tweet.tweet_age_lte_1_hour") + val IsTweetAgeLessThanOrEqualTo6Hours = new Binary("client_log_event.tweet.tweet_age_lte_6_hours") + val IsTweetAgeLessThanOrEqualTo12Hours = new Binary( + "client_log_event.tweet.tweet_age_lte_12_hours") + val IsTweetAgeGreaterThanOrEqualTo24Hours = new Binary( + "client_log_event.tweet.tweet_age_gte_24_hours") + val HasGreaterThanOrEqualTo100Favs = new Binary("client_log_event.tweet.has_gte_100_favs") + val HasGreaterThanOrEqualTo1KFavs = new Binary("client_log_event.tweet.has_gte_1k_favs") + val HasGreaterThanOrEqualTo10KFavs = new Binary("client_log_event.tweet.has_gte_10k_favs") + val HasGreaterThanOrEqualTo100KFavs = new Binary("client_log_event.tweet.has_gte_100k_favs") + val HasGreaterThanOrEqualTo10Retweets = new Binary("client_log_event.tweet.has_gte_10_retweets") + val HasGreaterThanOrEqualTo100Retweets = new Binary("client_log_event.tweet.has_gte_100_retweets") + val HasGreaterThanOrEqualTo1KRetweets = new Binary("client_log_event.tweet.has_gte_1k_retweets") + + val TweetTypeToFeatureMap: Map[String, Binary] = Map( + "link" -> HasVisibleLink, + "hashtag" -> HasHashtag, + "mutual_follow" -> FromMutualFollow, + "in_network" -> IsInNetwork, + "text_only" -> TextOnly, + "has_liked_by_social_context" -> HasLikedBySocialContext, + "has_followed_by_social_context" -> HasFollowedBySocialContext, + "has_topic_social_context" -> HasTopicSocialContext, + "is_followed_topic_tweet" -> IsFollowedTopicTweet, + "is_recommended_topic_tweet" -> IsRecommendedTopicTweet, + "tweet_age_less_than_15_seconds" -> IsTweetAgeLessThan15Seconds, + "tweet_age_lte_30_minutes" -> IsTweetAgeLessThanOrEqualTo30Minutes, + "tweet_age_lte_1_hour" -> IsTweetAgeLessThanOrEqualTo1Hour, + "tweet_age_lte_6_hours" -> IsTweetAgeLessThanOrEqualTo6Hours, + "tweet_age_lte_12_hours" -> IsTweetAgeLessThanOrEqualTo12Hours, + "tweet_age_gte_24_hours" -> IsTweetAgeGreaterThanOrEqualTo24Hours, + "has_gte_100_favs" -> HasGreaterThanOrEqualTo100Favs, + "has_gte_1k_favs" -> HasGreaterThanOrEqualTo1KFavs, + "has_gte_10k_favs" -> HasGreaterThanOrEqualTo10KFavs, + "has_gte_100k_favs" -> HasGreaterThanOrEqualTo100KFavs, + "has_gte_10_retweets" -> HasGreaterThanOrEqualTo10Retweets, + "has_gte_100_retweets" -> HasGreaterThanOrEqualTo100Retweets, + "has_gte_1k_retweets" -> HasGreaterThanOrEqualTo1KRetweets + ) + + val CandidateTweetSourceIdFeatureMap: Map[Int, Binary] = Map( + CandidateTweetSourceId.RecapTweet.value -> FromRecap, + CandidateTweetSourceId.RecycledTweet.value -> FromRecycled, + CandidateTweetSourceId.RecommendedTweet.value -> FromActivity, + 
CandidateTweetSourceId.Simcluster.value -> FromSimcluster,
+    CandidateTweetSourceId.ErgTweet.value -> FromErg,
+    CandidateTweetSourceId.CroonTopicTweet.value -> FromCroon,
+    CandidateTweetSourceId.CroonTweet.value -> FromCroon,
+    CandidateTweetSourceId.ListTweet.value -> FromList,
+    CandidateTweetSourceId.RecommendedTopicTweet.value -> FromRecTopic
+  )
+
+  val TweetFeaturesV2: Set[Feature[_]] = Set(
+    HasImage,
+    IsReply,
+    IsRetweet,
+    HasVisibleLink,
+    HasHashtag,
+    FromMutualFollow,
+    IsInNetwork
+  )
+
+  val ContentTweetTypeFeatures: Set[Feature[_]] = Set(
+    HasImage,
+    HasVisibleLink,
+    HasHashtag,
+    TextOnly
+  )
+
+  val FreshnessTweetTypeFeatures: Set[Feature[_]] = Set(
+    IsTweetAgeLessThan15Seconds,
+    IsTweetAgeLessThanOrEqualTo30Minutes,
+    IsTweetAgeLessThanOrEqualTo1Hour,
+    IsTweetAgeLessThanOrEqualTo6Hours,
+    IsTweetAgeLessThanOrEqualTo12Hours,
+    IsTweetAgeGreaterThanOrEqualTo24Hours
+  )
+
+  val SocialProofTweetTypeFeatures: Set[Feature[_]] = Set(
+    HasLikedBySocialContext,
+    HasFollowedBySocialContext,
+    HasTopicSocialContext
+  )
+
+  val TopicTweetPreferenceTweetTypeFeatures: Set[Feature[_]] = Set(
+    IsFollowedTopicTweet,
+    IsRecommendedTopicTweet
+  )
+
+  val TweetPopularityTweetTypeFeatures: Set[Feature[_]] = Set(
+    HasGreaterThanOrEqualTo100Favs,
+    HasGreaterThanOrEqualTo1KFavs,
+    HasGreaterThanOrEqualTo10KFavs,
+    HasGreaterThanOrEqualTo100KFavs,
+    HasGreaterThanOrEqualTo10Retweets,
+    HasGreaterThanOrEqualTo100Retweets,
+    HasGreaterThanOrEqualTo1KRetweets
+  )
+
+  val UserGraphInteractionTweetTypeFeatures: Set[Feature[_]] = Set(
+    IsInNetwork,
+    FromMutualFollow,
+    IsNotInNetwork,
+    IsPromoted
+  )
+
+  val UserContentPreferenceTweetTypeFeatures: Set[Feature[_]] =
+    ContentTweetTypeFeatures ++ FreshnessTweetTypeFeatures ++ SocialProofTweetTypeFeatures ++
+      TopicTweetPreferenceTweetTypeFeatures ++ TweetPopularityTweetTypeFeatures ++
+      UserGraphInteractionTweetTypeFeatures
+  val AuthorContentPreferenceTweetTypeFeatures: Set[Feature[_]] =
+    Set(IsInNetwork, FromMutualFollow, IsNotInNetwork) ++ ContentTweetTypeFeatures
+}
diff --git a/src/scala/com/twitter/timelines/prediction/features/common/BUILD b/src/scala/com/twitter/timelines/prediction/features/common/BUILD
new file mode 100644
index 000000000..bfbe764c7
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/features/common/BUILD
@@ -0,0 +1,11 @@
+scala_library(
+    sources = ["*.scala"],
+    platform = "java8",
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "src/java/com/twitter/ml/api:api-base",
+        "src/thrift/com/twitter/dal/personal_data:personal_data-java",
+        "src/thrift/com/twitter/ml/api:data-java",
+        "timelines/data_processing/ml_util/aggregation_framework:common_types",
+    ],
+)
diff --git a/src/scala/com/twitter/timelines/prediction/features/common/CombinedFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/common/CombinedFeatures.scala
new file mode 100644
index 000000000..d995fe2b0
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/features/common/CombinedFeatures.scala
@@ -0,0 +1,536 @@
+package com.twitter.timelines.prediction.features.common
+
+import com.twitter.dal.personal_data.thriftjava.PersonalDataType._
+import com.twitter.ml.api.Feature
+import com.twitter.ml.api.FeatureType
+import com.twitter.ml.api.Feature.Binary
+import java.lang.{Boolean => JBoolean}
+import scala.collection.JavaConverters._
+
+object CombinedFeatures {
+  val IS_CLICKED =
+    new Binary("timelines.engagement.is_clicked", Set(TweetsClicked, EngagementsPrivate).asJava)
+  val IS_DWELLED =
+ new Binary("timelines.engagement.is_dwelled", Set(TweetsViewed, EngagementsPrivate).asJava) + val IS_DWELLED_IN_BOUNDS_V1 = new Binary( + "timelines.engagement.is_dwelled_in_bounds_v1", + Set(TweetsViewed, EngagementsPrivate).asJava) + val IS_FAVORITED = new Binary( + "timelines.engagement.is_favorited", + Set(PublicLikes, PrivateLikes, EngagementsPrivate, EngagementsPublic).asJava) + val IS_FOLLOWED = new Binary( + "timelines.engagement.is_followed", + Set(EngagementsPrivate, EngagementsPublic, Follow).asJava) + val IS_IMPRESSED = + new Binary("timelines.engagement.is_impressed", Set(TweetsViewed, EngagementsPrivate).asJava) + val IS_OPEN_LINKED = new Binary( + "timelines.engagement.is_open_linked", + Set(EngagementsPrivate, LinksClickedOn).asJava) + val IS_PHOTO_EXPANDED = new Binary( + "timelines.engagement.is_photo_expanded", + Set(MediaEngagementActivities, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED = new Binary( + "timelines.engagement.is_profile_clicked", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_QUOTED = new Binary( + "timelines.engagement.is_quoted", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED = new Binary( + "timelines.engagement.is_replied", + Set(PublicReplies, PrivateReplies, EngagementsPrivate, EngagementsPublic).asJava) + val IS_RETWEETED = new Binary( + "timelines.engagement.is_retweeted", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_RETWEETED_WITHOUT_QUOTE = new Binary( + "timelines.enagagement.is_retweeted_without_quote", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_SHARE_DM_CLICKED = + new Binary("timelines.engagement.is_tweet_share_dm_clicked", Set(EngagementsPrivate).asJava) + val IS_SHARE_DM_SENT = + new Binary("timelines.engagement.is_tweet_share_dm_sent", Set(EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_25 = new Binary( + "timelines.engagement.is_video_playback_25", + Set(MediaEngagementActivities, EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_50 = new Binary( + "timelines.engagement.is_video_playback_50", + Set(MediaEngagementActivities, EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_75 = new Binary( + "timelines.engagement.is_video_playback_75", + Set(MediaEngagementActivities, EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_95 = new Binary( + "timelines.engagement.is_video_playback_95", + Set(MediaEngagementActivities, EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_COMPLETE = new Binary( + "timelines.engagement.is_video_playback_complete", + Set(MediaEngagementActivities, EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_START = new Binary( + "timelines.engagement.is_video_playback_start", + Set(MediaEngagementActivities, EngagementsPrivate).asJava) + val IS_VIDEO_VIEWED = new Binary( + "timelines.engagement.is_video_viewed", + Set(MediaEngagementActivities, EngagementsPrivate).asJava) + val IS_VIDEO_QUALITY_VIEWED = new Binary( + "timelines.engagement.is_video_quality_viewed", + Set(MediaEngagementActivities, EngagementsPrivate).asJava + ) + // v1: post click engagements: fav, reply + val IS_GOOD_CLICKED_CONVO_DESC_V1 = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_favorited_or_replied", + Set( + TweetsClicked, + PublicLikes, + PrivateLikes, + PublicReplies, + PrivateReplies, + EngagementsPrivate, + EngagementsPublic).asJava) + // v2: post click engagements: click + val IS_GOOD_CLICKED_CONVO_DESC_V2 = new Binary( + 
"timelines.engagement.is_good_clicked_convo_desc_v2", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_WITH_DWELL_SUM_GTE_60S = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_favorited_or_replied_or_dwell_sum_gte_60_secs", + Set( + TweetsClicked, + PublicLikes, + PrivateLikes, + PublicReplies, + PrivateReplies, + EngagementsPrivate, + EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_FAVORITED = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_favorited", + Set(PublicLikes, PrivateLikes, EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_REPLIED = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_replied", + Set(PublicReplies, PrivateReplies, EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_RETWEETED = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_retweeted", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_CLICKED = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_clicked", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_FOLLOWED = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_followed", + Set(EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_SHARE_DM_CLICKED = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_share_dm_clicked", + Set(EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_PROFILE_CLICKED = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_profile_clicked", + Set(EngagementsPrivate).asJava) + + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_0 = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_uam_gt_0", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_1 = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_uam_gt_1", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_2 = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_uam_gt_2", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_3 = new Binary( + "timelines.engagement.is_good_clicked_convo_desc_uam_gt_3", + Set(EngagementsPrivate, EngagementsPublic).asJava) + + val IS_TWEET_DETAIL_DWELLED = new Binary( + "timelines.engagement.is_tweet_detail_dwelled", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_8_SEC = new Binary( + "timelines.engagement.is_tweet_detail_dwelled_8_sec", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_15_SEC = new Binary( + "timelines.engagement.is_tweet_detail_dwelled_15_sec", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_25_SEC = new Binary( + "timelines.engagement.is_tweet_detail_dwelled_25_sec", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_30_SEC = new Binary( + "timelines.engagement.is_tweet_detail_dwelled_30_sec", + Set(TweetsClicked, EngagementsPrivate).asJava) + + val IS_PROFILE_DWELLED = new Binary( + "timelines.engagement.is_profile_dwelled", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_10_SEC = new Binary( + "timelines.engagement.is_profile_dwelled_10_sec", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_20_SEC = new Binary( + "timelines.engagement.is_profile_dwelled_20_sec", + Set(ProfilesViewed, 
ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_30_SEC = new Binary( + "timelines.engagement.is_profile_dwelled_30_sec", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED = new Binary( + "timelines.engagement.is_fullscreen_video_dwelled", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_5_SEC = new Binary( + "timelines.engagement.is_fullscreen_video_dwelled_5_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_10_SEC = new Binary( + "timelines.engagement.is_fullscreen_video_dwelled_10_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_20_SEC = new Binary( + "timelines.engagement.is_fullscreen_video_dwelled_20_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_30_SEC = new Binary( + "timelines.engagement.is_fullscreen_video_dwelled_30_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_15_SEC = new Binary( + "timelines.engagement.is_link_dwelled_15_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_30_SEC = new Binary( + "timelines.engagement.is_link_dwelled_30_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_60_SEC = new Binary( + "timelines.engagement.is_link_dwelled_60_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_HOME_LATEST_VISITED = + new Binary("timelines.engagement.is_home_latest_visited", Set(EngagementsPrivate).asJava) + + val IS_BOOKMARKED = + new Binary("timelines.engagement.is_bookmarked", Set(EngagementsPrivate).asJava) + val IS_SHARED = + new Binary("timelines.engagement.is_shared", Set(EngagementsPrivate).asJava) + val IS_SHARE_MENU_CLICKED = + new Binary("timelines.engagement.is_share_menu_clicked", Set(EngagementsPrivate).asJava) + + // Negative engagements + val IS_DONT_LIKE = new Binary("timelines.engagement.is_dont_like", Set(EngagementsPrivate).asJava) + val IS_BLOCK_CLICKED = new Binary( + "timelines.engagement.is_block_clicked", + Set(Blocks, TweetsClicked, EngagementsPrivate, EngagementsPublic).asJava) + val IS_BLOCK_DIALOG_BLOCKED = new Binary( + "timelines.engagement.is_block_dialog_blocked", + Set(Blocks, EngagementsPrivate, EngagementsPublic).asJava) + val IS_MUTE_CLICKED = new Binary( + "timelines.engagement.is_mute_clicked", + Set(Mutes, TweetsClicked, EngagementsPrivate).asJava) + val IS_MUTE_DIALOG_MUTED = + new Binary("timelines.engagement.is_mute_dialog_muted", Set(Mutes, EngagementsPrivate).asJava) + val IS_REPORT_TWEET_CLICKED = new Binary( + "timelines.engagement.is_report_tweet_clicked", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_CARET_CLICKED = + new Binary("timelines.engagement.is_caret_clicked", Set(EngagementsPrivate).asJava) + val IS_NOT_ABOUT_TOPIC = + new Binary("timelines.engagement.is_not_about_topic", Set(EngagementsPrivate).asJava) + val IS_NOT_RECENT = + new Binary("timelines.engagement.is_not_recent", Set(EngagementsPrivate).asJava) + val IS_NOT_RELEVANT = + new Binary("timelines.engagement.is_not_relevant", Set(EngagementsPrivate).asJava) + val IS_SEE_FEWER = + new Binary("timelines.engagement.is_see_fewer", 
Set(EngagementsPrivate).asJava) + val IS_UNFOLLOW_TOPIC = + new Binary("timelines.engagement.is_unfollow_topic", Set(EngagementsPrivate).asJava) + val IS_FOLLOW_TOPIC = + new Binary("timelines.engagement.is_follow_topic", Set(EngagementsPrivate).asJava) + val IS_NOT_INTERESTED_IN_TOPIC = + new Binary("timelines.engagement.is_not_interested_in_topic", Set(EngagementsPrivate).asJava) + val IS_NEGATIVE_FEEDBACK = + new Binary("timelines.engagement.is_negative_feedback", Set(EngagementsPrivate).asJava) + val IS_IMPLICIT_POSITIVE_FEEDBACK_UNION = + new Binary( + "timelines.engagement.is_implicit_positive_feedback_union", + Set(EngagementsPrivate).asJava) + val IS_EXPLICIT_POSITIVE_FEEDBACK_UNION = + new Binary( + "timelines.engagement.is_explicit_positive_feedback_union", + Set(EngagementsPrivate).asJava) + val IS_ALL_NEGATIVE_FEEDBACK_UNION = + new Binary( + "timelines.engagement.is_all_negative_feedback_union", + Set(EngagementsPrivate).asJava) + // Reciprocal engagements for reply forward engagement + val IS_REPLIED_REPLY_IMPRESSED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_impressed_by_author", + Set(EngagementsPrivate).asJava) + val IS_REPLIED_REPLY_FAVORITED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_favorited_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateLikes, PublicLikes).asJava) + val IS_REPLIED_REPLY_QUOTED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_quoted_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateRetweets, PublicRetweets).asJava) + val IS_REPLIED_REPLY_REPLIED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_replied_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateReplies, PublicReplies).asJava) + val IS_REPLIED_REPLY_RETWEETED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_retweeted_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateRetweets, PublicRetweets).asJava) + val IS_REPLIED_REPLY_BLOCKED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_blocked_by_author", + Set(Blocks, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_FOLLOWED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_followed_by_author", + Set(EngagementsPrivate, EngagementsPublic, Follow).asJava) + val IS_REPLIED_REPLY_UNFOLLOWED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_unfollowed_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_MUTED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_muted_by_author", + Set(Mutes, EngagementsPrivate).asJava) + val IS_REPLIED_REPLY_REPORTED_BY_AUTHOR = new Binary( + "timelines.engagement.is_replied_reply_reported_by_author", + Set(EngagementsPrivate).asJava) + + // Reciprocal engagements for fav forward engagement + val IS_FAVORITED_FAV_FAVORITED_BY_AUTHOR = new Binary( + "timelines.engagement.is_favorited_fav_favorited_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateLikes, PublicLikes).asJava + ) + val IS_FAVORITED_FAV_REPLIED_BY_AUTHOR = new Binary( + "timelines.engagement.is_favorited_fav_replied_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateReplies, PublicReplies).asJava + ) + val IS_FAVORITED_FAV_RETWEETED_BY_AUTHOR = new Binary( + "timelines.engagement.is_favorited_fav_retweeted_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateRetweets, PublicRetweets).asJava + ) + val IS_FAVORITED_FAV_FOLLOWED_BY_AUTHOR = new Binary( + 
"timelines.engagement.is_favorited_fav_followed_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava + ) + + // define good profile click by considering following engagements (follow, fav, reply, retweet, etc.) at profile page + val IS_PROFILE_CLICKED_AND_PROFILE_FOLLOW = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_follow", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, Follow).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_FAV = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_fav", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, PrivateLikes, PublicLikes).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_REPLY = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_reply", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, PrivateReplies, PublicReplies).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_RETWEET = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_retweet", + Set( + ProfilesViewed, + ProfilesClicked, + EngagementsPrivate, + PrivateRetweets, + PublicRetweets).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_TWEET_CLICK = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_tweet_click", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, TweetsClicked).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_SHARE_DM_CLICK = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_share_dm_click", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + // This derived label is the union of all binary features above + val IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_engaged", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, EngagementsPublic).asJava) + + // define bad profile click by considering following engagements (user report, tweet report, mute, block, etc) at profile page + val IS_PROFILE_CLICKED_AND_PROFILE_USER_REPORT_CLICK = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_user_report_click", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_TWEET_REPORT_CLICK = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_tweet_report_click", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_MUTE = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_mute", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_BLOCK = new Binary( + "timelines.engagement.is_profile_clicked_and_profile_block", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + // This derived label is the union of bad profile click engagements and existing negative feedback + val IS_NEGATIVE_FEEDBACK_V2 = new Binary( + "timelines.engagement.is_negative_feedback_v2", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_NEGATIVE_FEEDBACK_UNION = new Binary( + "timelines.engagement.is_negative_feedback_union", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + // don't like, mute or profile page -> mute + val IS_WEAK_NEGATIVE_FEEDBACK = new Binary( + "timelines.engagement.is_weak_negative_feedback", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + // report, block or profile page -> report, block + val IS_STRONG_NEGATIVE_FEEDBACK = new Binary( + "timelines.engagement.is_strong_negative_feedback", + Set(ProfilesViewed, ProfilesClicked, 
EngagementsPrivate).asJava) + // engagement for following user from any surface area + val IS_FOLLOWED_FROM_ANY_SURFACE_AREA = new Binary( + "timelines.engagement.is_followed_from_any_surface_area", + Set(EngagementsPublic, EngagementsPrivate).asJava) + val IS_RELEVANCE_PROMPT_YES_CLICKED = new Binary( + "timelines.engagement.is_relevance_prompt_yes_clicked", + Set(EngagementsPublic, EngagementsPrivate).asJava) + + // Reply downvote engagements + val IS_REPLY_DOWNVOTED = + new Binary("timelines.engagement.is_reply_downvoted", Set(EngagementsPrivate).asJava) + val IS_REPLY_DOWNVOTE_REMOVED = + new Binary("timelines.engagement.is_reply_downvote_removed", Set(EngagementsPrivate).asJava) + + /** + * Contains all engagements that are used/consumed by real-time + * aggregates summingbird jobs. These engagements need to be + * extractable from [[ClientEvent]]. + */ + val EngagementsRealTime: Set[Feature[JBoolean]] = Set( + IS_CLICKED, + IS_DWELLED, + IS_FAVORITED, + IS_FOLLOWED, + IS_OPEN_LINKED, + IS_PHOTO_EXPANDED, + IS_PROFILE_CLICKED, + IS_QUOTED, + IS_REPLIED, + IS_RETWEETED, + IS_RETWEETED_WITHOUT_QUOTE, + IS_SHARE_DM_CLICKED, + IS_SHARE_DM_SENT, + IS_VIDEO_PLAYBACK_50, + IS_VIDEO_VIEWED, + IS_VIDEO_QUALITY_VIEWED + ) + + val NegativeEngagementsRealTime: Set[Feature[JBoolean]] = Set( + IS_REPORT_TWEET_CLICKED, + IS_BLOCK_CLICKED, + IS_MUTE_CLICKED + ) + + val NegativeEngagementsRealTimeDontLike: Set[Feature[JBoolean]] = Set( + IS_DONT_LIKE + ) + + val NegativeEngagementsSecondary: Set[Feature[JBoolean]] = Set( + IS_NOT_INTERESTED_IN_TOPIC, + IS_NOT_ABOUT_TOPIC, + IS_NOT_RECENT, + IS_NOT_RELEVANT, + IS_SEE_FEWER, + IS_UNFOLLOW_TOPIC + ) + + val PrivateEngagements: Set[Feature[JBoolean]] = Set( + IS_CLICKED, + IS_DWELLED, + IS_OPEN_LINKED, + IS_PHOTO_EXPANDED, + IS_PROFILE_CLICKED, + IS_QUOTED, + IS_VIDEO_PLAYBACK_50, + IS_VIDEO_QUALITY_VIEWED + ) + + val ImpressedEngagements: Set[Feature[JBoolean]] = Set( + IS_IMPRESSED + ) + + val PrivateEngagementsV2: Set[Feature[JBoolean]] = Set( + IS_CLICKED, + IS_OPEN_LINKED, + IS_PHOTO_EXPANDED, + IS_PROFILE_CLICKED, + IS_VIDEO_PLAYBACK_50, + IS_VIDEO_QUALITY_VIEWED + ) ++ ImpressedEngagements + + val CoreEngagements: Set[Feature[JBoolean]] = Set( + IS_FAVORITED, + IS_REPLIED, + IS_RETWEETED + ) + + val DwellEngagements: Set[Feature[JBoolean]] = Set( + IS_DWELLED + ) + + val PrivateCoreEngagements: Set[Feature[JBoolean]] = Set( + IS_CLICKED, + IS_OPEN_LINKED, + IS_PHOTO_EXPANDED, + IS_VIDEO_PLAYBACK_50, + IS_VIDEO_QUALITY_VIEWED + ) + + val ConditionalEngagements: Set[Feature[JBoolean]] = Set( + IS_GOOD_CLICKED_CONVO_DESC_V1, + IS_GOOD_CLICKED_CONVO_DESC_V2, + IS_GOOD_CLICKED_WITH_DWELL_SUM_GTE_60S + ) + + val ShareEngagements: Set[Feature[JBoolean]] = Set( + IS_SHARED, + IS_SHARE_MENU_CLICKED + ) + + val BookmarkEngagements: Set[Feature[JBoolean]] = Set( + IS_BOOKMARKED + ) + + val TweetDetailDwellEngagements: Set[Feature[JBoolean]] = Set( + IS_TWEET_DETAIL_DWELLED, + IS_TWEET_DETAIL_DWELLED_8_SEC, + IS_TWEET_DETAIL_DWELLED_15_SEC, + IS_TWEET_DETAIL_DWELLED_25_SEC, + IS_TWEET_DETAIL_DWELLED_30_SEC + ) + + val ProfileDwellEngagements: Set[Feature[JBoolean]] = Set( + IS_PROFILE_DWELLED, + IS_PROFILE_DWELLED_10_SEC, + IS_PROFILE_DWELLED_20_SEC, + IS_PROFILE_DWELLED_30_SEC + ) + + val FullscreenVideoDwellEngagements: Set[Feature[JBoolean]] = Set( + IS_FULLSCREEN_VIDEO_DWELLED, + IS_FULLSCREEN_VIDEO_DWELLED_5_SEC, + IS_FULLSCREEN_VIDEO_DWELLED_10_SEC, + IS_FULLSCREEN_VIDEO_DWELLED_20_SEC, + IS_FULLSCREEN_VIDEO_DWELLED_30_SEC + ) + + // Please do not add 
new engagements here until having estimated the impact + // to capacity requirements. User-author real-time aggregates have a very + // large key space. + val UserAuthorEngagements: Set[Feature[JBoolean]] = CoreEngagements ++ DwellEngagements ++ Set( + IS_CLICKED, + IS_PROFILE_CLICKED, + IS_PHOTO_EXPANDED, + IS_VIDEO_PLAYBACK_50, + IS_NEGATIVE_FEEDBACK_UNION + ) + + val ImplicitPositiveEngagements: Set[Feature[JBoolean]] = Set( + IS_CLICKED, + IS_DWELLED, + IS_OPEN_LINKED, + IS_PROFILE_CLICKED, + IS_QUOTED, + IS_VIDEO_PLAYBACK_50, + IS_VIDEO_QUALITY_VIEWED, + IS_TWEET_DETAIL_DWELLED, + IS_GOOD_CLICKED_CONVO_DESC_V1, + IS_GOOD_CLICKED_CONVO_DESC_V2, + IS_SHARED, + IS_SHARE_MENU_CLICKED, + IS_SHARE_DM_SENT, + IS_SHARE_DM_CLICKED + ) + + val ExplicitPositiveEngagements: Set[Feature[JBoolean]] = CoreEngagements ++ Set( + IS_FOLLOWED, + IS_QUOTED + ) + + val AllNegativeEngagements: Set[Feature[JBoolean]] = + NegativeEngagementsRealTime ++ NegativeEngagementsRealTimeDontLike ++ Set( + IS_NOT_RECENT, + IS_NOT_RELEVANT, + IS_SEE_FEWER + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/common/NonHomeLabelFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/common/NonHomeLabelFeatures.scala new file mode 100644 index 000000000..369b48b39 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/common/NonHomeLabelFeatures.scala @@ -0,0 +1,97 @@ +package com.twitter.timelines.prediction.features.common + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.ml.api.Feature +import com.twitter.ml.api.Feature.Binary +import java.lang.{Boolean => JBoolean} +import scala.collection.JavaConverters._ + +object ProfileLabelFeatures { + private val prefix = "profile" + + val IS_CLICKED = + new Binary(s"${prefix}.engagement.is_clicked", Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_DWELLED = + new Binary(s"${prefix}.engagement.is_dwelled", Set(TweetsViewed, EngagementsPrivate).asJava) + val IS_FAVORITED = new Binary( + s"${prefix}.engagement.is_favorited", + Set(PublicLikes, PrivateLikes, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED = new Binary( + s"${prefix}.engagement.is_replied", + Set(PublicReplies, PrivateReplies, EngagementsPrivate, EngagementsPublic).asJava) + val IS_RETWEETED = new Binary( + s"${prefix}.engagement.is_retweeted", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + + // Negative engagements + val IS_DONT_LIKE = + new Binary(s"${prefix}.engagement.is_dont_like", Set(EngagementsPrivate).asJava) + val IS_BLOCK_CLICKED = new Binary( + s"${prefix}.engagement.is_block_clicked", + Set(Blocks, TweetsClicked, EngagementsPrivate, EngagementsPublic).asJava) + val IS_MUTE_CLICKED = new Binary( + s"${prefix}.engagement.is_mute_clicked", + Set(Mutes, TweetsClicked, EngagementsPrivate).asJava) + val IS_REPORT_TWEET_CLICKED = new Binary( + s"${prefix}.engagement.is_report_tweet_clicked", + Set(TweetsClicked, EngagementsPrivate).asJava) + + val IS_NEGATIVE_FEEDBACK_UNION = new Binary( + s"${prefix}.engagement.is_negative_feedback_union", + Set(EngagementsPrivate, Blocks, Mutes, TweetsClicked, EngagementsPublic).asJava) + + val CoreEngagements: Set[Feature[JBoolean]] = Set( + IS_CLICKED, + IS_DWELLED, + IS_FAVORITED, + IS_REPLIED, + IS_RETWEETED + ) + + val NegativeEngagements: Set[Feature[JBoolean]] = Set( + IS_DONT_LIKE, + IS_BLOCK_CLICKED, + IS_MUTE_CLICKED, + IS_REPORT_TWEET_CLICKED + ) + +} + +object SearchLabelFeatures { + private val prefix = 
"search" + + val IS_CLICKED = + new Binary(s"${prefix}.engagement.is_clicked", Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_DWELLED = + new Binary(s"${prefix}.engagement.is_dwelled", Set(TweetsViewed, EngagementsPrivate).asJava) + val IS_FAVORITED = new Binary( + s"${prefix}.engagement.is_favorited", + Set(PublicLikes, PrivateLikes, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED = new Binary( + s"${prefix}.engagement.is_replied", + Set(PublicReplies, PrivateReplies, EngagementsPrivate, EngagementsPublic).asJava) + val IS_RETWEETED = new Binary( + s"${prefix}.engagement.is_retweeted", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_PROFILE_CLICKED_SEARCH_RESULT_USER = new Binary( + s"${prefix}.engagement.is_profile_clicked_search_result_user", + Set(ProfilesClicked, ProfilesViewed, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED_SEARCH_RESULT_TWEET = new Binary( + s"${prefix}.engagement.is_profile_clicked_search_result_tweet", + Set(ProfilesClicked, ProfilesViewed, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED_TYPEAHEAD_USER = new Binary( + s"${prefix}.engagement.is_profile_clicked_typeahead_user", + Set(ProfilesClicked, ProfilesViewed, EngagementsPrivate).asJava) + + val CoreEngagements: Set[Feature[JBoolean]] = Set( + IS_CLICKED, + IS_DWELLED, + IS_FAVORITED, + IS_REPLIED, + IS_RETWEETED, + IS_PROFILE_CLICKED_SEARCH_RESULT_USER, + IS_PROFILE_CLICKED_SEARCH_RESULT_TWEET, + IS_PROFILE_CLICKED_TYPEAHEAD_USER + ) +} +// Add Tweet Detail labels later diff --git a/src/scala/com/twitter/timelines/prediction/features/common/TimelinesSharedFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/common/TimelinesSharedFeatures.scala new file mode 100644 index 000000000..99698530f --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/common/TimelinesSharedFeatures.scala @@ -0,0 +1,759 @@ +package com.twitter.timelines.prediction.features.common + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.ml.api.Feature.Binary +import com.twitter.ml.api.Feature.Continuous +import com.twitter.ml.api.Feature.Discrete +import com.twitter.ml.api.Feature.SparseBinary +import com.twitter.ml.api.Feature.SparseContinuous +import com.twitter.ml.api.Feature.Text +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import scala.collection.JavaConverters._ + +object TimelinesSharedFeatures extends TimelinesSharedFeatures("") +object InReplyToTweetTimelinesSharedFeatures extends TimelinesSharedFeatures("in_reply_to_tweet") + +/** + * Defines shared features + */ +class TimelinesSharedFeatures(prefix: String) { + private def name(featureName: String): String = { + if (prefix.nonEmpty) { + s"$prefix.$featureName" + } else { + featureName + } + } + + // meta + val EXPERIMENT_META = new SparseBinary( + name("timelines.meta.experiment_meta"), + Set(ExperimentId, ExperimentName).asJava) + + // historically used in the "combined models" to distinguish in-network and out of network tweets. + // now the feature denotes which adapter (recap or rectweet) was used to generate the datarecords. + // and is used by the data collection pipeline to split the training data. 
+ val INJECTION_TYPE = new Discrete(name("timelines.meta.injection_type")) + + // Used to indicate which injection module this is + val INJECTION_MODULE_NAME = new Text(name("timelines.meta.injection_module_name")) + + val LIST_ID = new Discrete(name("timelines.meta.list_id")) + val LIST_IS_PINNED = new Binary(name("timelines.meta.list_is_pinned")) + + // internal id per each PS request. mainly used to join back common features and candidate features later + val PREDICTION_REQUEST_ID = new Discrete(name("timelines.meta.prediction_request_id")) + // internal id per each TLM request. mainly to deduplicate re-served cached tweets in logging + val SERVED_REQUEST_ID = new Discrete(name("timelines.meta.served_request_id")) + // internal id used for join key in kafka logging, equal to servedRequestId if tweet is cached, + // else equal to predictionRequestId + val SERVED_ID = new Discrete(name("timelines.meta.served_id")) + val REQUEST_JOIN_ID = new Discrete(name("timelines.meta.request_join_id")) + + // Internal boolean flag per tweet, whether the tweet is served from RankedTweetsCache: TQ-14050 + // this feature should not be trained on, blacklisted in feature_config: D838346 + val IS_READ_FROM_CACHE = new Binary(name("timelines.meta.is_read_from_cache")) + + // model score discounts + val PHOTO_DISCOUNT = new Continuous(name("timelines.score_discounts.photo")) + val VIDEO_DISCOUNT = new Continuous(name("timelines.score_discounts.video")) + val TWEET_HEIGHT_DISCOUNT = new Continuous(name("timelines.score_discounts.tweet_height")) + val TOXICITY_DISCOUNT = new Continuous(name("timelines.score_discounts.toxicity")) + + // engagements + val ENGAGEMENT_TYPE = new Discrete(name("timelines.engagement.type")) + val PREDICTED_IS_FAVORITED = + new Continuous(name("timelines.engagement_predicted.is_favorited"), Set(EngagementScore).asJava) + val PREDICTED_IS_RETWEETED = + new Continuous(name("timelines.engagement_predicted.is_retweeted"), Set(EngagementScore).asJava) + val PREDICTED_IS_QUOTED = + new Continuous(name("timelines.engagement_predicted.is_quoted"), Set(EngagementScore).asJava) + val PREDICTED_IS_REPLIED = + new Continuous(name("timelines.engagement_predicted.is_replied"), Set(EngagementScore).asJava) + val PREDICTED_IS_OPEN_LINKED = new Continuous( + name("timelines.engagement_predicted.is_open_linked"), + Set(EngagementScore).asJava) + val PREDICTED_IS_GOOD_OPEN_LINK = new Continuous( + name("timelines.engagement_predicted.is_good_open_link"), + Set(EngagementScore).asJava) + val PREDICTED_IS_PROFILE_CLICKED = new Continuous( + name("timelines.engagement_predicted.is_profile_clicked"), + Set(EngagementScore).asJava + ) + val PREDICTED_IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED = new Continuous( + name("timelines.engagement_predicted.is_profile_clicked_and_profile_engaged"), + Set(EngagementScore).asJava + ) + val PREDICTED_IS_CLICKED = + new Continuous(name("timelines.engagement_predicted.is_clicked"), Set(EngagementScore).asJava) + val PREDICTED_IS_PHOTO_EXPANDED = new Continuous( + name("timelines.engagement_predicted.is_photo_expanded"), + Set(EngagementScore).asJava + ) + val PREDICTED_IS_FOLLOWED = + new Continuous(name("timelines.engagement_predicted.is_followed"), Set(EngagementScore).asJava) + val PREDICTED_IS_DONT_LIKE = + new Continuous(name("timelines.engagement_predicted.is_dont_like"), Set(EngagementScore).asJava) + val PREDICTED_IS_VIDEO_PLAYBACK_50 = new Continuous( + name("timelines.engagement_predicted.is_video_playback_50"), + Set(EngagementScore).asJava + ) + val 
PREDICTED_IS_VIDEO_QUALITY_VIEWED = new Continuous( + name("timelines.engagement_predicted.is_video_quality_viewed"), + Set(EngagementScore).asJava + ) + val PREDICTED_IS_GOOD_CLICKED_V1 = new Continuous( + name("timelines.engagement_predicted.is_good_clicked_convo_desc_favorited_or_replied"), + Set(EngagementScore).asJava) + val PREDICTED_IS_GOOD_CLICKED_V2 = new Continuous( + name("timelines.engagement_predicted.is_good_clicked_convo_desc_v2"), + Set(EngagementScore).asJava) + val PREDICTED_IS_TWEET_DETAIL_DWELLED_8_SEC = new Continuous( + name("timelines.engagement_predicted.is_tweet_detail_dwelled_8_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_TWEET_DETAIL_DWELLED_15_SEC = new Continuous( + name("timelines.engagement_predicted.is_tweet_detail_dwelled_15_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_TWEET_DETAIL_DWELLED_25_SEC = new Continuous( + name("timelines.engagement_predicted.is_tweet_detail_dwelled_25_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_TWEET_DETAIL_DWELLED_30_SEC = new Continuous( + name("timelines.engagement_predicted.is_tweet_detail_dwelled_30_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_GOOD_CLICKED_WITH_DWELL_SUM_GTE_60S = new Continuous( + name( + "timelines.engagement_predicted.is_good_clicked_convo_desc_favorited_or_replied_or_dwell_sum_gte_60_secs"), + Set(EngagementScore).asJava) + val PREDICTED_IS_FAVORITED_FAV_ENGAGED_BY_AUTHOR = new Continuous( + name("timelines.engagement_predicted.is_favorited_fav_engaged_by_author"), + Set(EngagementScore).asJava) + + val PREDICTED_IS_REPORT_TWEET_CLICKED = + new Continuous( + name("timelines.engagement_predicted.is_report_tweet_clicked"), + Set(EngagementScore).asJava) + val PREDICTED_IS_NEGATIVE_FEEDBACK = new Continuous( + name("timelines.engagement_predicted.is_negative_feedback"), + Set(EngagementScore).asJava) + val PREDICTED_IS_NEGATIVE_FEEDBACK_V2 = new Continuous( + name("timelines.engagement_predicted.is_negative_feedback_v2"), + Set(EngagementScore).asJava) + val PREDICTED_IS_WEAK_NEGATIVE_FEEDBACK = new Continuous( + name("timelines.engagement_predicted.is_weak_negative_feedback"), + Set(EngagementScore).asJava) + val PREDICTED_IS_STRONG_NEGATIVE_FEEDBACK = new Continuous( + name("timelines.engagement_predicted.is_strong_negative_feedback"), + Set(EngagementScore).asJava) + + val PREDICTED_IS_DWELLED_IN_BOUNDS_V1 = new Continuous( + name("timelines.engagement_predicted.is_dwelled_in_bounds_v1"), + Set(EngagementScore).asJava) + val PREDICTED_DWELL_NORMALIZED_OVERALL = new Continuous( + name("timelines.engagement_predicted.dwell_normalized_overall"), + Set(EngagementScore).asJava) + val PREDICTED_DWELL_CDF = + new Continuous(name("timelines.engagement_predicted.dwell_cdf"), Set(EngagementScore).asJava) + val PREDICTED_DWELL_CDF_OVERALL = new Continuous( + name("timelines.engagement_predicted.dwell_cdf_overall"), + Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED = + new Continuous(name("timelines.engagement_predicted.is_dwelled"), Set(EngagementScore).asJava) + + val PREDICTED_IS_HOME_LATEST_VISITED = new Continuous( + name("timelines.engagement_predicted.is_home_latest_visited"), + Set(EngagementScore).asJava) + + val PREDICTED_IS_BOOKMARKED = new Continuous( + name("timelines.engagement_predicted.is_bookmarked"), + Set(EngagementScore).asJava) + + val PREDICTED_IS_SHARED = + new Continuous(name("timelines.engagement_predicted.is_shared"), Set(EngagementScore).asJava) + val PREDICTED_IS_SHARE_MENU_CLICKED = new Continuous( + 
name("timelines.engagement_predicted.is_share_menu_clicked"), + Set(EngagementScore).asJava) + + val PREDICTED_IS_PROFILE_DWELLED_20_SEC = new Continuous( + name("timelines.engagement_predicted.is_profile_dwelled_20_sec"), + Set(EngagementScore).asJava) + + val PREDICTED_IS_FULLSCREEN_VIDEO_DWELLED_5_SEC = new Continuous( + name("timelines.engagement_predicted.is_fullscreen_video_dwelled_5_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_FULLSCREEN_VIDEO_DWELLED_10_SEC = new Continuous( + name("timelines.engagement_predicted.is_fullscreen_video_dwelled_10_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_FULLSCREEN_VIDEO_DWELLED_20_SEC = new Continuous( + name("timelines.engagement_predicted.is_fullscreen_video_dwelled_20_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_FULLSCREEN_VIDEO_DWELLED_30_SEC = new Continuous( + name("timelines.engagement_predicted.is_fullscreen_video_dwelled_30_sec"), + Set(EngagementScore).asJava) + + // Please use this timestamp, not the `meta.timestamp`, for the actual served timestamp. + val SERVED_TIMESTAMP = + new Discrete("timelines.meta.timestamp.served", Set(PrivateTimestamp).asJava) + + // timestamp when the engagement has occurred. do not train on these features + val TIMESTAMP_FAVORITED = + new Discrete("timelines.meta.timestamp.engagement.favorited", Set(PublicTimestamp).asJava) + val TIMESTAMP_RETWEETED = + new Discrete("timelines.meta.timestamp.engagement.retweeted", Set(PublicTimestamp).asJava) + val TIMESTAMP_REPLIED = + new Discrete("timelines.meta.timestamp.engagement.replied", Set(PublicTimestamp).asJava) + val TIMESTAMP_PROFILE_CLICKED = new Discrete( + "timelines.meta.timestamp.engagement.profile_clicked", + Set(PrivateTimestamp).asJava) + val TIMESTAMP_CLICKED = + new Discrete("timelines.meta.timestamp.engagement.clicked", Set(PrivateTimestamp).asJava) + val TIMESTAMP_PHOTO_EXPANDED = + new Discrete("timelines.meta.timestamp.engagement.photo_expanded", Set(PrivateTimestamp).asJava) + val TIMESTAMP_DWELLED = + new Discrete("timelines.meta.timestamp.engagement.dwelled", Set(PrivateTimestamp).asJava) + val TIMESTAMP_VIDEO_PLAYBACK_50 = new Discrete( + "timelines.meta.timestamp.engagement.video_playback_50", + Set(PrivateTimestamp).asJava) + // reply engaged by author + val TIMESTAMP_REPLY_FAVORITED_BY_AUTHOR = new Discrete( + "timelines.meta.timestamp.engagement.reply_favorited_by_author", + Set(PublicTimestamp).asJava) + val TIMESTAMP_REPLY_REPLIED_BY_AUTHOR = new Discrete( + "timelines.meta.timestamp.engagement.reply_replied_by_author", + Set(PublicTimestamp).asJava) + val TIMESTAMP_REPLY_RETWEETED_BY_AUTHOR = new Discrete( + "timelines.meta.timestamp.engagement.reply_retweeted_by_author", + Set(PublicTimestamp).asJava) + // fav engaged by author + val TIMESTAMP_FAV_FAVORITED_BY_AUTHOR = new Discrete( + "timelines.meta.timestamp.engagement.fav_favorited_by_author", + Set(PublicTimestamp).asJava) + val TIMESTAMP_FAV_REPLIED_BY_AUTHOR = new Discrete( + "timelines.meta.timestamp.engagement.fav_replied_by_author", + Set(PublicTimestamp).asJava) + val TIMESTAMP_FAV_RETWEETED_BY_AUTHOR = new Discrete( + "timelines.meta.timestamp.engagement.fav_retweeted_by_author", + Set(PublicTimestamp).asJava) + val TIMESTAMP_FAV_FOLLOWED_BY_AUTHOR = new Discrete( + "timelines.meta.timestamp.engagement.fav_followed_by_author", + Set(PublicTimestamp).asJava) + // good click + val TIMESTAMP_GOOD_CLICK_CONVO_DESC_FAVORITED = new Discrete( + "timelines.meta.timestamp.engagement.good_click_convo_desc_favorited", + Set(PrivateTimestamp).asJava) 
+ val TIMESTAMP_GOOD_CLICK_CONVO_DESC_REPLIED = new Discrete( + "timelines.meta.timestamp.engagement.good_click_convo_desc_replied", + Set(PrivateTimestamp).asJava) + val TIMESTAMP_GOOD_CLICK_CONVO_DESC_PROFILE_CLICKED = new Discrete( + "timelines.meta.timestamp.engagement.good_click_convo_desc_profile_clicked", + Set(PrivateTimestamp).asJava) + val TIMESTAMP_NEGATIVE_FEEDBACK = new Discrete( + "timelines.meta.timestamp.engagement.negative_feedback", + Set(PrivateTimestamp).asJava) + val TIMESTAMP_REPORT_TWEET_CLICK = + new Discrete( + "timelines.meta.timestamp.engagement.report_tweet_click", + Set(PrivateTimestamp).asJava) + val TIMESTAMP_IMPRESSED = + new Discrete("timelines.meta.timestamp.engagement.impressed", Set(PublicTimestamp).asJava) + val TIMESTAMP_TWEET_DETAIL_DWELLED = + new Discrete( + "timelines.meta.timestamp.engagement.tweet_detail_dwelled", + Set(PublicTimestamp).asJava) + val TIMESTAMP_PROFILE_DWELLED = + new Discrete("timelines.meta.timestamp.engagement.profile_dwelled", Set(PublicTimestamp).asJava) + val TIMESTAMP_FULLSCREEN_VIDEO_DWELLED = + new Discrete( + "timelines.meta.timestamp.engagement.fullscreen_video_dwelled", + Set(PublicTimestamp).asJava) + val TIMESTAMP_LINK_DWELLED = + new Discrete("timelines.meta.timestamp.engagement.link_dwelled", Set(PublicTimestamp).asJava) + + // these are used to duplicate and split the negative instances during streaming processing (kafka) + val TRAINING_FOR_FAVORITED = + new Binary("timelines.meta.training_data.for_favorited", Set(EngagementId).asJava) + val TRAINING_FOR_RETWEETED = + new Binary("timelines.meta.training_data.for_retweeted", Set(EngagementId).asJava) + val TRAINING_FOR_REPLIED = + new Binary("timelines.meta.training_data.for_replied", Set(EngagementId).asJava) + val TRAINING_FOR_PROFILE_CLICKED = + new Binary("timelines.meta.training_data.for_profile_clicked", Set(EngagementId).asJava) + val TRAINING_FOR_CLICKED = + new Binary("timelines.meta.training_data.for_clicked", Set(EngagementId).asJava) + val TRAINING_FOR_PHOTO_EXPANDED = + new Binary("timelines.meta.training_data.for_photo_expanded", Set(EngagementId).asJava) + val TRAINING_FOR_VIDEO_PLAYBACK_50 = + new Binary("timelines.meta.training_data.for_video_playback_50", Set(EngagementId).asJava) + val TRAINING_FOR_NEGATIVE_FEEDBACK = + new Binary("timelines.meta.training_data.for_negative_feedback", Set(EngagementId).asJava) + val TRAINING_FOR_REPORTED = + new Binary("timelines.meta.training_data.for_reported", Set(EngagementId).asJava) + val TRAINING_FOR_DWELLED = + new Binary("timelines.meta.training_data.for_dwelled", Set(EngagementId).asJava) + val TRAINING_FOR_SHARED = + new Binary("timelines.meta.training_data.for_shared", Set(EngagementId).asJava) + val TRAINING_FOR_SHARE_MENU_CLICKED = + new Binary("timelines.meta.training_data.for_share_menu_clicked", Set(EngagementId).asJava) + + // Warning: do not train on these features + val PREDICTED_SCORE = new Continuous(name("timelines.score"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_FAV = new Continuous(name("timelines.score.fav"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_RETWEET = + new Continuous(name("timelines.score.retweet"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_REPLY = + new Continuous(name("timelines.score.reply"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_OPEN_LINK = + new Continuous(name("timelines.score.open_link"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_GOOD_OPEN_LINK = + new Continuous(name("timelines.score.good_open_link"), 
Set(EngagementScore).asJava) + val PREDICTED_SCORE_PROFILE_CLICK = + new Continuous(name("timelines.score.profile_click"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_DETAIL_EXPAND = + new Continuous(name("timelines.score.detail_expand"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_PHOTO_EXPAND = + new Continuous(name("timelines.score.photo_expand"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_PLAYBACK_50 = + new Continuous(name("timelines.score.playback_50"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_VIDEO_QUALITY_VIEW = + new Continuous(name("timelines.score.video_quality_view"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_DONT_LIKE = + new Continuous(name("timelines.score.dont_like"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_PROFILE_CLICKED_AND_PROFILE_ENGAGED = + new Continuous( + name("timelines.score.profile_clicked_and_profile_engaged"), + Set(EngagementScore).asJava) + val PREDICTED_SCORE_GOOD_CLICKED_V1 = + new Continuous(name("timelines.score.good_clicked_v1"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_GOOD_CLICKED_V2 = + new Continuous(name("timelines.score.good_clicked_v2"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_DWELL = + new Continuous(name("timelines.score.dwell"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_DWELL_CDF = + new Continuous(name("timelines.score.dwell_cfd"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_DWELL_CDF_OVERALL = + new Continuous(name("timelines.score.dwell_cfd_overall"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_DWELL_NORMALIZED_OVERALL = + new Continuous(name("timelines.score.dwell_normalized_overall"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_NEGATIVE_FEEDBACK = + new Continuous(name("timelines.score.negative_feedback"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_NEGATIVE_FEEDBACK_V2 = + new Continuous(name("timelines.score.negative_feedback_v2"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_WEAK_NEGATIVE_FEEDBACK = + new Continuous(name("timelines.score.weak_negative_feedback"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_STRONG_NEGATIVE_FEEDBACK = + new Continuous(name("timelines.score.strong_negative_feedback"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_REPORT_TWEET_CLICKED = + new Continuous(name("timelines.score.report_tweet_clicked"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_UNFOLLOW_TOPIC = + new Continuous(name("timelines.score.unfollow_topic"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_FOLLOW = + new Continuous(name("timelines.score.follow"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_RELEVANCE_PROMPT_YES_CLICKED = + new Continuous( + name("timelines.score.relevance_prompt_yes_clicked"), + Set(EngagementScore).asJava) + val PREDICTED_SCORE_BOOKMARK = + new Continuous(name("timelines.score.bookmark"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_SHARE = + new Continuous(name("timelines.score.share"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_SHARE_MENU_CLICK = + new Continuous(name("timelines.score.share_menu_click"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_PROFILE_DWELLED = + new Continuous(name("timelines.score.good_profile_dwelled"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_TWEET_DETAIL_DWELLED = + new Continuous(name("timelines.score.tweet_detail_dwelled"), Set(EngagementScore).asJava) + val PREDICTED_SCORE_FULLSCREEN_VIDEO_DWELL = + new Continuous(name("timelines.score.fullscreen_video_dwell"), Set(EngagementScore).asJava) + + // hydrated in 
TimelinesSharedFeaturesAdapter that recap adapter calls + val ORIGINAL_AUTHOR_ID = new Discrete(name("entities.original_author_id"), Set(UserId).asJava) + val SOURCE_AUTHOR_ID = new Discrete(name("entities.source_author_id"), Set(UserId).asJava) + val SOURCE_TWEET_ID = new Discrete(name("entities.source_tweet_id"), Set(TweetId).asJava) + val TOPIC_ID = new Discrete(name("entities.topic_id"), Set(SemanticcoreClassification).asJava) + val INFERRED_TOPIC_IDS = + new SparseBinary(name("entities.inferred_topic_ids"), Set(SemanticcoreClassification).asJava) + val INFERRED_TOPIC_ID = TypedAggregateGroup.sparseFeature(INFERRED_TOPIC_IDS) + + val WEIGHTED_FAV_COUNT = new Continuous( + name("timelines.earlybird.weighted_fav_count"), + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val WEIGHTED_RETWEET_COUNT = new Continuous( + name("timelines.earlybird.weighted_retweet_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val WEIGHTED_REPLY_COUNT = new Continuous( + name("timelines.earlybird.weighted_reply_count"), + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + val WEIGHTED_QUOTE_COUNT = new Continuous( + name("timelines.earlybird.weighted_quote_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val EMBEDS_IMPRESSION_COUNT_V2 = new Continuous( + name("timelines.earlybird.embeds_impression_count_v2"), + Set(CountOfImpression).asJava) + val EMBEDS_URL_COUNT_V2 = new Continuous( + name("timelines.earlybird.embeds_url_count_v2"), + Set(CountOfPrivateTweetEntitiesAndMetadata, CountOfPublicTweetEntitiesAndMetadata).asJava) + val DECAYED_FAVORITE_COUNT = new Continuous( + name("timelines.earlybird.decayed_favorite_count"), + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val DECAYED_RETWEET_COUNT = new Continuous( + name("timelines.earlybird.decayed_retweet_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val DECAYED_REPLY_COUNT = new Continuous( + name("timelines.earlybird.decayed_reply_count"), + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + val DECAYED_QUOTE_COUNT = new Continuous( + name("timelines.earlybird.decayed_quote_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val FAKE_FAVORITE_COUNT = new Continuous( + name("timelines.earlybird.fake_favorite_count"), + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val FAKE_RETWEET_COUNT = new Continuous( + name("timelines.earlybird.fake_retweet_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val FAKE_REPLY_COUNT = new Continuous( + name("timelines.earlybird.fake_reply_count"), + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + val FAKE_QUOTE_COUNT = new Continuous( + name("timelines.earlybird.fake_quote_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val QUOTE_COUNT = new Continuous( + name("timelines.earlybird.quote_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + + // Safety features + val LABEL_ABUSIVE_FLAG = + new Binary(name("timelines.earlybird.label_abusive_flag"), Set(TweetSafetyLabels).asJava) + val LABEL_ABUSIVE_HI_RCL_FLAG = + new Binary(name("timelines.earlybird.label_abusive_hi_rcl_flag"), Set(TweetSafetyLabels).asJava) + val LABEL_DUP_CONTENT_FLAG = + new Binary(name("timelines.earlybird.label_dup_content_flag"), Set(TweetSafetyLabels).asJava) + val LABEL_NSFW_HI_PRC_FLAG = + new Binary(name("timelines.earlybird.label_nsfw_hi_prc_flag"), Set(TweetSafetyLabels).asJava) + val LABEL_NSFW_HI_RCL_FLAG = + new 
Binary(name("timelines.earlybird.label_nsfw_hi_rcl_flag"), Set(TweetSafetyLabels).asJava) + val LABEL_SPAM_FLAG = + new Binary(name("timelines.earlybird.label_spam_flag"), Set(TweetSafetyLabels).asJava) + val LABEL_SPAM_HI_RCL_FLAG = + new Binary(name("timelines.earlybird.label_spam_hi_rcl_flag"), Set(TweetSafetyLabels).asJava) + + // Periscope features + val PERISCOPE_EXISTS = new Binary( + name("timelines.earlybird.periscope_exists"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val PERISCOPE_IS_LIVE = new Binary( + name("timelines.earlybird.periscope_is_live"), + Set(PrivateBroadcastMetrics, PublicBroadcastMetrics).asJava) + val PERISCOPE_HAS_BEEN_FEATURED = new Binary( + name("timelines.earlybird.periscope_has_been_featured"), + Set(PrivateBroadcastMetrics, PublicBroadcastMetrics).asJava) + val PERISCOPE_IS_CURRENTLY_FEATURED = new Binary( + name("timelines.earlybird.periscope_is_currently_featured"), + Set(PrivateBroadcastMetrics, PublicBroadcastMetrics).asJava + ) + val PERISCOPE_IS_FROM_QUALITY_SOURCE = new Binary( + name("timelines.earlybird.periscope_is_from_quality_source"), + Set(PrivateBroadcastMetrics, PublicBroadcastMetrics).asJava + ) + + val VISIBLE_TOKEN_RATIO = new Continuous(name("timelines.earlybird.visible_token_ratio")) + val HAS_QUOTE = new Binary( + name("timelines.earlybird.has_quote"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val IS_COMPOSER_SOURCE_CAMERA = new Binary( + name("timelines.earlybird.is_composer_source_camera"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + + val EARLYBIRD_SCORE = new Continuous( + name("timelines.earlybird_score"), + Set(EngagementScore).asJava + ) // separating from the rest of "timelines.earlybird." 
namespace + + val DWELL_TIME_MS = new Continuous( + name("timelines.engagement.dwell_time_ms"), + Set(EngagementDurationAndTimestamp, ImpressionMetadata, PrivateTimestamp).asJava) + + val TWEET_DETAIL_DWELL_TIME_MS = new Continuous( + name("timelines.engagement.tweet_detail_dwell_time_ms"), + Set(EngagementDurationAndTimestamp, ImpressionMetadata, PrivateTimestamp).asJava) + + val PROFILE_DWELL_TIME_MS = new Continuous( + name("timelines.engagement.profile_dwell_time_ms"), + Set(EngagementDurationAndTimestamp, ImpressionMetadata, PrivateTimestamp).asJava) + + val FULLSCREEN_VIDEO_DWELL_TIME_MS = new Continuous( + name("timelines.engagement.fullscreen_video_dwell_time_ms"), + Set(EngagementDurationAndTimestamp, ImpressionMetadata, PrivateTimestamp).asJava) + + val LINK_DWELL_TIME_MS = new Continuous( + name("timelines.engagement.link_dwell_time_ms"), + Set(EngagementDurationAndTimestamp, ImpressionMetadata, PrivateTimestamp).asJava) + + val ASPECT_RATIO_DEN = new Continuous( + name("tweetsource.tweet.media.aspect_ratio_den"), + Set(MediaFile, MediaProcessingInformation).asJava) + val ASPECT_RATIO_NUM = new Continuous( + name("tweetsource.tweet.media.aspect_ratio_num"), + Set(MediaFile, MediaProcessingInformation).asJava) + val BIT_RATE = new Continuous( + name("tweetsource.tweet.media.bit_rate"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HEIGHT_2 = new Continuous( + name("tweetsource.tweet.media.height_2"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HEIGHT_1 = new Continuous( + name("tweetsource.tweet.media.height_1"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HEIGHT_3 = new Continuous( + name("tweetsource.tweet.media.height_3"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HEIGHT_4 = new Continuous( + name("tweetsource.tweet.media.height_4"), + Set(MediaFile, MediaProcessingInformation).asJava) + val RESIZE_METHOD_1 = new Discrete( + name("tweetsource.tweet.media.resize_method_1"), + Set(MediaFile, MediaProcessingInformation).asJava) + val RESIZE_METHOD_2 = new Discrete( + name("tweetsource.tweet.media.resize_method_2"), + Set(MediaFile, MediaProcessingInformation).asJava) + val RESIZE_METHOD_3 = new Discrete( + name("tweetsource.tweet.media.resize_method_3"), + Set(MediaFile, MediaProcessingInformation).asJava) + val RESIZE_METHOD_4 = new Discrete( + name("tweetsource.tweet.media.resize_method_4"), + Set(MediaFile, MediaProcessingInformation).asJava) + val VIDEO_DURATION = new Continuous( + name("tweetsource.tweet.media.video_duration"), + Set(MediaFile, MediaProcessingInformation).asJava) + val WIDTH_1 = new Continuous( + name("tweetsource.tweet.media.width_1"), + Set(MediaFile, MediaProcessingInformation).asJava) + val WIDTH_2 = new Continuous( + name("tweetsource.tweet.media.width_2"), + Set(MediaFile, MediaProcessingInformation).asJava) + val WIDTH_3 = new Continuous( + name("tweetsource.tweet.media.width_3"), + Set(MediaFile, MediaProcessingInformation).asJava) + val WIDTH_4 = new Continuous( + name("tweetsource.tweet.media.width_4"), + Set(MediaFile, MediaProcessingInformation).asJava) + val NUM_MEDIA_TAGS = new Continuous( + name("tweetsource.tweet.media.num_tags"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val MEDIA_TAG_SCREEN_NAMES = new SparseBinary( + name("tweetsource.tweet.media.tag_screen_names"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val STICKER_IDS = new SparseBinary( + name("tweetsource.tweet.media.sticker_ids"), + 
Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + + val NUM_COLOR_PALLETTE_ITEMS = new Continuous( + name("tweetsource.v2.tweet.media.num_color_pallette_items"), + Set(MediaFile, MediaProcessingInformation).asJava) + val COLOR_1_RED = new Continuous( + name("tweetsource.v2.tweet.media.color_1_red"), + Set(MediaFile, MediaProcessingInformation).asJava) + val COLOR_1_BLUE = new Continuous( + name("tweetsource.v2.tweet.media.color_1_blue"), + Set(MediaFile, MediaProcessingInformation).asJava) + val COLOR_1_GREEN = new Continuous( + name("tweetsource.v2.tweet.media.color_1_green"), + Set(MediaFile, MediaProcessingInformation).asJava) + val COLOR_1_PERCENTAGE = new Continuous( + name("tweetsource.v2.tweet.media.color_1_percentage"), + Set(MediaFile, MediaProcessingInformation).asJava) + val MEDIA_PROVIDERS = new SparseBinary( + name("tweetsource.v2.tweet.media.providers"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val IS_360 = new Binary( + name("tweetsource.v2.tweet.media.is_360"), + Set(MediaFile, MediaProcessingInformation).asJava) + val VIEW_COUNT = + new Continuous(name("tweetsource.v2.tweet.media.view_count"), Set(MediaContentMetrics).asJava) + val IS_MANAGED = new Binary( + name("tweetsource.v2.tweet.media.is_managed"), + Set(MediaFile, MediaProcessingInformation).asJava) + val IS_MONETIZABLE = new Binary( + name("tweetsource.v2.tweet.media.is_monetizable"), + Set(MediaFile, MediaProcessingInformation).asJava) + val IS_EMBEDDABLE = new Binary( + name("tweetsource.v2.tweet.media.is_embeddable"), + Set(MediaFile, MediaProcessingInformation).asJava) + val CLASSIFICATION_LABELS = new SparseContinuous( + name("tweetsource.v2.tweet.media.classification_labels"), + Set(MediaFile, MediaProcessingInformation).asJava) + + val NUM_STICKERS = new Continuous( + name("tweetsource.v2.tweet.media.num_stickers"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val NUM_FACES = new Continuous( + name("tweetsource.v2.tweet.media.num_faces"), + Set(MediaFile, MediaProcessingInformation).asJava) + val FACE_AREAS = new Continuous( + name("tweetsource.v2.tweet.media.face_areas"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HAS_SELECTED_PREVIEW_IMAGE = new Binary( + name("tweetsource.v2.tweet.media.has_selected_preview_image"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HAS_TITLE = new Binary( + name("tweetsource.v2.tweet.media.has_title"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HAS_DESCRIPTION = new Binary( + name("tweetsource.v2.tweet.media.has_description"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HAS_VISIT_SITE_CALL_TO_ACTION = new Binary( + name("tweetsource.v2.tweet.media.has_visit_site_call_to_action"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HAS_APP_INSTALL_CALL_TO_ACTION = new Binary( + name("tweetsource.v2.tweet.media.has_app_install_call_to_action"), + Set(MediaFile, MediaProcessingInformation).asJava) + val HAS_WATCH_NOW_CALL_TO_ACTION = new Binary( + name("tweetsource.v2.tweet.media.has_watch_now_call_to_action"), + Set(MediaFile, MediaProcessingInformation).asJava) + + val NUM_CAPS = + new Continuous(name("tweetsource.tweet.text.num_caps"), Set(PublicTweets, PrivateTweets).asJava) + val TWEET_LENGTH = + new Continuous(name("tweetsource.tweet.text.length"), Set(PublicTweets, PrivateTweets).asJava) + val TWEET_LENGTH_TYPE = new Discrete( + name("tweetsource.tweet.text.length_type"), + Set(PublicTweets, 
PrivateTweets).asJava) + val NUM_WHITESPACES = new Continuous( + name("tweetsource.tweet.text.num_whitespaces"), + Set(PublicTweets, PrivateTweets).asJava) + val HAS_QUESTION = + new Binary(name("tweetsource.tweet.text.has_question"), Set(PublicTweets, PrivateTweets).asJava) + val NUM_NEWLINES = new Continuous( + name("tweetsource.tweet.text.num_newlines"), + Set(PublicTweets, PrivateTweets).asJava) + val EMOJI_TOKENS = new SparseBinary( + name("tweetsource.v3.tweet.text.emoji_tokens"), + Set(PublicTweets, PrivateTweets).asJava) + val EMOTICON_TOKENS = new SparseBinary( + name("tweetsource.v3.tweet.text.emoticon_tokens"), + Set(PublicTweets, PrivateTweets).asJava) + val NUM_EMOJIS = new Continuous( + name("tweetsource.v3.tweet.text.num_emojis"), + Set(PublicTweets, PrivateTweets).asJava) + val NUM_EMOTICONS = new Continuous( + name("tweetsource.v3.tweet.text.num_emoticons"), + Set(PublicTweets, PrivateTweets).asJava) + val POS_UNIGRAMS = new SparseBinary( + name("tweetsource.v3.tweet.text.pos_unigrams"), + Set(PublicTweets, PrivateTweets).asJava) + val POS_BIGRAMS = new SparseBinary( + name("tweetsource.v3.tweet.text.pos_bigrams"), + Set(PublicTweets, PrivateTweets).asJava) + val TEXT_TOKENS = new SparseBinary( + name("tweetsource.v4.tweet.text.tokens"), + Set(PublicTweets, PrivateTweets).asJava) + + // Health features model scores (see go/toxicity, go/pblock, go/pspammytweet) + val PBLOCK_SCORE = + new Continuous(name("timelines.earlybird.pblock_score"), Set(TweetSafetyScores).asJava) + val TOXICITY_SCORE = + new Continuous(name("timelines.earlybird.toxicity_score"), Set(TweetSafetyScores).asJava) + val EXPERIMENTAL_HEALTH_MODEL_SCORE_1 = + new Continuous( + name("timelines.earlybird.experimental_health_model_score_1"), + Set(TweetSafetyScores).asJava) + val EXPERIMENTAL_HEALTH_MODEL_SCORE_2 = + new Continuous( + name("timelines.earlybird.experimental_health_model_score_2"), + Set(TweetSafetyScores).asJava) + val EXPERIMENTAL_HEALTH_MODEL_SCORE_3 = + new Continuous( + name("timelines.earlybird.experimental_health_model_score_3"), + Set(TweetSafetyScores).asJava) + val EXPERIMENTAL_HEALTH_MODEL_SCORE_4 = + new Continuous( + name("timelines.earlybird.experimental_health_model_score_4"), + Set(TweetSafetyScores).asJava) + val PSPAMMY_TWEET_SCORE = + new Continuous(name("timelines.earlybird.pspammy_tweet_score"), Set(TweetSafetyScores).asJava) + val PREPORTED_TWEET_SCORE = + new Continuous(name("timelines.earlybird.preported_tweet_score"), Set(TweetSafetyScores).asJava) + + // where record was displayed e.g. recap vs ranked timeline vs recycled + // (do NOT use for training in prediction, since this is set post-scoring) + // This differs from TimelinesSharedFeatures.INJECTION_TYPE, which is only + // set to Recap or Rectweet, and is available pre-scoring. + // This also differs from TimeFeatures.IS_TWEET_RECYCLED, which is set + // pre-scoring and indicates if a tweet is being considered for recycling. + // In contrast, DISPLAY_SUGGEST_TYPE == RecycledTweet means the tweet + // was actually served in a recycled tweet module. The two should currently + // have the same value, but need not in future, so please only use + // IS_TWEET_RECYCLED/CANDIDATE_TWEET_SOURCE_ID for training models and + // only use DISPLAY_SUGGEST_TYPE for offline analysis of tweets actually + // served in recycled modules. 
+ val DISPLAY_SUGGEST_TYPE = new Discrete(name("recap.display.suggest_type")) + + // Candidate tweet source id - related to DISPLAY_SUGGEST_TYPE above, but this is a + // property of the candidate rather than display location so is safe to use + // in model training, unlike DISPLAY_SUGGEST_TYPE. + val CANDIDATE_TWEET_SOURCE_ID = + new Discrete(name("timelines.meta.candidate_tweet_source_id"), Set(TweetId).asJava) + + // Was at least 50% of this tweet in the user's viewport for at least 500 ms, + // OR did the user engage with the tweet publicly or privately + val IS_LINGER_IMPRESSION = + new Binary(name("timelines.engagement.is_linger_impression"), Set(EngagementsPrivate).asJava) + + // Features to create rollups + val LANGUAGE_GROUP = new Discrete(name("timelines.tweet.text.language_group")) + + // The final position index of the tweet being trained on in the timeline + // served from TLM (could still change later in TLS-API), as recorded by + // PositionIndexLoggingEnvelopeTransform. + val FINAL_POSITION_INDEX = new Discrete(name("timelines.display.final_position_index")) + + // The traceId of the timeline request, can be used to group tweets in the same response. + val TRACE_ID = new Discrete(name("timelines.display.trace_id"), Set(TfeTransactionId).asJava) + + // Whether this tweet was randomly injected into the timeline or not, for exploration purposes + val IS_RANDOM_TWEET = new Binary(name("timelines.display.is_random_tweet")) + + // Whether this tweet was reordered with softmax ranking for explore/exploit, and needs to + // be excluded from exploit only holdback + val IS_SOFTMAX_RANKING_TWEET = new Binary(name("timelines.display.is_softmax_ranking_tweet")) + + // Whether the user viewing the tweet has disabled ranked timeline. + val IS_RANKED_TIMELINE_DISABLER = new Binary( + name("timelines.user_features.is_ranked_timeline_disabler"), + Set(AnnotationValue, GeneralSettings).asJava) + + // Whether the user viewing the tweet was one of those released from DDG 4205 control + // as part of http://go/shrink-4205 process to shrink the quality features holdback. 
+ val IS_USER_RELEASED_FROM_QUALITY_HOLDBACK = new Binary( + name("timelines.user_features.is_released_from_quality_holdback"), + Set(ExperimentId, ExperimentName).asJava) + + val INITIAL_PREDICTION_FAV = + new Continuous(name("timelines.initial_prediction.fav"), Set(EngagementScore).asJava) + val INITIAL_PREDICTION_RETWEET = + new Continuous(name("timelines.initial_prediction.retweet"), Set(EngagementScore).asJava) + val INITIAL_PREDICTION_REPLY = + new Continuous(name("timelines.initial_prediction.reply"), Set(EngagementScore).asJava) + val INITIAL_PREDICTION_OPEN_LINK = + new Continuous(name("timelines.initial_prediction.open_link"), Set(EngagementScore).asJava) + val INITIAL_PREDICTION_PROFILE_CLICK = + new Continuous(name("timelines.initial_prediction.profile_click"), Set(EngagementScore).asJava) + val INITIAL_PREDICTION_VIDEO_PLAYBACK_50 = new Continuous( + name("timelines.initial_prediction.video_playback_50"), + Set(EngagementScore).asJava) + val INITIAL_PREDICTION_DETAIL_EXPAND = + new Continuous(name("timelines.initial_prediction.detail_expand"), Set(EngagementScore).asJava) + val INITIAL_PREDICTION_PHOTO_EXPAND = + new Continuous(name("timelines.initial_prediction.photo_expand"), Set(EngagementScore).asJava) + + val VIEWER_FOLLOWS_ORIGINAL_AUTHOR = + new Binary(name("timelines.viewer_follows_original_author"), Set(Follow).asJava) + + val IS_TOP_ONE = new Binary(name("timelines.position.is_top_one")) + val IS_TOP_FIVE = + new Binary(name(featureName = "timelines.position.is_top_five")) + val IS_TOP_TEN = + new Binary(name(featureName = "timelines.position.is_top_ten")) + + val LOG_POSITION = + new Continuous(name(featureName = "timelines.position.log_10")) + +} diff --git a/src/scala/com/twitter/timelines/prediction/features/engagement_features/BUILD b/src/scala/com/twitter/timelines/prediction/features/engagement_features/BUILD new file mode 100644 index 000000000..f6caadea0 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/engagement_features/BUILD @@ -0,0 +1,12 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/timelineservice/server/suggests/features/engagement_features:thrift-scala", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/transforms", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/engagement_features/EngagementFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/engagement_features/EngagementFeatures.scala new file mode 100644 index 000000000..e65c9db20 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/engagement_features/EngagementFeatures.scala @@ -0,0 +1,246 @@ +package com.twitter.timelines.prediction.features.engagement_features + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.logging.Logger +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.Feature +import com.twitter.ml.api.Feature.Continuous +import com.twitter.ml.api.Feature.SparseBinary +import com.twitter.timelines.data_processing.ml_util.transforms.OneToSomeTransform +import com.twitter.timelines.data_processing.ml_util.transforms.RichITransform +import com.twitter.timelines.data_processing.ml_util.transforms.SparseBinaryUnion +import 
com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import com.twitter.timelineservice.suggests.features.engagement_features.thriftscala.{ + EngagementFeatures => ThriftEngagementFeatures +} +import com.twitter.timelineservice.suggests.features.engagement_features.v1.thriftscala.{ + EngagementFeatures => ThriftEngagementFeaturesV1 +} +import scala.collection.JavaConverters._ + +object EngagementFeatures { + private[this] val logger = Logger.get(getClass.getSimpleName) + + sealed trait EngagementFeature + case object Count extends EngagementFeature + case object RealGraphWeightAverage extends EngagementFeature + case object RealGraphWeightMax extends EngagementFeature + case object RealGraphWeightMin extends EngagementFeature + case object RealGraphWeightMissing extends EngagementFeature + case object RealGraphWeightVariance extends EngagementFeature + case object UserIds extends EngagementFeature + + def fromThrift(thriftEngagementFeatures: ThriftEngagementFeatures): Option[EngagementFeatures] = { + thriftEngagementFeatures match { + case thriftEngagementFeaturesV1: ThriftEngagementFeatures.V1 => + Some( + EngagementFeatures( + favoritedBy = thriftEngagementFeaturesV1.v1.favoritedBy, + retweetedBy = thriftEngagementFeaturesV1.v1.retweetedBy, + repliedBy = thriftEngagementFeaturesV1.v1.repliedBy, + ) + ) + case _ => { + logger.error("Unexpected EngagementFeatures version found.") + None + } + } + } + + val empty: EngagementFeatures = EngagementFeatures() +} + +/** + * Contains user IDs who have engaged with a target entity, such as a Tweet, + * and any additional data needed for derived features. + */ +case class EngagementFeatures( + favoritedBy: Seq[Long] = Nil, + retweetedBy: Seq[Long] = Nil, + repliedBy: Seq[Long] = Nil, + realGraphWeightByUser: Map[Long, Double] = Map.empty) { + def isEmpty: Boolean = favoritedBy.isEmpty && retweetedBy.isEmpty && repliedBy.isEmpty + def nonEmpty: Boolean = !isEmpty + def toLogThrift: ThriftEngagementFeatures.V1 = + ThriftEngagementFeatures.V1( + ThriftEngagementFeaturesV1( + favoritedBy = favoritedBy, + retweetedBy = retweetedBy, + repliedBy = repliedBy + ) + ) +} + +/** + * Represents engagement features derived from the Real Graph weight. + * + * These features are from the perspective of the source user, who is viewing their + * timeline, to the destination users (or user), who created engagements. 
+ * + * @param count number of engagements present + * @param max max score of the engaging users + * @param mean average score of the engaging users + * @param min minimum score of the engaging users + * @param missing for engagements present, how many Real Graph scores were missing + * @param variance variance of scores of the engaging users + */ +case class RealGraphDerivedEngagementFeatures( + count: Int, + max: Double, + mean: Double, + min: Double, + missing: Int, + variance: Double) + +object EngagementDataRecordFeatures { + import EngagementFeatures._ + + val FavoritedByUserIds = new SparseBinary( + "engagement_features.user_ids.favorited_by", + Set(UserId, PrivateLikes, PublicLikes).asJava) + val RetweetedByUserIds = new SparseBinary( + "engagement_features.user_ids.retweeted_by", + Set(UserId, PrivateRetweets, PublicRetweets).asJava) + val RepliedByUserIds = new SparseBinary( + "engagement_features.user_ids.replied_by", + Set(UserId, PrivateReplies, PublicReplies).asJava) + + val InNetworkFavoritesCount = new Continuous( + "engagement_features.in_network.favorites.count", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val InNetworkRetweetsCount = new Continuous( + "engagement_features.in_network.retweets.count", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val InNetworkRepliesCount = new Continuous( + "engagement_features.in_network.replies.count", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + + // real graph derived features + val InNetworkFavoritesAvgRealGraphWeight = new Continuous( + "engagement_features.real_graph.favorites.avg_weight", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val InNetworkFavoritesMaxRealGraphWeight = new Continuous( + "engagement_features.real_graph.favorites.max_weight", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val InNetworkFavoritesMinRealGraphWeight = new Continuous( + "engagement_features.real_graph.favorites.min_weight", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val InNetworkFavoritesRealGraphWeightMissing = new Continuous( + "engagement_features.real_graph.favorites.missing" + ) + val InNetworkFavoritesRealGraphWeightVariance = new Continuous( + "engagement_features.real_graph.favorites.weight_variance" + ) + + val InNetworkRetweetsMaxRealGraphWeight = new Continuous( + "engagement_features.real_graph.retweets.max_weight", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val InNetworkRetweetsMinRealGraphWeight = new Continuous( + "engagement_features.real_graph.retweets.min_weight", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val InNetworkRetweetsAvgRealGraphWeight = new Continuous( + "engagement_features.real_graph.retweets.avg_weight", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val InNetworkRetweetsRealGraphWeightMissing = new Continuous( + "engagement_features.real_graph.retweets.missing" + ) + val InNetworkRetweetsRealGraphWeightVariance = new Continuous( + "engagement_features.real_graph.retweets.weight_variance" + ) + + val InNetworkRepliesMaxRealGraphWeight = new Continuous( + "engagement_features.real_graph.replies.max_weight", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val InNetworkRepliesMinRealGraphWeight = new Continuous( + "engagement_features.real_graph.replies.min_weight", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val InNetworkRepliesAvgRealGraphWeight = new Continuous( + "engagement_features.real_graph.replies.avg_weight", + 
Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val InNetworkRepliesRealGraphWeightMissing = new Continuous( + "engagement_features.real_graph.replies.missing" + ) + val InNetworkRepliesRealGraphWeightVariance = new Continuous( + "engagement_features.real_graph.replies.weight_variance" + ) + + sealed trait FeatureGroup { + def continuousFeatures: Map[EngagementFeature, Continuous] + def sparseBinaryFeatures: Map[EngagementFeature, SparseBinary] + def allFeatures: Seq[Feature[_]] = + (continuousFeatures.values ++ sparseBinaryFeatures.values).toSeq + } + + case object Favorites extends FeatureGroup { + override val continuousFeatures: Map[EngagementFeature, Continuous] = + Map( + Count -> InNetworkFavoritesCount, + RealGraphWeightAverage -> InNetworkFavoritesAvgRealGraphWeight, + RealGraphWeightMax -> InNetworkFavoritesMaxRealGraphWeight, + RealGraphWeightMin -> InNetworkFavoritesMinRealGraphWeight, + RealGraphWeightMissing -> InNetworkFavoritesRealGraphWeightMissing, + RealGraphWeightVariance -> InNetworkFavoritesRealGraphWeightVariance + ) + + override val sparseBinaryFeatures: Map[EngagementFeature, SparseBinary] = + Map(UserIds -> FavoritedByUserIds) + } + + case object Retweets extends FeatureGroup { + override val continuousFeatures: Map[EngagementFeature, Continuous] = + Map( + Count -> InNetworkRetweetsCount, + RealGraphWeightAverage -> InNetworkRetweetsAvgRealGraphWeight, + RealGraphWeightMax -> InNetworkRetweetsMaxRealGraphWeight, + RealGraphWeightMin -> InNetworkRetweetsMinRealGraphWeight, + RealGraphWeightMissing -> InNetworkRetweetsRealGraphWeightMissing, + RealGraphWeightVariance -> InNetworkRetweetsRealGraphWeightVariance + ) + + override val sparseBinaryFeatures: Map[EngagementFeature, SparseBinary] = + Map(UserIds -> RetweetedByUserIds) + } + + case object Replies extends FeatureGroup { + override val continuousFeatures: Map[EngagementFeature, Continuous] = + Map( + Count -> InNetworkRepliesCount, + RealGraphWeightAverage -> InNetworkRepliesAvgRealGraphWeight, + RealGraphWeightMax -> InNetworkRepliesMaxRealGraphWeight, + RealGraphWeightMin -> InNetworkRepliesMinRealGraphWeight, + RealGraphWeightMissing -> InNetworkRepliesRealGraphWeightMissing, + RealGraphWeightVariance -> InNetworkRepliesRealGraphWeightVariance + ) + + override val sparseBinaryFeatures: Map[EngagementFeature, SparseBinary] = + Map(UserIds -> RepliedByUserIds) + } + + val PublicEngagerSets = Set(FavoritedByUserIds, RetweetedByUserIds, RepliedByUserIds) + val PublicEngagementUserIds = new SparseBinary( + "engagement_features.user_ids.public", + Set(UserId, EngagementsPublic).asJava + ) + val ENGAGER_ID = TypedAggregateGroup.sparseFeature(PublicEngagementUserIds) + + val UnifyPublicEngagersTransform = SparseBinaryUnion( + featuresToUnify = PublicEngagerSets, + outputFeature = PublicEngagementUserIds + ) + + object RichUnifyPublicEngagersTransform extends OneToSomeTransform { + override def apply(dataRecord: DataRecord): Option[DataRecord] = + RichITransform(EngagementDataRecordFeatures.UnifyPublicEngagersTransform)(dataRecord) + override def featuresToTransform: Set[Feature[_]] = + EngagementDataRecordFeatures.UnifyPublicEngagersTransform.featuresToUnify.toSet + } +} diff --git a/src/scala/com/twitter/timelines/prediction/features/escherbird/BUILD b/src/scala/com/twitter/timelines/prediction/features/escherbird/BUILD new file mode 100644 index 000000000..c28786b77 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/escherbird/BUILD @@ -0,0 +1,19 @@ +scala_library( + sources = 
["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/tweetypie:tweet-scala", + ], +) + +scala_library( + name = "escherbird-features", + sources = ["EscherbirdFeatures.scala"], + tags = ["bazel-only"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/escherbird/EscherbirdFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/escherbird/EscherbirdFeatures.scala new file mode 100644 index 000000000..3aaf9b856 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/escherbird/EscherbirdFeatures.scala @@ -0,0 +1,19 @@ +package com.twitter.timelines.prediction.features.escherbird + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.ml.api.Feature +import java.util.{Set => JSet} +import scala.collection.JavaConverters._ + +object EscherbirdFeatures { + val TweetGroupIds = new Feature.SparseBinary("escherbird.tweet_group_ids") + val TweetDomainIds = new Feature.SparseBinary("escherbird.tweet_domain_ids", Set(DomainId).asJava) + val TweetEntityIds = + new Feature.SparseBinary("escherbird.tweet_entity_ids", Set(SemanticcoreClassification).asJava) +} + +case class EscherbirdFeatures( + tweetId: Long, + tweetGroupIds: JSet[String], + tweetDomainIds: JSet[String], + tweetEntityIds: JSet[String]) diff --git a/src/scala/com/twitter/timelines/prediction/features/escherbird/EscherbirdFeaturesConverter.scala b/src/scala/com/twitter/timelines/prediction/features/escherbird/EscherbirdFeaturesConverter.scala new file mode 100644 index 000000000..bd3333a03 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/escherbird/EscherbirdFeaturesConverter.scala @@ -0,0 +1,19 @@ +package com.twitter.timelines.prediction.features.escherbird + +import com.twitter.tweetypie.thriftscala.Tweet +import scala.collection.JavaConverters._ + +object EscherbirdFeaturesConverter { + val DeprecatedOrTestDomains = Set(1L, 5L, 7L, 9L, 14L, 19L, 20L, 31L) + + def fromTweet(tweet: Tweet): Option[EscherbirdFeatures] = tweet.escherbirdEntityAnnotations.map { + escherbirdEntityAnnotations => + val annotations = escherbirdEntityAnnotations.entityAnnotations + .filterNot(annotation => DeprecatedOrTestDomains.contains(annotation.domainId)) + val tweetGroupIds = annotations.map(_.groupId.toString).toSet.asJava + val tweetDomainIds = annotations.map(_.domainId.toString).toSet.asJava + // An entity is only unique within a given domain + val tweetEntityIds = annotations.map(a => s"${a.domainId}.${a.entityId}").toSet.asJava + EscherbirdFeatures(tweet.id, tweetGroupIds, tweetDomainIds, tweetEntityIds) + } +} diff --git a/src/scala/com/twitter/timelines/prediction/features/followsource/BUILD.bazel b/src/scala/com/twitter/timelines/prediction/features/followsource/BUILD.bazel new file mode 100644 index 000000000..0ee33acdb --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/followsource/BUILD.bazel @@ -0,0 +1,7 @@ +scala_library( + sources = ["*.scala"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/followsource/FollowSourceFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/followsource/FollowSourceFeatures.scala new file mode 100644 
index 000000000..012103b14 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/followsource/FollowSourceFeatures.scala @@ -0,0 +1,53 @@ +package com.twitter.timelines.prediction.features.followsource + +import com.twitter.ml.api.Feature +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import scala.collection.JavaConverters._ + +object FollowSourceFeatures { + + // Corresponds to an algorithm constant from com.twitter.hermit.profile.HermitProfileConstants + val FollowSourceAlgorithm = new Feature.Text("follow_source.algorithm") + + // Type of follow action: one of "unfollow", "follow", "follow_back", "follow_many", "follow_all" + val FollowAction = new Feature.Text( + "follow_source.action", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + + // Millisecond timestamp when the follow occurred + val FollowTimestamp = + new Feature.Discrete("follow_source.follow_timestamp", Set(Follow, PrivateTimestamp).asJava) + + // Age of the follow (in minutes) + val FollowAgeMinutes = + new Feature.Continuous("follow_source.follow_age_minutes", Set(Follow).asJava) + + // Tweet ID of the tweet details page from which the follow happened (if applicable) + val FollowCauseTweetId = new Feature.Discrete("follow_source.cause_tweet_id", Set(TweetId).asJava) + + // String representation of the follow client (android, web, iphone, etc.). Derived from the "client" + // portion of the client event namespace. + val FollowClientId = new Feature.Text("follow_source.client_id", Set(ClientType).asJava) + + // If the follow happens via a profile's Following or Followers, + // the id of the profile owner is recorded here. + val FollowAssociationId = + new Feature.Discrete("follow_source.association_id", Set(Follow, UserId).asJava) + + // The "friendly name" here is computed using FollowSourceUtil.getSource. It represents + // a grouping on a few client events that reflect where the event occurred. For example, + // events on the tweet details page are grouped using "tweetDetails": + // case (Some("web"), Some("permalink"), _, _, _) => "tweetDetails" + // case (Some("iphone"), Some("tweet"), _, _, _) => "tweetDetails" + // case (Some("android"), Some("tweet"), _, _, _) => "tweetDetails" + val FollowSourceFriendlyName = new Feature.Text("follow_source.friendly_name", Set(Follow).asJava)
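Aside: `FollowAgeMinutes` is the elapsed time since `FollowTimestamp`, expressed in minutes. A one-line sketch of that conversion (illustrative only, not part of this change; `nowMs` stands in for a clock read):

```scala
// Milliseconds elapsed since the follow, converted to fractional minutes.
def followAgeMinutes(followTimestampMs: Long, nowMs: Long): Double =
  (nowMs - followTimestampMs) / 60000.0
```

+ + // Up to two sources and actions that preceded the follow (for example, a profile visit + // through a mention click, which itself was on a tweet detail page reached through a tweet + // click in the Home tab). See go/followsource for more details and examples.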
+ // The "source" here is computed using FollowSourceUtil.getSource + val PreFollowAction1 = new Feature.Text("follow_source.pre_follow_action_1", Set(Follow).asJava) + val PreFollowAction2 = new Feature.Text("follow_source.pre_follow_action_2", Set(Follow).asJava) + val PreFollowSource1 = new Feature.Text("follow_source.pre_follow_source_1", Set(Follow).asJava) + val PreFollowSource2 = new Feature.Text("follow_source.pre_follow_source_2", Set(Follow).asJava) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/itl/BUILD b/src/scala/com/twitter/timelines/prediction/features/itl/BUILD new file mode 100644 index 000000000..6fc497bf3 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/itl/BUILD @@ -0,0 +1,9 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/itl/ITLFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/itl/ITLFeatures.scala new file mode 100644 index 000000000..3351e5c11 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/itl/ITLFeatures.scala @@ -0,0 +1,575 @@ +package com.twitter.timelines.prediction.features.itl + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.ml.api.Feature.Binary +import com.twitter.ml.api.Feature.Continuous +import com.twitter.ml.api.Feature.Discrete +import com.twitter.ml.api.Feature.SparseBinary +import scala.collection.JavaConverters._ + +object ITLFeatures { + // engagement + val IS_RETWEETED = + new Binary("itl.engagement.is_retweeted", Set(PublicRetweets, PrivateRetweets).asJava) + val IS_FAVORITED = + new Binary("itl.engagement.is_favorited", Set(PublicLikes, PrivateLikes).asJava) + val IS_REPLIED = + new Binary("itl.engagement.is_replied", Set(PublicReplies, PrivateReplies).asJava) + // v1: post click engagements: fav, reply + val IS_GOOD_CLICKED_CONVO_DESC_V1 = new Binary( + "itl.engagement.is_good_clicked_convo_desc_favorited_or_replied", + Set( + PublicLikes, + PrivateLikes, + PublicReplies, + PrivateReplies, + EngagementsPrivate, + EngagementsPublic).asJava) + // v2: post click engagements: click + val IS_GOOD_CLICKED_CONVO_DESC_V2 = new Binary( + "itl.engagement.is_good_clicked_convo_desc_v2", + Set(TweetsClicked, EngagementsPrivate).asJava) + + val IS_GOOD_CLICKED_CONVO_DESC_FAVORITED = new Binary( + "itl.engagement.is_good_clicked_convo_desc_favorited", + Set(PublicLikes, PrivateLikes).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_REPLIED = new Binary( + "itl.engagement.is_good_clicked_convo_desc_replied", + Set(PublicReplies, PrivateReplies).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_RETWEETED = new Binary( + "itl.engagement.is_good_clicked_convo_desc_retweeted", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_CLICKED = new Binary( + "itl.engagement.is_good_clicked_convo_desc_clicked", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_FOLLOWED = + new Binary("itl.engagement.is_good_clicked_convo_desc_followed", Set(EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_SHARE_DM_CLICKED = new Binary( + "itl.engagement.is_good_clicked_convo_desc_share_dm_clicked", + Set(EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_PROFILE_CLICKED = new Binary( + 
"itl.engagement.is_good_clicked_convo_desc_profile_clicked", + Set(EngagementsPrivate).asJava) + + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_0 = new Binary( + "itl.engagement.is_good_clicked_convo_desc_uam_gt_0", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_1 = new Binary( + "itl.engagement.is_good_clicked_convo_desc_uam_gt_1", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_2 = new Binary( + "itl.engagement.is_good_clicked_convo_desc_uam_gt_2", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_3 = new Binary( + "itl.engagement.is_good_clicked_convo_desc_uam_gt_3", + Set(EngagementsPrivate, EngagementsPublic).asJava) + + val IS_TWEET_DETAIL_DWELLED = new Binary( + "itl.engagement.is_tweet_detail_dwelled", + Set(TweetsClicked, EngagementsPrivate).asJava) + + val IS_TWEET_DETAIL_DWELLED_8_SEC = new Binary( + "itl.engagement.is_tweet_detail_dwelled_8_sec", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_15_SEC = new Binary( + "itl.engagement.is_tweet_detail_dwelled_15_sec", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_25_SEC = new Binary( + "itl.engagement.is_tweet_detail_dwelled_25_sec", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_30_SEC = new Binary( + "itl.engagement.is_tweet_detail_dwelled_30_sec", + Set(TweetsClicked, EngagementsPrivate).asJava) + + val IS_PROFILE_DWELLED = new Binary( + "itl.engagement.is_profile_dwelled", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_10_SEC = new Binary( + "itl.engagement.is_profile_dwelled_10_sec", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_20_SEC = new Binary( + "itl.engagement.is_profile_dwelled_20_sec", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_30_SEC = new Binary( + "itl.engagement.is_profile_dwelled_30_sec", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED = new Binary( + "itl.engagement.is_fullscreen_video_dwelled", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_5_SEC = new Binary( + "itl.engagement.is_fullscreen_video_dwelled_5_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_10_SEC = new Binary( + "itl.engagement.is_fullscreen_video_dwelled_10_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_20_SEC = new Binary( + "itl.engagement.is_fullscreen_video_dwelled_20_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_30_SEC = new Binary( + "itl.engagement.is_fullscreen_video_dwelled_30_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_15_SEC = new Binary( + "itl.engagement.is_link_dwelled_15_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_30_SEC = new Binary( + "itl.engagement.is_link_dwelled_30_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_60_SEC = new Binary( + "itl.engagement.is_link_dwelled_60_sec", + Set(MediaEngagementActivities, 
EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_QUOTED = + new Binary("itl.engagement.is_quoted", Set(PublicRetweets, PrivateRetweets).asJava) + val IS_RETWEETED_WITHOUT_QUOTE = new Binary( + "itl.engagement.is_retweeted_without_quote", + Set(PublicRetweets, PrivateRetweets).asJava) + val IS_CLICKED = new Binary( + "itl.engagement.is_clicked", + Set(EngagementsPrivate, TweetsClicked, LinksClickedOn).asJava) + val IS_PROFILE_CLICKED = new Binary( + "itl.engagement.is_profile_clicked", + Set(EngagementsPrivate, TweetsClicked, ProfilesViewed, ProfilesClicked).asJava) + val IS_DWELLED = new Binary("itl.engagement.is_dwelled", Set(EngagementsPrivate).asJava) + val IS_DWELLED_IN_BOUNDS_V1 = + new Binary("itl.engagement.is_dwelled_in_bounds_v1", Set(EngagementsPrivate).asJava) + val DWELL_NORMALIZED_OVERALL = + new Continuous("itl.engagement.dwell_normalized_overall", Set(EngagementsPrivate).asJava) + val DWELL_CDF_OVERALL = + new Continuous("itl.engagement.dwell_cdf_overall", Set(EngagementsPrivate).asJava) + val DWELL_CDF = new Continuous("itl.engagement.dwell_cdf", Set(EngagementsPrivate).asJava) + + val IS_DWELLED_1S = new Binary("itl.engagement.is_dwelled_1s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_2S = new Binary("itl.engagement.is_dwelled_2s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_3S = new Binary("itl.engagement.is_dwelled_3s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_4S = new Binary("itl.engagement.is_dwelled_4s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_5S = new Binary("itl.engagement.is_dwelled_5s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_6S = new Binary("itl.engagement.is_dwelled_6s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_7S = new Binary("itl.engagement.is_dwelled_7s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_8S = new Binary("itl.engagement.is_dwelled_8s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_9S = new Binary("itl.engagement.is_dwelled_9s", Set(EngagementsPrivate).asJava) + val IS_DWELLED_10S = new Binary("itl.engagement.is_dwelled_10s", Set(EngagementsPrivate).asJava) + + val IS_SKIPPED_1S = new Binary("itl.engagement.is_skipped_1s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_2S = new Binary("itl.engagement.is_skipped_2s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_3S = new Binary("itl.engagement.is_skipped_3s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_4S = new Binary("itl.engagement.is_skipped_4s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_5S = new Binary("itl.engagement.is_skipped_5s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_6S = new Binary("itl.engagement.is_skipped_6s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_7S = new Binary("itl.engagement.is_skipped_7s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_8S = new Binary("itl.engagement.is_skipped_8s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_9S = new Binary("itl.engagement.is_skipped_9s", Set(EngagementsPrivate).asJava) + val IS_SKIPPED_10S = new Binary("itl.engagement.is_skipped_10s", Set(EngagementsPrivate).asJava) + + val IS_FOLLOWED = + new Binary("itl.engagement.is_followed", Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_IMPRESSED = new Binary("itl.engagement.is_impressed", Set(EngagementsPrivate).asJava) + val IS_OPEN_LINKED = + new Binary("itl.engagement.is_open_linked", Set(EngagementsPrivate, LinksClickedOn).asJava) + val IS_PHOTO_EXPANDED = new Binary( + "itl.engagement.is_photo_expanded", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_VIDEO_VIEWED = + 
new Binary("itl.engagement.is_video_viewed", Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_VIDEO_PLAYBACK_50 = new Binary( + "itl.engagement.is_video_playback_50", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_VIDEO_QUALITY_VIEWED = new Binary( + "itl.engagement.is_video_quality_viewed", + Set(EngagementsPrivate, EngagementsPublic).asJava + ) + val IS_BOOKMARKED = + new Binary("itl.engagement.is_bookmarked", Set(EngagementsPrivate).asJava) + val IS_SHARED = + new Binary("itl.engagement.is_shared", Set(EngagementsPrivate).asJava) + val IS_SHARE_MENU_CLICKED = + new Binary("itl.engagement.is_share_menu_clicked", Set(EngagementsPrivate).asJava) + + // Negative engagements + val IS_DONT_LIKE = + new Binary("itl.engagement.is_dont_like", Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_BLOCK_CLICKED = new Binary( + "itl.engagement.is_block_clicked", + Set(TweetsClicked, EngagementsPrivate, EngagementsPublic).asJava) + val IS_BLOCK_DIALOG_BLOCKED = new Binary( + "itl.engagement.is_block_dialog_blocked", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_MUTE_CLICKED = + new Binary("itl.engagement.is_mute_clicked", Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_MUTE_DIALOG_MUTED = + new Binary("itl.engagement.is_mute_dialog_muted", Set(EngagementsPrivate).asJava) + val IS_REPORT_TWEET_CLICKED = new Binary( + "itl.engagement.is_report_tweet_clicked", + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_CARET_CLICKED = + new Binary("itl.engagement.is_caret_clicked", Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_NOT_ABOUT_TOPIC = + new Binary("itl.engagement.is_not_about_topic", Set(EngagementsPrivate).asJava) + val IS_NOT_RECENT = + new Binary("itl.engagement.is_not_recent", Set(EngagementsPrivate).asJava) + val IS_NOT_RELEVANT = + new Binary("itl.engagement.is_not_relevant", Set(EngagementsPrivate).asJava) + val IS_SEE_FEWER = + new Binary("itl.engagement.is_see_fewer", Set(EngagementsPrivate).asJava) + val IS_UNFOLLOW_TOPIC = + new Binary("itl.engagement.is_unfollow_topic", Set(EngagementsPrivate).asJava) + val IS_FOLLOW_TOPIC = + new Binary("itl.engagement.is_follow_topic", Set(EngagementsPrivate).asJava) + val IS_NOT_INTERESTED_IN_TOPIC = + new Binary("itl.engagement.is_not_interested_in_topic", Set(EngagementsPrivate).asJava) + val IS_HOME_LATEST_VISITED = + new Binary("itl.engagement.is_home_latest_visited", Set(EngagementsPrivate).asJava) + + // This derived label is the logical OR of IS_DONT_LIKE, IS_BLOCK_CLICKED, IS_MUTE_CLICKED and IS_REPORT_TWEET_CLICKED + val IS_NEGATIVE_FEEDBACK = + new Binary("itl.engagement.is_negative_feedback", Set(EngagementsPrivate).asJava) + + // Reciprocal engagements for reply forward engagement + val IS_REPLIED_REPLY_IMPRESSED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_impressed_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_FAVORITED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_favorited_by_author", + Set(PublicLikes, PrivateLikes, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_QUOTED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_quoted_by_author", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_REPLIED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_replied_by_author", + Set(PublicReplies, PrivateReplies, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_RETWEETED_BY_AUTHOR = new 
Binary( + "itl.engagement.is_replied_reply_retweeted_by_author", + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_BLOCKED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_blocked_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_FOLLOWED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_followed_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_UNFOLLOWED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_unfollowed_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_MUTED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_muted_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_REPORTED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_reported_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava) + + // This derived label is the logical OR of REPLY_REPLIED, REPLY_FAVORITED, REPLY_RETWEETED + val IS_REPLIED_REPLY_ENGAGED_BY_AUTHOR = new Binary( + "itl.engagement.is_replied_reply_engaged_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava) + + // Reciprocal engagements for fav forward engagement + val IS_FAVORITED_FAV_FAVORITED_BY_AUTHOR = new Binary( + "itl.engagement.is_favorited_fav_favorited_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateLikes, PublicLikes).asJava + ) + val IS_FAVORITED_FAV_REPLIED_BY_AUTHOR = new Binary( + "itl.engagement.is_favorited_fav_replied_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateReplies, PublicReplies).asJava + ) + val IS_FAVORITED_FAV_RETWEETED_BY_AUTHOR = new Binary( + "itl.engagement.is_favorited_fav_retweeted_by_author", + Set(EngagementsPrivate, EngagementsPublic, PrivateRetweets, PublicRetweets).asJava + ) + val IS_FAVORITED_FAV_FOLLOWED_BY_AUTHOR = new Binary( + "itl.engagement.is_favorited_fav_followed_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava + ) + // This derived label is the logical OR of FAV_REPLIED, FAV_FAVORITED, FAV_RETWEETED, FAV_FOLLOWED + val IS_FAVORITED_FAV_ENGAGED_BY_AUTHOR = new Binary( + "itl.engagement.is_favorited_fav_engaged_by_author", + Set(EngagementsPrivate, EngagementsPublic).asJava + ) + + // define good profile click by considering the following engagements (follow, fav, reply, retweet, etc.)
on the profile page + val IS_PROFILE_CLICKED_AND_PROFILE_FOLLOW = new Binary( + "itl.engagement.is_profile_clicked_and_profile_follow", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, Follow).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_FAV = new Binary( + "itl.engagement.is_profile_clicked_and_profile_fav", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, PrivateLikes, PublicLikes).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_REPLY = new Binary( + "itl.engagement.is_profile_clicked_and_profile_reply", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, PrivateReplies, PublicReplies).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_RETWEET = new Binary( + "itl.engagement.is_profile_clicked_and_profile_retweet", + Set( + ProfilesViewed, + ProfilesClicked, + EngagementsPrivate, + PrivateRetweets, + PublicRetweets).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_TWEET_CLICK = new Binary( + "itl.engagement.is_profile_clicked_and_profile_tweet_click", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, TweetsClicked).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_SHARE_DM_CLICK = new Binary( + "itl.engagement.is_profile_clicked_and_profile_share_dm_click", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + // This derived label is the union of all binary features above + val IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED = new Binary( + "itl.engagement.is_profile_clicked_and_profile_engaged", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, EngagementsPublic).asJava) + + // define bad profile click by considering the following engagements (user report, tweet report, mute, block, etc.) on the profile page + val IS_PROFILE_CLICKED_AND_PROFILE_USER_REPORT_CLICK = new Binary( + "itl.engagement.is_profile_clicked_and_profile_user_report_click", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_TWEET_REPORT_CLICK = new Binary( + "itl.engagement.is_profile_clicked_and_profile_tweet_report_click", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_MUTE = new Binary( + "itl.engagement.is_profile_clicked_and_profile_mute", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_CLICKED_AND_PROFILE_BLOCK = new Binary( + "itl.engagement.is_profile_clicked_and_profile_block", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + // This derived label is the union of bad profile click engagements and existing negative feedback + val IS_NEGATIVE_FEEDBACK_V2 = new Binary( + "itl.engagement.is_negative_feedback_v2", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + // Engagement for following a user from any surface area + val IS_FOLLOWED_FROM_ANY_SURFACE_AREA = new Binary( + "itl.engagement.is_followed_from_any_surface_area", + Set(EngagementsPublic, EngagementsPrivate).asJava) + + // Relevance prompt tweet engagements + val IS_RELEVANCE_PROMPT_YES_CLICKED = + new Binary("itl.engagement.is_relevance_prompt_yes_clicked", Set(EngagementsPrivate).asJava) + + // Reply downvote engagements + val IS_REPLY_DOWNVOTED = + new Binary("itl.engagement.is_reply_downvoted", Set(EngagementsPrivate).asJava) + val IS_REPLY_DOWNVOTE_REMOVED = + new Binary("itl.engagement.is_reply_downvote_removed", Set(EngagementsPrivate).asJava)
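Aside: several labels in this file are documented as derived, and IS_NEGATIVE_FEEDBACK_V2 above is the union of the bad profile-click labels and the existing negative feedback signal. A minimal sketch of that OR relationship (illustrative only, not part of this change; plain Booleans stand in for the per-engagement label values):

```scala
object NegativeFeedbackV2Sketch {
  // Illustrative stand-ins for the component labels; real values come from engagement events.
  val isProfileClickedAndProfileUserReportClick = false
  val isProfileClickedAndProfileTweetReportClick = false
  val isProfileClickedAndProfileMute = false
  val isProfileClickedAndProfileBlock = false
  // IS_NEGATIVE_FEEDBACK is itself documented above as the OR of the
  // dont-like, block, mute, and report-tweet click labels.
  val isNegativeFeedback = false

  // IS_NEGATIVE_FEEDBACK_V2: union of the bad profile-click labels and existing negative feedback.
  val isNegativeFeedbackV2: Boolean =
    isProfileClickedAndProfileUserReportClick ||
      isProfileClickedAndProfileTweetReportClick ||
      isProfileClickedAndProfileMute ||
      isProfileClickedAndProfileBlock ||
      isNegativeFeedback
}
```

+ + // features from RecommendedTweet + val RECTWEET_SCORE = new Continuous("itl.recommended_tweet_features.rectweet_score") + val NUM_FAVORITING_USERS = new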
Continuous("itl.recommended_tweet_features.num_favoriting_users") + val NUM_FOLLOWING_USERS = new Continuous("itl.recommended_tweet_features.num_following_users") + val CONTENT_SOURCE_TYPE = new Discrete("itl.recommended_tweet_features.content_source_type") + + val RECOS_SCORE = new Continuous( + "itl.recommended_tweet_features.recos_score", + Set(EngagementScore, UsersRealGraphScore, UsersSalsaScore).asJava) + val AUTHOR_REALGRAPH_SCORE = new Continuous( + "itl.recommended_tweet_features.realgraph_score", + Set(UsersRealGraphScore).asJava) + val AUTHOR_SARUS_SCORE = new Continuous( + "itl.recommended_tweet_features.sarus_score", + Set(EngagementScore, UsersSalsaScore).asJava) + + val NUM_INTERACTING_USERS = new Continuous( + "itl.recommended_tweet_features.num_interacting_users", + Set(EngagementScore).asJava + ) + val MAX_REALGRAPH_SCORE_OF_INTERACTING_USERS = new Continuous( + "itl.recommended_tweet_features.max_realgraph_score_of_interacting_users", + Set(UsersRealGraphScore, EngagementScore).asJava + ) + val SUM_REALGRAPH_SCORE_OF_INTERACTING_USERS = new Continuous( + "itl.recommended_tweet_features.sum_realgraph_score_of_interacting_users", + Set(UsersRealGraphScore, EngagementScore).asJava + ) + val AVG_REALGRAPH_SCORE_OF_INTERACTING_USERS = new Continuous( + "itl.recommended_tweet_features.avg_realgraph_score_of_interacting_users", + Set(UsersRealGraphScore, EngagementScore).asJava + ) + val MAX_SARUS_SCORE_OF_INTERACTING_USERS = new Continuous( + "itl.recommended_tweet_features.max_sarus_score_of_interacting_users", + Set(EngagementScore, UsersSalsaScore).asJava + ) + val SUM_SARUS_SCORE_OF_INTERACTING_USERS = new Continuous( + "itl.recommended_tweet_features.sum_sarus_score_of_interacting_users", + Set(EngagementScore, UsersSalsaScore).asJava + ) + val AVG_SARUS_SCORE_OF_INTERACTING_USERS = new Continuous( + "itl.recommended_tweet_features.avg_sarus_score_of_interacting_users", + Set(EngagementScore, UsersSalsaScore).asJava + ) + + val NUM_INTERACTING_FOLLOWINGS = new Continuous( + "itl.recommended_tweet_features.num_interacting_followings", + Set(EngagementScore).asJava + ) + + // features from HydratedTweetFeatures + val REAL_GRAPH_WEIGHT = + new Continuous("itl.hydrated_tweet_features.real_graph_weight", Set(UsersRealGraphScore).asJava) + val SARUS_GRAPH_WEIGHT = new Continuous("itl.hydrated_tweet_features.sarus_graph_weight") + val FROM_TOP_ENGAGED_USER = new Binary("itl.hydrated_tweet_features.from_top_engaged_user") + val FROM_TOP_INFLUENCER = new Binary("itl.hydrated_tweet_features.from_top_influencer") + val TOPIC_SIM_SEARCHER_INTERSTED_IN_AUTHOR_KNOWN_FOR = new Continuous( + "itl.hydrated_tweet_features.topic_sim_searcher_interested_in_author_known_for" + ) + val TOPIC_SIM_SEARCHER_AUTHOR_BOTH_INTERESTED_IN = new Continuous( + "itl.hydrated_tweet_features.topic_sim_searcher_author_both_interested_in" + ) + val TOPIC_SIM_SEARCHER_AUTHOR_BOTH_KNOWN_FOR = new Continuous( + "itl.hydrated_tweet_features.topic_sim_searcher_author_both_known_for" + ) + val USER_REP = new Continuous("itl.hydrated_tweet_features.user_rep") + val NORMALIZED_PARUS_SCORE = new Continuous("itl.hydrated_tweet_features.normalized_parus_score") + val CONTAINS_MEDIA = new Binary("itl.hydrated_tweet_features.contains_media") + val FROM_NEARBY = new Binary("itl.hydrated_tweet_features.from_nearby") + val TOPIC_SIM_SEARCHER_INTERESTED_IN_TWEET = new Continuous( + "itl.hydrated_tweet_features.topic_sim_searcher_interested_in_tweet" + ) + val MATCHES_UI_LANG = new Binary( + 
"itl.hydrated_tweet_features.matches_ui_lang", + Set(ProvidedLanguage, InferredLanguage).asJava) + val MATCHES_SEARCHER_MAIN_LANG = new Binary( + "itl.hydrated_tweet_features.matches_searcher_main_lang", + Set(ProvidedLanguage, InferredLanguage).asJava + ) + val MATCHES_SEARCHER_LANGS = new Binary( + "itl.hydrated_tweet_features.matches_searcher_langs", + Set(ProvidedLanguage, InferredLanguage).asJava) + val HAS_CARD = new Binary( + "itl.hydrated_tweet_features.has_card", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_IMAGE = new Binary( + "itl.hydrated_tweet_features.has_image", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_NATIVE_IMAGE = new Binary( + "itl.hydrated_tweet_features.has_native_image", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_VIDEO = new Binary("itl.hydrated_tweet_features.has_video") + val HAS_CONSUMER_VIDEO = new Binary( + "itl.hydrated_tweet_features.has_consumer_video", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_PRO_VIDEO = new Binary( + "itl.hydrated_tweet_features.has_pro_video", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_PERISCOPE = new Binary( + "itl.hydrated_tweet_features.has_periscope", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_VINE = new Binary( + "itl.hydrated_tweet_features.has_vine", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_NATIVE_VIDEO = new Binary( + "itl.hydrated_tweet_features.has_native_video", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_LINK = new Binary( + "itl.hydrated_tweet_features.has_link", + Set(UrlFoundFlag, PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val LINK_COUNT = new Continuous( + "itl.hydrated_tweet_features.link_count", + Set(CountOfPrivateTweetEntitiesAndMetadata, CountOfPublicTweetEntitiesAndMetadata).asJava) + val URL_DOMAINS = new SparseBinary( + "itl.hydrated_tweet_features.url_domains", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_VISIBLE_LINK = new Binary( + "itl.hydrated_tweet_features.has_visible_link", + Set(UrlFoundFlag, PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_NEWS = new Binary( + "itl.hydrated_tweet_features.has_news", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_TREND = new Binary( + "itl.hydrated_tweet_features.has_trend", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val BLENDER_SCORE = + new Continuous("itl.hydrated_tweet_features.blender_score", Set(EngagementScore).asJava) + val PARUS_SCORE = + new Continuous("itl.hydrated_tweet_features.parus_score", Set(EngagementScore).asJava) + val TEXT_SCORE = + new Continuous("itl.hydrated_tweet_features.text_score", Set(EngagementScore).asJava) + val BIDIRECTIONAL_REPLY_COUNT = new Continuous( + "itl.hydrated_tweet_features.bidirectional_reply_count", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val UNIDIRECTIONAL_REPLY_COUNT = new Continuous( + "itl.hydrated_tweet_features.unidirectional_reply_count", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val BIDIRECTIONAL_RETWEET_COUNT = new Continuous( + "itl.hydrated_tweet_features.bidirectional_retweet_count", + Set(CountOfPrivateRetweets, 
CountOfPublicRetweets).asJava + ) + val UNIDIRECTIONAL_RETWEET_COUNT = new Continuous( + "itl.hydrated_tweet_features.unidirectional_retweet_count", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val BIDIRECTIONAL_FAV_COUNT = new Continuous( + "itl.hydrated_tweet_features.bidirectional_fav_count", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val UNIDIRECTIONAL_FAV_COUNT = new Continuous( + "itl.hydrated_tweet_features.unidirectional_fav_count", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val CONVERSATION_COUNT = new Continuous("itl.hydrated_tweet_features.conversation_count") + val FAV_COUNT = new Continuous( + "itl.hydrated_tweet_features.fav_count", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val REPLY_COUNT = new Continuous( + "itl.hydrated_tweet_features.reply_count", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + val RETWEET_COUNT = new Continuous( + "itl.hydrated_tweet_features.retweet_count", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val PREV_USER_TWEET_ENGAGEMENT = new Continuous( + "itl.hydrated_tweet_features.prev_user_tweet_enagagement", + Set(EngagementScore, EngagementsPrivate, EngagementsPublic).asJava + ) + val IS_SENSITIVE = new Binary("itl.hydrated_tweet_features.is_sensitive") + val HAS_MULTIPLE_MEDIA = new Binary( + "itl.hydrated_tweet_features.has_multiple_media", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_MULTIPLE_HASHTAGS_OR_TRENDS = new Binary( + "itl.hydrated_tweet_features.has_multiple_hashtag_or_trend", + Set( + UserVisibleFlag, + CountOfPrivateTweetEntitiesAndMetadata, + CountOfPublicTweetEntitiesAndMetadata).asJava) + val IS_AUTHOR_PROFILE_EGG = + new Binary("itl.hydrated_tweet_features.is_author_profile_egg", Set(ProfileImage).asJava) + val IS_AUTHOR_NEW = + new Binary("itl.hydrated_tweet_features.is_author_new", Set(UserType, UserState).asJava) + val NUM_MENTIONS = new Continuous( + "itl.hydrated_tweet_features.num_mentions", + Set( + UserVisibleFlag, + CountOfPrivateTweetEntitiesAndMetadata, + CountOfPublicTweetEntitiesAndMetadata).asJava) + val NUM_HASHTAGS = new Continuous( + "itl.hydrated_tweet_features.num_hashtags", + Set(CountOfPrivateTweetEntitiesAndMetadata, CountOfPublicTweetEntitiesAndMetadata).asJava) + val LANGUAGE = new Discrete( + "itl.hydrated_tweet_features.language", + Set(ProvidedLanguage, InferredLanguage).asJava) + val LINK_LANGUAGE = new Continuous( + "itl.hydrated_tweet_features.link_language", + Set(ProvidedLanguage, InferredLanguage).asJava) + val IS_AUTHOR_NSFW = + new Binary("itl.hydrated_tweet_features.is_author_nsfw", Set(UserType).asJava) + val IS_AUTHOR_SPAM = + new Binary("itl.hydrated_tweet_features.is_author_spam", Set(UserType).asJava) + val IS_AUTHOR_BOT = new Binary("itl.hydrated_tweet_features.is_author_bot", Set(UserType).asJava) + val IS_OFFENSIVE = new Binary("itl.hydrated_tweet_features.is_offensive") + val FROM_VERIFIED_ACCOUNT = + new Binary("itl.hydrated_tweet_features.from_verified_account", Set(UserVerifiedFlag).asJava) + val EMBEDS_IMPRESSION_COUNT = new Continuous( + "itl.hydrated_tweet_features.embeds_impression_count", + Set(CountOfImpression).asJava) + val EMBEDS_URL_COUNT = + new Continuous("itl.hydrated_tweet_features.embeds_url_count", Set(UrlFoundFlag).asJava) + val FAV_COUNT_V2 = new Continuous( + "recap.earlybird.fav_count_v2", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val RETWEET_COUNT_V2 = new Continuous( + "recap.earlybird.retweet_count_v2", + 
Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val REPLY_COUNT_V2 = new Continuous( + "recap.earlybird.reply_count_v2", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/list_features/BUILD b/src/scala/com/twitter/timelines/prediction/features/list_features/BUILD new file mode 100644 index 000000000..6fc497bf3 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/list_features/BUILD @@ -0,0 +1,9 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/list_features/ListFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/list_features/ListFeatures.scala new file mode 100644 index 000000000..ffb00d1f6 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/list_features/ListFeatures.scala @@ -0,0 +1,24 @@ +package com.twitter.timelines.prediction.features.list_features + +import com.twitter.ml.api.Feature.{Binary, Discrete} +import com.twitter.ml.api.FeatureContext +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import scala.collection.JavaConverters._ + +object ListFeatures { + + // list.id is used for list tweet injections in home. timelines.meta.list_id is used for list tweets in list timeline. + val LIST_ID = new Discrete("list.id") + + val VIEWER_IS_OWNER = + new Binary("list.viewer.is_owner", Set(ListsNonpublicList, ListsPublicList).asJava) + val VIEWER_IS_SUBSCRIBER = new Binary("list.viewer.is_subscriber") + val IS_PINNED_LIST = new Binary("list.is_pinned") + + val featureContext = new FeatureContext( + LIST_ID, + VIEWER_IS_OWNER, + VIEWER_IS_SUBSCRIBER, + IS_PINNED_LIST + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/p_home_latest/BUILD b/src/scala/com/twitter/timelines/prediction/features/p_home_latest/BUILD new file mode 100644 index 000000000..6fc497bf3 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/p_home_latest/BUILD @@ -0,0 +1,9 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/p_home_latest/HomeLatestUserFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/p_home_latest/HomeLatestUserFeatures.scala new file mode 100644 index 000000000..65d721a05 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/p_home_latest/HomeLatestUserFeatures.scala @@ -0,0 +1,49 @@ +package com.twitter.timelines.prediction.features.p_home_latest + +import com.twitter.ml.api.Feature.{Continuous, Discrete} +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import scala.collection.JavaConverters._ + +object HomeLatestUserFeatures { + val LAST_LOGIN_TIMESTAMP_MS = + new Discrete("home_latest.user_feature.last_login_timestamp_ms", Set(PrivateTimestamp).asJava) +} + +object HomeLatestUserAggregatesFeatures { + + /** + * Used as `timestampFeature` in `OfflineAggregateSource` required by feature aggregations, set to + * the `dateRange` end timestamp by default + */ + val AGGREGATE_TIMESTAMP_MS = + new Discrete("home_latest.user_feature.aggregate_timestamp_ms", 
Set(PrivateTimestamp).asJava) + val HOME_TOP_IMPRESSIONS = + new Continuous("home_latest.user_feature.home_top_impressions", Set(CountOfImpression).asJava) + val HOME_LATEST_IMPRESSIONS = + new Continuous( + "home_latest.user_feature.home_latest_impressions", + Set(CountOfImpression).asJava) + val HOME_TOP_LAST_LOGIN_TIMESTAMP_MS = + new Discrete( + "home_latest.user_feature.home_top_last_login_timestamp_ms", + Set(PrivateTimestamp).asJava) + val HOME_LATEST_LAST_LOGIN_TIMESTAMP_MS = + new Discrete( + "home_latest.user_feature.home_latest_last_login_timestamp_ms", + Set(PrivateTimestamp).asJava) + val HOME_LATEST_MOST_RECENT_CLICK_TIMESTAMP_MS = + new Discrete( + "home_latest.user_feature.home_latest_most_recent_click_timestamp_ms", + Set(PrivateTimestamp).asJava) +} + +case class HomeLatestUserFeatures(userId: Long, lastLoginTimestampMs: Long) + +case class HomeLatestUserAggregatesFeatures( + userId: Long, + aggregateTimestampMs: Long, + homeTopImpressions: Option[Double], + homeLatestImpressions: Option[Double], + homeTopLastLoginTimestampMs: Option[Long], + homeLatestLastLoginTimestampMs: Option[Long], + homeLatestMostRecentClickTimestampMs: Option[Long])
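Aside: per the scaladoc above, AGGREGATE_TIMESTAMP_MS is the `timestampFeature` consumed by the aggregation framework's offline sources and defaults to the end timestamp of the processed `dateRange`. A toy illustration of populating the companion case class accordingly (illustrative only, not part of this change; all values are placeholders):

```scala
object HomeLatestAggregatesSketch {
  // The aggregate timestamp defaults to the end of the processed date range.
  val dateRangeEndMs: Long = 1672531200000L // placeholder: 2023-01-01T00:00:00Z

  val row = HomeLatestUserAggregatesFeatures(
    userId = 12L, // placeholder id
    aggregateTimestampMs = dateRangeEndMs,
    homeTopImpressions = Some(42.0),
    homeLatestImpressions = Some(7.0),
    homeTopLastLoginTimestampMs = None,
    homeLatestLastLoginTimestampMs = None,
    homeLatestMostRecentClickTimestampMs = Some(dateRangeEndMs - 3600000L) // an hour earlier
  )
}
```

diff --git a/src/scala/com/twitter/timelines/prediction/features/ppmi/BUILD b/src/scala/com/twitter/timelines/prediction/features/ppmi/BUILD new file mode 100644 index 000000000..babba31bb --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/ppmi/BUILD @@ -0,0 +1,8 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/ppmi/PpmiFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/ppmi/PpmiFeatures.scala new file mode 100644 index 000000000..7e6d1dea8 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/ppmi/PpmiFeatures.scala @@ -0,0 +1,7 @@ +package com.twitter.timelines.prediction.features.ppmi + +import com.twitter.ml.api.Feature.Continuous + +object PpmiDataRecordFeatures { + val PPMI_SCORE = new Continuous("ppmi.source_author.score") +} diff --git a/src/scala/com/twitter/timelines/prediction/features/real_graph/BUILD b/src/scala/com/twitter/timelines/prediction/features/real_graph/BUILD new file mode 100644 index 000000000..868acec21 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/real_graph/BUILD @@ -0,0 +1,15 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/scala/com/twitter/ml/featurestore/catalog/entities/core", + "src/scala/com/twitter/ml/featurestore/catalog/entities/timelines", + "src/scala/com/twitter/ml/featurestore/catalog/features/timelines:realgraph", + "src/scala/com/twitter/ml/featurestore/lib/entity", + "src/scala/com/twitter/ml/featurestore/lib/feature", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/timelines/real_graph:real_graph-scala", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/real_graph/RealGraphDataRecordFeatureStoreFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/real_graph/RealGraphDataRecordFeatureStoreFeatures.scala new file mode 100644 index 000000000..7c52349aa --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/real_graph/RealGraphDataRecordFeatureStoreFeatures.scala @@ -0,0 +1,232 @@ +package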
com.twitter.timelines.prediction.features.real_graph + +import com.twitter.ml.featurestore.catalog.entities.core.UserAuthor +import com.twitter.ml.featurestore.catalog.features.timelines.RealGraph +import com.twitter.ml.featurestore.lib.EdgeEntityId +import com.twitter.ml.featurestore.lib.UserId +import com.twitter.ml.featurestore.lib.feature.BoundFeatureSet +import com.twitter.ml.featurestore.lib.feature.Feature +import com.twitter.ml.featurestore.lib.feature.FeatureSet + +object RealGraphDataRecordFeatureStoreFeatures { + val boundUserAuthorfeatureSet: BoundFeatureSet = FeatureSet( + RealGraph.DestId, + RealGraph.AddressBookEmail.DaysSinceLast, + RealGraph.AddressBookEmail.ElapsedDays, + RealGraph.AddressBookEmail.Ewma, + RealGraph.AddressBookEmail.IsMissing, + RealGraph.AddressBookEmail.Mean, + RealGraph.AddressBookEmail.NonZeroDays, + RealGraph.AddressBookEmail.Variance, + RealGraph.AddressBookInBoth.DaysSinceLast, + RealGraph.AddressBookInBoth.ElapsedDays, + RealGraph.AddressBookInBoth.Ewma, + RealGraph.AddressBookInBoth.IsMissing, + RealGraph.AddressBookInBoth.Mean, + RealGraph.AddressBookInBoth.NonZeroDays, + RealGraph.AddressBookInBoth.Variance, + RealGraph.AddressBookMutualEdgeEmail.DaysSinceLast, + RealGraph.AddressBookMutualEdgeEmail.ElapsedDays, + RealGraph.AddressBookMutualEdgeEmail.Ewma, + RealGraph.AddressBookMutualEdgeEmail.IsMissing, + RealGraph.AddressBookMutualEdgeEmail.Mean, + RealGraph.AddressBookMutualEdgeEmail.NonZeroDays, + RealGraph.AddressBookMutualEdgeEmail.Variance, + RealGraph.AddressBookMutualEdgeInBoth.DaysSinceLast, + RealGraph.AddressBookMutualEdgeInBoth.ElapsedDays, + RealGraph.AddressBookMutualEdgeInBoth.Ewma, + RealGraph.AddressBookMutualEdgeInBoth.IsMissing, + RealGraph.AddressBookMutualEdgeInBoth.Mean, + RealGraph.AddressBookMutualEdgeInBoth.NonZeroDays, + RealGraph.AddressBookMutualEdgeInBoth.Variance, + RealGraph.AddressBookMutualEdgePhone.DaysSinceLast, + RealGraph.AddressBookMutualEdgePhone.ElapsedDays, + RealGraph.AddressBookMutualEdgePhone.Ewma, + RealGraph.AddressBookMutualEdgePhone.IsMissing, + RealGraph.AddressBookMutualEdgePhone.Mean, + RealGraph.AddressBookMutualEdgePhone.NonZeroDays, + RealGraph.AddressBookMutualEdgePhone.Variance, + RealGraph.AddressBookPhone.DaysSinceLast, + RealGraph.AddressBookPhone.ElapsedDays, + RealGraph.AddressBookPhone.Ewma, + RealGraph.AddressBookPhone.IsMissing, + RealGraph.AddressBookPhone.Mean, + RealGraph.AddressBookPhone.NonZeroDays, + RealGraph.AddressBookPhone.Variance, + RealGraph.DirectMessages.DaysSinceLast, + RealGraph.DirectMessages.ElapsedDays, + RealGraph.DirectMessages.Ewma, + RealGraph.DirectMessages.IsMissing, + RealGraph.DirectMessages.Mean, + RealGraph.DirectMessages.NonZeroDays, + RealGraph.DirectMessages.Variance, + RealGraph.DwellTime.DaysSinceLast, + RealGraph.DwellTime.ElapsedDays, + RealGraph.DwellTime.Ewma, + RealGraph.DwellTime.IsMissing, + RealGraph.DwellTime.Mean, + RealGraph.DwellTime.NonZeroDays, + RealGraph.DwellTime.Variance, + RealGraph.Follow.DaysSinceLast, + RealGraph.Follow.ElapsedDays, + RealGraph.Follow.Ewma, + RealGraph.Follow.IsMissing, + RealGraph.Follow.Mean, + RealGraph.Follow.NonZeroDays, + RealGraph.Follow.Variance, + RealGraph.InspectedStatuses.DaysSinceLast, + RealGraph.InspectedStatuses.ElapsedDays, + RealGraph.InspectedStatuses.Ewma, + RealGraph.InspectedStatuses.IsMissing, + RealGraph.InspectedStatuses.Mean, + RealGraph.InspectedStatuses.NonZeroDays, + RealGraph.InspectedStatuses.Variance, + RealGraph.Likes.DaysSinceLast, + RealGraph.Likes.ElapsedDays, + 
RealGraph.Likes.Ewma, + RealGraph.Likes.IsMissing, + RealGraph.Likes.Mean, + RealGraph.Likes.NonZeroDays, + RealGraph.Likes.Variance, + RealGraph.LinkClicks.DaysSinceLast, + RealGraph.LinkClicks.ElapsedDays, + RealGraph.LinkClicks.Ewma, + RealGraph.LinkClicks.IsMissing, + RealGraph.LinkClicks.Mean, + RealGraph.LinkClicks.NonZeroDays, + RealGraph.LinkClicks.Variance, + RealGraph.Mentions.DaysSinceLast, + RealGraph.Mentions.ElapsedDays, + RealGraph.Mentions.Ewma, + RealGraph.Mentions.IsMissing, + RealGraph.Mentions.Mean, + RealGraph.Mentions.NonZeroDays, + RealGraph.Mentions.Variance, + RealGraph.MutualFollow.DaysSinceLast, + RealGraph.MutualFollow.ElapsedDays, + RealGraph.MutualFollow.Ewma, + RealGraph.MutualFollow.IsMissing, + RealGraph.MutualFollow.Mean, + RealGraph.MutualFollow.NonZeroDays, + RealGraph.MutualFollow.Variance, + RealGraph.NumTweetQuotes.DaysSinceLast, + RealGraph.NumTweetQuotes.ElapsedDays, + RealGraph.NumTweetQuotes.Ewma, + RealGraph.NumTweetQuotes.IsMissing, + RealGraph.NumTweetQuotes.Mean, + RealGraph.NumTweetQuotes.NonZeroDays, + RealGraph.NumTweetQuotes.Variance, + RealGraph.PhotoTags.DaysSinceLast, + RealGraph.PhotoTags.ElapsedDays, + RealGraph.PhotoTags.Ewma, + RealGraph.PhotoTags.IsMissing, + RealGraph.PhotoTags.Mean, + RealGraph.PhotoTags.NonZeroDays, + RealGraph.PhotoTags.Variance, + RealGraph.ProfileViews.DaysSinceLast, + RealGraph.ProfileViews.ElapsedDays, + RealGraph.ProfileViews.Ewma, + RealGraph.ProfileViews.IsMissing, + RealGraph.ProfileViews.Mean, + RealGraph.ProfileViews.NonZeroDays, + RealGraph.ProfileViews.Variance, + RealGraph.Retweets.DaysSinceLast, + RealGraph.Retweets.ElapsedDays, + RealGraph.Retweets.Ewma, + RealGraph.Retweets.IsMissing, + RealGraph.Retweets.Mean, + RealGraph.Retweets.NonZeroDays, + RealGraph.Retweets.Variance, + RealGraph.SmsFollow.DaysSinceLast, + RealGraph.SmsFollow.ElapsedDays, + RealGraph.SmsFollow.Ewma, + RealGraph.SmsFollow.IsMissing, + RealGraph.SmsFollow.Mean, + RealGraph.SmsFollow.NonZeroDays, + RealGraph.SmsFollow.Variance, + RealGraph.TweetClicks.DaysSinceLast, + RealGraph.TweetClicks.ElapsedDays, + RealGraph.TweetClicks.Ewma, + RealGraph.TweetClicks.IsMissing, + RealGraph.TweetClicks.Mean, + RealGraph.TweetClicks.NonZeroDays, + RealGraph.TweetClicks.Variance, + RealGraph.Weight + ).bind(UserAuthor) + + private[this] val edgeFeatures: Seq[RealGraph.EdgeFeature] = Seq( + RealGraph.AddressBookEmail, + RealGraph.AddressBookInBoth, + RealGraph.AddressBookMutualEdgeEmail, + RealGraph.AddressBookMutualEdgeInBoth, + RealGraph.AddressBookMutualEdgePhone, + RealGraph.AddressBookPhone, + RealGraph.DirectMessages, + RealGraph.DwellTime, + RealGraph.Follow, + RealGraph.InspectedStatuses, + RealGraph.Likes, + RealGraph.LinkClicks, + RealGraph.Mentions, + RealGraph.MutualFollow, + RealGraph.PhotoTags, + RealGraph.ProfileViews, + RealGraph.Retweets, + RealGraph.SmsFollow, + RealGraph.TweetClicks + ) + + val htlDoubleFeatures: Set[Feature[EdgeEntityId[UserId, UserId], Double]] = { + val features = edgeFeatures.flatMap { ef => + Seq(ef.Ewma, ef.Mean, ef.Variance) + } ++ Seq(RealGraph.Weight) + features.toSet + } + + val htlLongFeatures: Set[Feature[EdgeEntityId[UserId, UserId], Long]] = { + val features = edgeFeatures.flatMap { ef => + Seq(ef.DaysSinceLast, ef.ElapsedDays, ef.NonZeroDays) + } + features.toSet + } + + private val edgeFeatureToLegacyName = Map( + RealGraph.AddressBookEmail -> "num_address_book_email", + RealGraph.AddressBookInBoth -> "num_address_book_in_both", + RealGraph.AddressBookMutualEdgeEmail -> 
"num_address_book_mutual_edge_email", + RealGraph.AddressBookMutualEdgeInBoth -> "num_address_book_mutual_edge_in_both", + RealGraph.AddressBookMutualEdgePhone -> "num_address_book_mutual_edge_phone", + RealGraph.AddressBookPhone -> "num_address_book_phone", + RealGraph.DirectMessages -> "direct_messages", + RealGraph.DwellTime -> "total_dwell_time", + RealGraph.Follow -> "num_follow", + RealGraph.InspectedStatuses -> "num_inspected_tweets", + RealGraph.Likes -> "num_favorites", + RealGraph.LinkClicks -> "num_link_clicks", + RealGraph.Mentions -> "num_mentions", + RealGraph.MutualFollow -> "num_mutual_follow", + RealGraph.PhotoTags -> "num_photo_tags", + RealGraph.ProfileViews -> "num_profile_views", + RealGraph.Retweets -> "num_retweets", + RealGraph.SmsFollow -> "num_sms_follow", + RealGraph.TweetClicks -> "num_tweet_clicks", + ) + + def convertFeatureToLegacyName( + prefix: String, + variance: String = "variance" + ): Map[Feature[EdgeEntityId[UserId, UserId], _ >: Long with Double <: AnyVal], String] = + edgeFeatureToLegacyName.flatMap { + case (k, v) => + Seq( + k.NonZeroDays -> s"${prefix}.${v}.non_zero_days", + k.DaysSinceLast -> s"${prefix}.${v}.days_since_last", + k.ElapsedDays -> s"${prefix}.${v}.elapsed_days", + k.Ewma -> s"${prefix}.${v}.ewma", + k.Mean -> s"${prefix}.${v}.mean", + k.Variance -> s"${prefix}.${v}.${variance}", + ) + } ++ Map( + RealGraph.Weight -> (prefix + ".weight") + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/real_graph/RealGraphDataRecordFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/real_graph/RealGraphDataRecordFeatures.scala new file mode 100644 index 000000000..4c1915944 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/real_graph/RealGraphDataRecordFeatures.scala @@ -0,0 +1,534 @@ +package com.twitter.timelines.prediction.features.real_graph + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.ml.api.Feature._ +import com.twitter.timelines.real_graph.v1.thriftscala.RealGraphEdgeFeature +import scala.collection.JavaConverters._ + + +object RealGraphDataRecordFeatures { + // the source user id + val SRC_ID = new Discrete("realgraph.src_id", Set(UserId).asJava) + // the destination user id + val DST_ID = new Discrete("realgraph.dst_id", Set(UserId).asJava) + // real graph weight + val WEIGHT = new Continuous("realgraph.weight", Set(UsersRealGraphScore).asJava) + // the number of retweets that the source user sent to the destination user + val NUM_RETWEETS_MEAN = + new Continuous("realgraph.num_retweets.mean", Set(PrivateRetweets, PublicRetweets).asJava) + val NUM_RETWEETS_EWMA = + new Continuous("realgraph.num_retweets.ewma", Set(PrivateRetweets, PublicRetweets).asJava) + val NUM_RETWEETS_VARIANCE = + new Continuous("realgraph.num_retweets.variance", Set(PrivateRetweets, PublicRetweets).asJava) + val NUM_RETWEETS_NON_ZERO_DAYS = new Continuous( + "realgraph.num_retweets.non_zero_days", + Set(PrivateRetweets, PublicRetweets).asJava) + val NUM_RETWEETS_ELAPSED_DAYS = new Continuous( + "realgraph.num_retweets.elapsed_days", + Set(PrivateRetweets, PublicRetweets).asJava) + val NUM_RETWEETS_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_retweets.days_since_last", + Set(PrivateRetweets, PublicRetweets).asJava) + val NUM_RETWEETS_IS_MISSING = + new Binary("realgraph.num_retweets.is_missing", Set(PrivateRetweets, PublicRetweets).asJava) + // the number of favories that the source user sent to the destination user + val NUM_FAVORITES_MEAN = + new 
Continuous("realgraph.num_favorites.mean", Set(PublicLikes, PrivateLikes).asJava) + val NUM_FAVORITES_EWMA = + new Continuous("realgraph.num_favorites.ewma", Set(PublicLikes, PrivateLikes).asJava) + val NUM_FAVORITES_VARIANCE = + new Continuous("realgraph.num_favorites.variance", Set(PublicLikes, PrivateLikes).asJava) + val NUM_FAVORITES_NON_ZERO_DAYS = + new Continuous("realgraph.num_favorites.non_zero_days", Set(PublicLikes, PrivateLikes).asJava) + val NUM_FAVORITES_ELAPSED_DAYS = + new Continuous("realgraph.num_favorites.elapsed_days", Set(PublicLikes, PrivateLikes).asJava) + val NUM_FAVORITES_DAYS_SINCE_LAST = + new Continuous("realgraph.num_favorites.days_since_last", Set(PublicLikes, PrivateLikes).asJava) + val NUM_FAVORITES_IS_MISSING = + new Binary("realgraph.num_favorites.is_missing", Set(PublicLikes, PrivateLikes).asJava) + // the number of mentions that the source user sent to the destination user + val NUM_MENTIONS_MEAN = + new Continuous("realgraph.num_mentions.mean", Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_MENTIONS_EWMA = + new Continuous("realgraph.num_mentions.ewma", Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_MENTIONS_VARIANCE = new Continuous( + "realgraph.num_mentions.variance", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_MENTIONS_NON_ZERO_DAYS = new Continuous( + "realgraph.num_mentions.non_zero_days", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_MENTIONS_ELAPSED_DAYS = new Continuous( + "realgraph.num_mentions.elapsed_days", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_MENTIONS_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_mentions.days_since_last", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_MENTIONS_IS_MISSING = new Binary( + "realgraph.num_mentions.is_missing", + Set(EngagementsPrivate, EngagementsPublic).asJava) + // the number of direct messages that the source user sent to the destination user + val NUM_DIRECT_MESSAGES_MEAN = new Continuous( + "realgraph.num_direct_messages.mean", + Set(DmEntitiesAndMetadata, CountOfDms).asJava) + val NUM_DIRECT_MESSAGES_EWMA = new Continuous( + "realgraph.num_direct_messages.ewma", + Set(DmEntitiesAndMetadata, CountOfDms).asJava) + val NUM_DIRECT_MESSAGES_VARIANCE = new Continuous( + "realgraph.num_direct_messages.variance", + Set(DmEntitiesAndMetadata, CountOfDms).asJava) + val NUM_DIRECT_MESSAGES_NON_ZERO_DAYS = new Continuous( + "realgraph.num_direct_messages.non_zero_days", + Set(DmEntitiesAndMetadata, CountOfDms).asJava + ) + val NUM_DIRECT_MESSAGES_ELAPSED_DAYS = new Continuous( + "realgraph.num_direct_messages.elapsed_days", + Set(DmEntitiesAndMetadata, CountOfDms).asJava + ) + val NUM_DIRECT_MESSAGES_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_direct_messages.days_since_last", + Set(DmEntitiesAndMetadata, CountOfDms).asJava + ) + val NUM_DIRECT_MESSAGES_IS_MISSING = new Binary( + "realgraph.num_direct_messages.is_missing", + Set(DmEntitiesAndMetadata, CountOfDms).asJava) + // the number of tweet clicks that the source user sent to the destination user + val NUM_TWEET_CLICKS_MEAN = + new Continuous("realgraph.num_tweet_clicks.mean", Set(TweetsClicked).asJava) + val NUM_TWEET_CLICKS_EWMA = + new Continuous("realgraph.num_tweet_clicks.ewma", Set(TweetsClicked).asJava) + val NUM_TWEET_CLICKS_VARIANCE = + new Continuous("realgraph.num_tweet_clicks.variance", Set(TweetsClicked).asJava) + val NUM_TWEET_CLICKS_NON_ZERO_DAYS = + new Continuous("realgraph.num_tweet_clicks.non_zero_days", 
Set(TweetsClicked).asJava) + val NUM_TWEET_CLICKS_ELAPSED_DAYS = + new Continuous("realgraph.num_tweet_clicks.elapsed_days", Set(TweetsClicked).asJava) + val NUM_TWEET_CLICKS_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_tweet_clicks.days_since_last", + Set(TweetsClicked).asJava + ) + val NUM_TWEET_CLICKS_IS_MISSING = + new Binary("realgraph.num_tweet_clicks.is_missing", Set(TweetsClicked).asJava) + // the number of link clicks that the source user sent to the destination user + val NUM_LINK_CLICKS_MEAN = + new Continuous("realgraph.num_link_clicks.mean", Set(CountOfTweetEntitiesClicked).asJava) + val NUM_LINK_CLICKS_EWMA = + new Continuous("realgraph.num_link_clicks.ewma", Set(CountOfTweetEntitiesClicked).asJava) + val NUM_LINK_CLICKS_VARIANCE = + new Continuous("realgraph.num_link_clicks.variance", Set(CountOfTweetEntitiesClicked).asJava) + val NUM_LINK_CLICKS_NON_ZERO_DAYS = new Continuous( + "realgraph.num_link_clicks.non_zero_days", + Set(CountOfTweetEntitiesClicked).asJava) + val NUM_LINK_CLICKS_ELAPSED_DAYS = new Continuous( + "realgraph.num_link_clicks.elapsed_days", + Set(CountOfTweetEntitiesClicked).asJava) + val NUM_LINK_CLICKS_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_link_clicks.days_since_last", + Set(CountOfTweetEntitiesClicked).asJava) + val NUM_LINK_CLICKS_IS_MISSING = + new Binary("realgraph.num_link_clicks.is_missing", Set(CountOfTweetEntitiesClicked).asJava) + // the number of profile views that the source user sent to the destination user + val NUM_PROFILE_VIEWS_MEAN = + new Continuous("realgraph.num_profile_views.mean", Set(ProfilesViewed).asJava) + val NUM_PROFILE_VIEWS_EWMA = + new Continuous("realgraph.num_profile_views.ewma", Set(ProfilesViewed).asJava) + val NUM_PROFILE_VIEWS_VARIANCE = + new Continuous("realgraph.num_profile_views.variance", Set(ProfilesViewed).asJava) + val NUM_PROFILE_VIEWS_NON_ZERO_DAYS = + new Continuous("realgraph.num_profile_views.non_zero_days", Set(ProfilesViewed).asJava) + val NUM_PROFILE_VIEWS_ELAPSED_DAYS = + new Continuous("realgraph.num_profile_views.elapsed_days", Set(ProfilesViewed).asJava) + val NUM_PROFILE_VIEWS_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_profile_views.days_since_last", + Set(ProfilesViewed).asJava + ) + val NUM_PROFILE_VIEWS_IS_MISSING = + new Binary("realgraph.num_profile_views.is_missing", Set(ProfilesViewed).asJava) + // the total dwell time the source user spends on the target user's tweets + val TOTAL_DWELL_TIME_MEAN = + new Continuous("realgraph.total_dwell_time.mean", Set(CountOfImpression).asJava) + val TOTAL_DWELL_TIME_EWMA = + new Continuous("realgraph.total_dwell_time.ewma", Set(CountOfImpression).asJava) + val TOTAL_DWELL_TIME_VARIANCE = + new Continuous("realgraph.total_dwell_time.variance", Set(CountOfImpression).asJava) + val TOTAL_DWELL_TIME_NON_ZERO_DAYS = + new Continuous("realgraph.total_dwell_time.non_zero_days", Set(CountOfImpression).asJava) + val TOTAL_DWELL_TIME_ELAPSED_DAYS = + new Continuous("realgraph.total_dwell_time.elapsed_days", Set(CountOfImpression).asJava) + val TOTAL_DWELL_TIME_DAYS_SINCE_LAST = new Continuous( + "realgraph.total_dwell_time.days_since_last", + Set(CountOfImpression).asJava + ) + val TOTAL_DWELL_TIME_IS_MISSING = + new Binary("realgraph.total_dwell_time.is_missing", Set(CountOfImpression).asJava) + // the number of the target user's tweets that the source user has inspected + val NUM_INSPECTED_TWEETS_MEAN = + new Continuous("realgraph.num_inspected_tweets.mean", Set(CountOfImpression).asJava) + val NUM_INSPECTED_TWEETS_EWMA = + new 
Continuous("realgraph.num_inspected_tweets.ewma", Set(CountOfImpression).asJava) + val NUM_INSPECTED_TWEETS_VARIANCE = + new Continuous("realgraph.num_inspected_tweets.variance", Set(CountOfImpression).asJava) + val NUM_INSPECTED_TWEETS_NON_ZERO_DAYS = new Continuous( + "realgraph.num_inspected_tweets.non_zero_days", + Set(CountOfImpression).asJava + ) + val NUM_INSPECTED_TWEETS_ELAPSED_DAYS = new Continuous( + "realgraph.num_inspected_tweets.elapsed_days", + Set(CountOfImpression).asJava + ) + val NUM_INSPECTED_TWEETS_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_inspected_tweets.days_since_last", + Set(CountOfImpression).asJava + ) + val NUM_INSPECTED_TWEETS_IS_MISSING = + new Binary("realgraph.num_inspected_tweets.is_missing", Set(CountOfImpression).asJava) + // the number of photos in which the source user has tagged the target user + val NUM_PHOTO_TAGS_MEAN = new Continuous( + "realgraph.num_photo_tags.mean", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_PHOTO_TAGS_EWMA = new Continuous( + "realgraph.num_photo_tags.ewma", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_PHOTO_TAGS_VARIANCE = new Continuous( + "realgraph.num_photo_tags.variance", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_PHOTO_TAGS_NON_ZERO_DAYS = new Continuous( + "realgraph.num_photo_tags.non_zero_days", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_PHOTO_TAGS_ELAPSED_DAYS = new Continuous( + "realgraph.num_photo_tags.elapsed_days", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_PHOTO_TAGS_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_photo_tags.days_since_last", + Set(EngagementsPrivate, EngagementsPublic).asJava) + val NUM_PHOTO_TAGS_IS_MISSING = new Binary( + "realgraph.num_photo_tags.is_missing", + Set(EngagementsPrivate, EngagementsPublic).asJava) + + val NUM_FOLLOW_MEAN = new Continuous( + "realgraph.num_follow.mean", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_FOLLOW_EWMA = new Continuous( + "realgraph.num_follow.ewma", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_FOLLOW_VARIANCE = new Continuous( + "realgraph.num_follow.variance", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_FOLLOW_NON_ZERO_DAYS = new Continuous( + "realgraph.num_follow.non_zero_days", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_FOLLOW_ELAPSED_DAYS = new Continuous( + "realgraph.num_follow.elapsed_days", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_FOLLOW_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_follow.days_since_last", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_FOLLOW_IS_MISSING = new Binary( + "realgraph.num_follow.is_missing", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + // the number of blocks that the source user sent to the destination user + val NUM_BLOCKS_MEAN = + new Continuous("realgraph.num_blocks.mean", Set(CountOfBlocks).asJava) + val NUM_BLOCKS_EWMA = + new Continuous("realgraph.num_blocks.ewma", Set(CountOfBlocks).asJava) + val NUM_BLOCKS_VARIANCE = + new Continuous("realgraph.num_blocks.variance", Set(CountOfBlocks).asJava) + val NUM_BLOCKS_NON_ZERO_DAYS = + new Continuous("realgraph.num_blocks.non_zero_days", Set(CountOfBlocks).asJava) + val NUM_BLOCKS_ELAPSED_DAYS = + new Continuous("realgraph.num_blocks.elapsed_days", Set(CountOfBlocks).asJava) + val 
NUM_BLOCKS_DAYS_SINCE_LAST =
+    new Continuous("realgraph.num_blocks.days_since_last", Set(CountOfBlocks).asJava)
+  val NUM_BLOCKS_IS_MISSING =
+    new Binary("realgraph.num_blocks.is_missing", Set(CountOfBlocks).asJava)
+  // the number of mutes that the source user sent to the destination user
+  val NUM_MUTES_MEAN =
+    new Continuous("realgraph.num_mutes.mean", Set(CountOfMutes).asJava)
+  val NUM_MUTES_EWMA =
+    new Continuous("realgraph.num_mutes.ewma", Set(CountOfMutes).asJava)
+  val NUM_MUTES_VARIANCE =
+    new Continuous("realgraph.num_mutes.variance", Set(CountOfMutes).asJava)
+  val NUM_MUTES_NON_ZERO_DAYS =
+    new Continuous("realgraph.num_mutes.non_zero_days", Set(CountOfMutes).asJava)
+  val NUM_MUTES_ELAPSED_DAYS =
+    new Continuous("realgraph.num_mutes.elapsed_days", Set(CountOfMutes).asJava)
+  val NUM_MUTES_DAYS_SINCE_LAST =
+    new Continuous("realgraph.num_mutes.days_since_last", Set(CountOfMutes).asJava)
+  val NUM_MUTES_IS_MISSING =
+    new Binary("realgraph.num_mutes.is_missing", Set(CountOfMutes).asJava)
+  // the number of report-as-abuse actions that the source user sent to the destination user
+  val NUM_REPORTS_AS_ABUSES_MEAN =
+    new Continuous("realgraph.num_report_as_abuses.mean", Set(CountOfAbuseReports).asJava)
+  val NUM_REPORTS_AS_ABUSES_EWMA =
+    new Continuous("realgraph.num_report_as_abuses.ewma", Set(CountOfAbuseReports).asJava)
+  val NUM_REPORTS_AS_ABUSES_VARIANCE =
+    new Continuous("realgraph.num_report_as_abuses.variance", Set(CountOfAbuseReports).asJava)
+  val NUM_REPORTS_AS_ABUSES_NON_ZERO_DAYS =
+    new Continuous("realgraph.num_report_as_abuses.non_zero_days", Set(CountOfAbuseReports).asJava)
+  val NUM_REPORTS_AS_ABUSES_ELAPSED_DAYS =
+    new Continuous("realgraph.num_report_as_abuses.elapsed_days", Set(CountOfAbuseReports).asJava)
+  val NUM_REPORTS_AS_ABUSES_DAYS_SINCE_LAST =
+    new Continuous(
+      "realgraph.num_report_as_abuses.days_since_last",
+      Set(CountOfAbuseReports).asJava)
+  val NUM_REPORTS_AS_ABUSES_IS_MISSING =
+    new Binary("realgraph.num_report_as_abuses.is_missing", Set(CountOfAbuseReports).asJava)
+  // the number of report-as-spam actions that the source user sent to the destination user
+  val NUM_REPORTS_AS_SPAMS_MEAN =
+    new Continuous(
+      "realgraph.num_report_as_spams.mean",
+      Set(CountOfAbuseReports, SafetyRelationships).asJava)
+  val NUM_REPORTS_AS_SPAMS_EWMA =
+    new Continuous(
+      "realgraph.num_report_as_spams.ewma",
+      Set(CountOfAbuseReports, SafetyRelationships).asJava)
+  val NUM_REPORTS_AS_SPAMS_VARIANCE =
+    new Continuous(
+      "realgraph.num_report_as_spams.variance",
+      Set(CountOfAbuseReports, SafetyRelationships).asJava)
+  val NUM_REPORTS_AS_SPAMS_NON_ZERO_DAYS =
+    new Continuous(
+      "realgraph.num_report_as_spams.non_zero_days",
+      Set(CountOfAbuseReports, SafetyRelationships).asJava)
+  val NUM_REPORTS_AS_SPAMS_ELAPSED_DAYS =
+    new Continuous(
+      "realgraph.num_report_as_spams.elapsed_days",
+      Set(CountOfAbuseReports, SafetyRelationships).asJava)
+  val NUM_REPORTS_AS_SPAMS_DAYS_SINCE_LAST =
+    new Continuous(
+      "realgraph.num_report_as_spams.days_since_last",
+      Set(CountOfAbuseReports, SafetyRelationships).asJava)
+  val NUM_REPORTS_AS_SPAMS_IS_MISSING =
+    new Binary(
+      "realgraph.num_report_as_spams.is_missing",
+      Set(CountOfAbuseReports, SafetyRelationships).asJava)
+
+  val NUM_MUTUAL_FOLLOW_MEAN = new Continuous(
+    "realgraph.num_mutual_follow.mean",
+    Set(
+      Follow,
+      PrivateAccountsFollowedBy,
+      PublicAccountsFollowedBy,
+      PrivateAccountsFollowing,
+      PublicAccountsFollowing).asJava
+  )
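+  // [Editorial note, not part of the original change] Each interaction count
+  // above is aggregated per day and summarized by the same seven statistics:
+  // mean, ewma, variance, non_zero_days, elapsed_days, days_since_last and
+  // is_missing. The "ewma" variant is an exponentially weighted moving
+  // average; assuming a decay factor alpha in (0, 1], a minimal sketch of the
+  // daily update would be:
+  //
+  //   def updateEwma(prevEwma: Double, todayCount: Double, alpha: Double): Double =
+  //     alpha * todayCount + (1.0 - alpha) * prevEwma
+  //
+  // so recent interactions dominate the value while older activity decays
+  // geometrically. The decay constant actually used by the real-graph
+  // pipeline is defined elsewhere in this repository.
+  val NUM_MUTUAL_FOLLOW_EWMA = new Continuous(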
"realgraph.num_mutual_follow.ewma", + Set( + Follow, + PrivateAccountsFollowedBy, + PublicAccountsFollowedBy, + PrivateAccountsFollowing, + PublicAccountsFollowing).asJava + ) + val NUM_MUTUAL_FOLLOW_VARIANCE = new Continuous( + "realgraph.num_mutual_follow.variance", + Set( + Follow, + PrivateAccountsFollowedBy, + PublicAccountsFollowedBy, + PrivateAccountsFollowing, + PublicAccountsFollowing).asJava + ) + val NUM_MUTUAL_FOLLOW_NON_ZERO_DAYS = new Continuous( + "realgraph.num_mutual_follow.non_zero_days", + Set( + Follow, + PrivateAccountsFollowedBy, + PublicAccountsFollowedBy, + PrivateAccountsFollowing, + PublicAccountsFollowing).asJava + ) + val NUM_MUTUAL_FOLLOW_ELAPSED_DAYS = new Continuous( + "realgraph.num_mutual_follow.elapsed_days", + Set( + Follow, + PrivateAccountsFollowedBy, + PublicAccountsFollowedBy, + PrivateAccountsFollowing, + PublicAccountsFollowing).asJava + ) + val NUM_MUTUAL_FOLLOW_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_mutual_follow.days_since_last", + Set( + Follow, + PrivateAccountsFollowedBy, + PublicAccountsFollowedBy, + PrivateAccountsFollowing, + PublicAccountsFollowing).asJava + ) + val NUM_MUTUAL_FOLLOW_IS_MISSING = new Binary( + "realgraph.num_mutual_follow.is_missing", + Set( + Follow, + PrivateAccountsFollowedBy, + PublicAccountsFollowedBy, + PrivateAccountsFollowing, + PublicAccountsFollowing).asJava + ) + + val NUM_SMS_FOLLOW_MEAN = new Continuous( + "realgraph.num_sms_follow.mean", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_SMS_FOLLOW_EWMA = new Continuous( + "realgraph.num_sms_follow.ewma", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_SMS_FOLLOW_VARIANCE = new Continuous( + "realgraph.num_sms_follow.variance", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_SMS_FOLLOW_NON_ZERO_DAYS = new Continuous( + "realgraph.num_sms_follow.non_zero_days", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_SMS_FOLLOW_ELAPSED_DAYS = new Continuous( + "realgraph.num_sms_follow.elapsed_days", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_SMS_FOLLOW_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_sms_follow.days_since_last", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + val NUM_SMS_FOLLOW_IS_MISSING = new Binary( + "realgraph.num_sms_follow.is_missing", + Set(Follow, PrivateAccountsFollowedBy, PublicAccountsFollowedBy).asJava) + + val NUM_ADDRESS_BOOK_EMAIL_MEAN = + new Continuous("realgraph.num_address_book_email.mean", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_EMAIL_EWMA = + new Continuous("realgraph.num_address_book_email.ewma", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_EMAIL_VARIANCE = + new Continuous("realgraph.num_address_book_email.variance", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_EMAIL_NON_ZERO_DAYS = new Continuous( + "realgraph.num_address_book_email.non_zero_days", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_EMAIL_ELAPSED_DAYS = new Continuous( + "realgraph.num_address_book_email.elapsed_days", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_EMAIL_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_address_book_email.days_since_last", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_EMAIL_IS_MISSING = + new Binary("realgraph.num_address_book_email.is_missing", Set(AddressBook).asJava) + + val NUM_ADDRESS_BOOK_IN_BOTH_MEAN = + new Continuous("realgraph.num_address_book_in_both.mean", 
Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_IN_BOTH_EWMA = + new Continuous("realgraph.num_address_book_in_both.ewma", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_IN_BOTH_VARIANCE = new Continuous( + "realgraph.num_address_book_in_both.variance", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_IN_BOTH_NON_ZERO_DAYS = new Continuous( + "realgraph.num_address_book_in_both.non_zero_days", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_IN_BOTH_ELAPSED_DAYS = new Continuous( + "realgraph.num_address_book_in_both.elapsed_days", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_IN_BOTH_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_address_book_in_both.days_since_last", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_IN_BOTH_IS_MISSING = new Binary( + "realgraph.num_address_book_in_both.is_missing", + Set(AddressBook).asJava + ) + + val NUM_ADDRESS_BOOK_PHONE_MEAN = + new Continuous("realgraph.num_address_book_phone.mean", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_PHONE_EWMA = + new Continuous("realgraph.num_address_book_phone.ewma", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_PHONE_VARIANCE = + new Continuous("realgraph.num_address_book_phone.variance", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_PHONE_NON_ZERO_DAYS = new Continuous( + "realgraph.num_address_book_phone.non_zero_days", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_PHONE_ELAPSED_DAYS = new Continuous( + "realgraph.num_address_book_phone.elapsed_days", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_PHONE_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_address_book_phone.days_since_last", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_PHONE_IS_MISSING = + new Binary("realgraph.num_address_book_phone.is_missing", Set(AddressBook).asJava) + + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_EMAIL_MEAN = + new Continuous("realgraph.num_address_book_mutual_edge_email.mean", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_EMAIL_EWMA = + new Continuous("realgraph.num_address_book_mutual_edge_email.ewma", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_EMAIL_VARIANCE = + new Continuous("realgraph.num_address_book_mutual_edge_email.variance", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_EMAIL_NON_ZERO_DAYS = new Continuous( + "realgraph.num_address_book_mutual_edge_email.non_zero_days", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_EMAIL_ELAPSED_DAYS = new Continuous( + "realgraph.num_address_book_mutual_edge_email.elapsed_days", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_EMAIL_DAYS_SINCE_LAST = new Continuous( + "realgraph.num_address_book_mutual_edge_email.days_since_last", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_EMAIL_IS_MISSING = + new Binary("realgraph.num_address_book_mutual_edge_email.is_missing", Set(AddressBook).asJava) + + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_IN_BOTH_MEAN = + new Continuous("realgraph.num_address_book_mutual_edge_in_both.mean", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_IN_BOTH_EWMA = + new Continuous("realgraph.num_address_book_mutual_edge_in_both.ewma", Set(AddressBook).asJava) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_IN_BOTH_VARIANCE = new Continuous( + "realgraph.num_address_book_mutual_edge_in_both.variance", + Set(AddressBook).asJava + ) + val NUM_ADDRESS_BOOK_MUTUAL_EDGE_IN_BOTH_NON_ZERO_DAYS = new Continuous( + "realgraph.num_address_book_mutual_edge_in_both.non_zero_days", + Set(AddressBook).asJava + ) + val 
NUM_ADDRESS_BOOK_MUTUAL_EDGE_IN_BOTH_ELAPSED_DAYS = new Continuous(
+    "realgraph.num_address_book_mutual_edge_in_both.elapsed_days",
+    Set(AddressBook).asJava
+  )
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_IN_BOTH_DAYS_SINCE_LAST = new Continuous(
+    "realgraph.num_address_book_mutual_edge_in_both.days_since_last",
+    Set(AddressBook).asJava
+  )
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_IN_BOTH_IS_MISSING = new Binary(
+    "realgraph.num_address_book_mutual_edge_in_both.is_missing",
+    Set(AddressBook).asJava
+  )
+
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_PHONE_MEAN =
+    new Continuous("realgraph.num_address_book_mutual_edge_phone.mean", Set(AddressBook).asJava)
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_PHONE_EWMA =
+    new Continuous("realgraph.num_address_book_mutual_edge_phone.ewma", Set(AddressBook).asJava)
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_PHONE_VARIANCE =
+    new Continuous("realgraph.num_address_book_mutual_edge_phone.variance", Set(AddressBook).asJava)
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_PHONE_NON_ZERO_DAYS = new Continuous(
+    "realgraph.num_address_book_mutual_edge_phone.non_zero_days",
+    Set(AddressBook).asJava
+  )
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_PHONE_ELAPSED_DAYS = new Continuous(
+    "realgraph.num_address_book_mutual_edge_phone.elapsed_days",
+    Set(AddressBook).asJava
+  )
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_PHONE_DAYS_SINCE_LAST = new Continuous(
+    "realgraph.num_address_book_mutual_edge_phone.days_since_last",
+    Set(AddressBook).asJava
+  )
+  val NUM_ADDRESS_BOOK_MUTUAL_EDGE_PHONE_IS_MISSING =
+    new Binary("realgraph.num_address_book_mutual_edge_phone.is_missing", Set(AddressBook).asJava)
+}
+
+case class RealGraphEdgeDataRecordFeatures(
+  edgeFeatureOpt: Option[RealGraphEdgeFeature],
+  meanFeature: Continuous,
+  ewmaFeature: Continuous,
+  varianceFeature: Continuous,
+  nonZeroDaysFeature: Continuous,
+  elapsedDaysFeature: Continuous,
+  daysSinceLastFeature: Continuous,
+  isMissingFeature: Binary)
diff --git a/src/scala/com/twitter/timelines/prediction/features/recap/BUILD b/src/scala/com/twitter/timelines/prediction/features/recap/BUILD
new file mode 100644
index 000000000..6fc497bf3
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/features/recap/BUILD
@@ -0,0 +1,9 @@
+scala_library(
+    sources = ["*.scala"],
+    platform = "java8",
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "src/java/com/twitter/ml/api:api-base",
+        "src/thrift/com/twitter/dal/personal_data:personal_data-java",
+    ],
+)
diff --git a/src/scala/com/twitter/timelines/prediction/features/recap/RecapFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/recap/RecapFeatures.scala
new file mode 100644
index 000000000..c8ee6da7d
--- /dev/null
+++ b/src/scala/com/twitter/timelines/prediction/features/recap/RecapFeatures.scala
@@ -0,0 +1,967 @@
+package com.twitter.timelines.prediction.features.recap
+
+import com.twitter.dal.personal_data.thriftjava.PersonalDataType._
+import com.twitter.ml.api.Feature.Binary
+import com.twitter.ml.api.Feature.Continuous
+import com.twitter.ml.api.Feature.Discrete
+import com.twitter.ml.api.Feature.SparseBinary
+import com.twitter.ml.api.Feature.Text
+import scala.collection.JavaConverters._
+
+object RecapFeatures extends RecapFeatures("")
+object InReplyToRecapFeatures extends RecapFeatures("in_reply_to_tweet")
+
+class RecapFeatures(prefix: String) {
+  private def name(featureName: String): String = {
+    if (prefix.nonEmpty) {
+      s"$prefix.$featureName"
+    } else {
+      featureName
+    }
+  }
+
+  val IS_IPAD_CLIENT = new Binary(name("recap.client.is_ipad"), Set(ClientType).asJava)
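+  // [Editorial note, not part of the original change] The name() helper above
+  // prepends the constructor's prefix, so the same definitions describe both
+  // the candidate tweet (empty prefix) and the tweet it replies to. Assuming
+  // getFeatureName is the accessor exposed by com.twitter.ml.api.Feature, a
+  // minimal sketch of the resulting names would be:
+  //
+  //   RecapFeatures.IS_IPAD_CLIENT.getFeatureName
+  //   //   "recap.client.is_ipad"
+  //   InReplyToRecapFeatures.IS_IPAD_CLIENT.getFeatureName
+  //   //   "in_reply_to_tweet.recap.client.is_ipad"
+  //
+  // which follows directly from the s"$prefix.$featureName" interpolation.
+  val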
IS_WEB_CLIENT = new Binary(name("recap.client.is_web"), Set(ClientType).asJava) + val IS_IPHONE_CLIENT = new Binary(name("recap.client.is_phone"), Set(ClientType).asJava) + val IS_ANDROID_CLIENT = new Binary(name("recap.client.is_android"), Set(ClientType).asJava) + val IS_ANDROID_TABLET_CLIENT = + new Binary(name("recap.client.is_android_tablet"), Set(ClientType).asJava) + + // features from userAgent + val CLIENT_NAME = new Text(name("recap.user_agent.client_name"), Set(ClientType).asJava) + val CLIENT_SOURCE = new Discrete(name("recap.user_agent.client_source"), Set(ClientType).asJava) + val CLIENT_VERSION = new Text(name("recap.user_agent.client_version"), Set(ClientVersion).asJava) + val CLIENT_VERSION_CODE = + new Text(name("recap.user_agent.client_version_code"), Set(ClientVersion).asJava) + val DEVICE = new Text(name("recap.user_agent.device"), Set(DeviceType).asJava) + val FROM_DOG_FOOD = new Binary(name("recap.meta.from_dog_food"), Set(UserAgent).asJava) + val FROM_TWITTER_CLIENT = + new Binary(name("recap.user_agent.from_twitter_client"), Set(UserAgent).asJava) + val MANUFACTURER = new Text(name("recap.user_agent.manufacturer"), Set(UserAgent).asJava) + val MODEL = new Text(name("recap.user_agent.model"), Set(UserAgent).asJava) + val NETWORK_CONNECTION = + new Discrete(name("recap.user_agent.network_connection"), Set(UserAgent).asJava) + val SDK_VERSION = new Text(name("recap.user_agent.sdk_version"), Set(AppId, UserAgent).asJava) + + // engagement + val IS_RETWEETED = new Binary( + name("recap.engagement.is_retweeted"), + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_FAVORITED = new Binary( + name("recap.engagement.is_favorited"), + Set(PublicLikes, PrivateLikes, EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED = new Binary( + name("recap.engagement.is_replied"), + Set(PublicReplies, PrivateReplies, EngagementsPrivate, EngagementsPublic).asJava) + // v1: post click engagements: fav, reply + val IS_GOOD_CLICKED_CONVO_DESC_V1 = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_favorited_or_replied"), + Set( + PublicLikes, + PrivateLikes, + PublicReplies, + PrivateReplies, + EngagementsPrivate, + EngagementsPublic).asJava) + // v2: post click engagements: click + val IS_GOOD_CLICKED_CONVO_DESC_V2 = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_v2"), + Set(TweetsClicked, EngagementsPrivate).asJava) + + val IS_GOOD_CLICKED_CONVO_DESC_FAVORITED = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_favorited"), + Set(PublicLikes, PrivateLikes, EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_REPLIED = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_replied"), + Set(PublicReplies, PrivateReplies, EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_RETWEETED = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_retweeted"), + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_CLICKED = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_clicked"), + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_FOLLOWED = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_followed"), + Set(EngagementsPrivate).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_SHARE_DM_CLICKED = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_share_dm_clicked"), + Set(EngagementsPrivate).asJava) + val 
IS_GOOD_CLICKED_CONVO_DESC_PROFILE_CLICKED = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_profile_clicked"), + Set(EngagementsPrivate).asJava) + + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_0 = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_uam_gt_0"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_1 = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_uam_gt_1"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_2 = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_uam_gt_2"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_GOOD_CLICKED_CONVO_DESC_UAM_GT_3 = new Binary( + name("recap.engagement.is_good_clicked_convo_desc_uam_gt_3"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + + val IS_TWEET_DETAIL_DWELLED = new Binary( + name("recap.engagement.is_tweet_detail_dwelled"), + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_8_SEC = new Binary( + name("recap.engagement.is_tweet_detail_dwelled_8_sec"), + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_15_SEC = new Binary( + name("recap.engagement.is_tweet_detail_dwelled_15_sec"), + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_25_SEC = new Binary( + name("recap.engagement.is_tweet_detail_dwelled_25_sec"), + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_TWEET_DETAIL_DWELLED_30_SEC = new Binary( + name("recap.engagement.is_tweet_detail_dwelled_30_sec"), + Set(TweetsClicked, EngagementsPrivate).asJava) + + val IS_PROFILE_DWELLED = new Binary( + "recap.engagement.is_profile_dwelled", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_10_SEC = new Binary( + "recap.engagement.is_profile_dwelled_10_sec", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_20_SEC = new Binary( + "recap.engagement.is_profile_dwelled_20_sec", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_PROFILE_DWELLED_30_SEC = new Binary( + "recap.engagement.is_profile_dwelled_30_sec", + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED = new Binary( + "recap.engagement.is_fullscreen_video_dwelled", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_5_SEC = new Binary( + "recap.engagement.is_fullscreen_video_dwelled_5_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_10_SEC = new Binary( + "recap.engagement.is_fullscreen_video_dwelled_10_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_20_SEC = new Binary( + "recap.engagement.is_fullscreen_video_dwelled_20_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_FULLSCREEN_VIDEO_DWELLED_30_SEC = new Binary( + "recap.engagement.is_fullscreen_video_dwelled_30_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_15_SEC = new Binary( + "recap.engagement.is_link_dwelled_15_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_30_SEC = new Binary( + "recap.engagement.is_link_dwelled_30_sec", + Set(MediaEngagementActivities, 
EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_LINK_DWELLED_60_SEC = new Binary( + "recap.engagement.is_link_dwelled_60_sec", + Set(MediaEngagementActivities, EngagementTypePrivate, EngagementsPrivate).asJava) + + val IS_QUOTED = new Binary( + name("recap.engagement.is_quoted"), + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_RETWEETED_WITHOUT_QUOTE = new Binary( + name("recap.engagement.is_retweeted_without_quote"), + Set(PublicRetweets, PrivateRetweets, EngagementsPrivate, EngagementsPublic).asJava) + val IS_CLICKED = + new Binary(name("recap.engagement.is_clicked"), Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_DWELLED = new Binary(name("recap.engagement.is_dwelled"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_IN_BOUNDS_V1 = + new Binary(name("recap.engagement.is_dwelled_in_bounds_v1"), Set(EngagementsPrivate).asJava) + val DWELL_NORMALIZED_OVERALL = new Continuous( + name("recap.engagement.dwell_normalized_overall"), + Set(EngagementsPrivate).asJava) + val DWELL_CDF_OVERALL = + new Continuous(name("recap.engagement.dwell_cdf_overall"), Set(EngagementsPrivate).asJava) + val DWELL_CDF = new Continuous(name("recap.engagement.dwell_cdf"), Set(EngagementsPrivate).asJava) + + val IS_DWELLED_1S = + new Binary(name("recap.engagement.is_dwelled_1s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_2S = + new Binary(name("recap.engagement.is_dwelled_2s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_3S = + new Binary(name("recap.engagement.is_dwelled_3s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_4S = + new Binary(name("recap.engagement.is_dwelled_4s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_5S = + new Binary(name("recap.engagement.is_dwelled_5s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_6S = + new Binary(name("recap.engagement.is_dwelled_6s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_7S = + new Binary(name("recap.engagement.is_dwelled_7s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_8S = + new Binary(name("recap.engagement.is_dwelled_8s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_9S = + new Binary(name("recap.engagement.is_dwelled_9s"), Set(EngagementsPrivate).asJava) + val IS_DWELLED_10S = + new Binary(name("recap.engagement.is_dwelled_10s"), Set(EngagementsPrivate).asJava) + + val IS_SKIPPED_1S = + new Binary(name("recap.engagement.is_skipped_1s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_2S = + new Binary(name("recap.engagement.is_skipped_2s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_3S = + new Binary(name("recap.engagement.is_skipped_3s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_4S = + new Binary(name("recap.engagement.is_skipped_4s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_5S = + new Binary(name("recap.engagement.is_skipped_5s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_6S = + new Binary(name("recap.engagement.is_skipped_6s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_7S = + new Binary(name("recap.engagement.is_skipped_7s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_8S = + new Binary(name("recap.engagement.is_skipped_8s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_9S = + new Binary(name("recap.engagement.is_skipped_9s"), Set(EngagementsPrivate).asJava) + val IS_SKIPPED_10S = + new Binary(name("recap.engagement.is_skipped_10s"), Set(EngagementsPrivate).asJava) + + val IS_IMPRESSED = + new Binary(name("recap.engagement.is_impressed"), Set(EngagementsPrivate).asJava) + val IS_FOLLOWED = + new 
Binary("recap.engagement.is_followed", Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_PROFILE_CLICKED = new Binary( + name("recap.engagement.is_profile_clicked"), + Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava) + val IS_OPEN_LINKED = new Binary( + name("recap.engagement.is_open_linked"), + Set(EngagementsPrivate, LinksClickedOn).asJava) + val IS_PHOTO_EXPANDED = + new Binary(name("recap.engagement.is_photo_expanded"), Set(EngagementsPrivate).asJava) + val IS_VIDEO_VIEWED = + new Binary(name("recap.engagement.is_video_viewed"), Set(EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_START = + new Binary(name("recap.engagement.is_video_playback_start"), Set(EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_25 = + new Binary(name("recap.engagement.is_video_playback_25"), Set(EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_50 = + new Binary(name("recap.engagement.is_video_playback_50"), Set(EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_75 = + new Binary(name("recap.engagement.is_video_playback_75"), Set(EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_95 = + new Binary(name("recap.engagement.is_video_playback_95"), Set(EngagementsPrivate).asJava) + val IS_VIDEO_PLAYBACK_COMPLETE = + new Binary(name("recap.engagement.is_video_playback_complete"), Set(EngagementsPrivate).asJava) + val IS_VIDEO_VIEWED_AND_PLAYBACK_50 = new Binary( + name("recap.engagement.is_video_viewed_and_playback_50"), + Set(EngagementsPrivate).asJava) + val IS_VIDEO_QUALITY_VIEWED = new Binary( + name("recap.engagement.is_video_quality_viewed"), + Set(EngagementsPrivate).asJava + ) + val IS_TWEET_SHARE_DM_CLICKED = + new Binary(name("recap.engagement.is_tweet_share_dm_clicked"), Set(EngagementsPrivate).asJava) + val IS_TWEET_SHARE_DM_SENT = + new Binary(name("recap.engagement.is_tweet_share_dm_sent"), Set(EngagementsPrivate).asJava) + val IS_BOOKMARKED = + new Binary(name("recap.engagement.is_bookmarked"), Set(EngagementsPrivate).asJava) + val IS_SHARED = + new Binary(name("recap.engagement.is_shared"), Set(EngagementsPrivate).asJava) + val IS_SHARE_MENU_CLICKED = + new Binary(name("recap.engagement.is_share_menu_clicked"), Set(EngagementsPrivate).asJava) + + // Negative engagements + val IS_DONT_LIKE = + new Binary(name("recap.engagement.is_dont_like"), Set(EngagementsPrivate).asJava) + val IS_BLOCK_CLICKED = new Binary( + name("recap.engagement.is_block_clicked"), + Set(TweetsClicked, EngagementsPrivate, EngagementsPublic).asJava) + val IS_BLOCK_DIALOG_BLOCKED = new Binary( + name("recap.engagement.is_block_dialog_blocked"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_MUTE_CLICKED = new Binary( + name("recap.engagement.is_mute_clicked"), + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_MUTE_DIALOG_MUTED = + new Binary(name("recap.engagement.is_mute_dialog_muted"), Set(EngagementsPrivate).asJava) + val IS_REPORT_TWEET_CLICKED = new Binary( + name("recap.engagement.is_report_tweet_clicked"), + Set(TweetsClicked, EngagementsPrivate).asJava) + val IS_NEGATIVE_FEEDBACK = + new Binary("recap.engagement.is_negative_feedback", Set(EngagementsPrivate).asJava) + val IS_NOT_ABOUT_TOPIC = + new Binary(name("recap.engagement.is_not_about_topic"), Set(EngagementsPrivate).asJava) + val IS_NOT_RECENT = + new Binary(name("recap.engagement.is_not_recent"), Set(EngagementsPrivate).asJava) + val IS_NOT_RELEVANT = + new Binary(name("recap.engagement.is_not_relevant"), Set(EngagementsPrivate).asJava) + val IS_SEE_FEWER = + new Binary(name("recap.engagement.is_see_fewer"), 
Set(EngagementsPrivate).asJava) + val IS_TOPIC_SPEC_NEG_ENGAGEMENT = + new Binary("recap.engagement.is_topic_spec_neg_engagement", Set(EngagementsPrivate).asJava) + val IS_UNFOLLOW_TOPIC = + new Binary("recap.engagement.is_unfollow_topic", Set(EngagementsPrivate).asJava) + val IS_UNFOLLOW_TOPIC_EXPLICIT_POSITIVE_LABEL = + new Binary( + "recap.engagement.is_unfollow_topic_explicit_positive_label", + Set(EngagementsPrivate).asJava) + val IS_UNFOLLOW_TOPIC_IMPLICIT_POSITIVE_LABEL = + new Binary( + "recap.engagement.is_unfollow_topic_implicit_positive_label", + Set(EngagementsPrivate).asJava) + val IS_UNFOLLOW_TOPIC_STRONG_EXPLICIT_NEGATIVE_LABEL = + new Binary( + "recap.engagement.is_unfollow_topic_strong_explicit_negative_label", + Set(EngagementsPrivate).asJava) + val IS_UNFOLLOW_TOPIC_EXPLICIT_NEGATIVE_LABEL = + new Binary( + "recap.engagement.is_unfollow_topic_explicit_negative_label", + Set(EngagementsPrivate).asJava) + val IS_NOT_INTERESTED_IN = + new Binary("recap.engagement.is_not_interested_in", Set(EngagementsPrivate).asJava) + val IS_NOT_INTERESTED_IN_EXPLICIT_POSITIVE_LABEL = + new Binary( + "recap.engagement.is_not_interested_in_explicit_positive_label", + Set(EngagementsPrivate).asJava) + val IS_NOT_INTERESTED_IN_EXPLICIT_NEGATIVE_LABEL = + new Binary( + "recap.engagement.is_not_interested_in_explicit_negative_label", + Set(EngagementsPrivate).asJava) + val IS_CARET_CLICKED = + new Binary(name("recap.engagement.is_caret_clicked"), Set(EngagementsPrivate).asJava) + val IS_FOLLOW_TOPIC = + new Binary("recap.engagement.is_follow_topic", Set(EngagementsPrivate).asJava) + val IS_NOT_INTERESTED_IN_TOPIC = + new Binary("recap.engagement.is_not_interested_in_topic", Set(EngagementsPrivate).asJava) + val IS_HOME_LATEST_VISITED = + new Binary(name("recap.engagement.is_home_latest_visited"), Set(EngagementsPrivate).asJava) + + // Relevance prompt tweet engagements + val IS_RELEVANCE_PROMPT_YES_CLICKED = new Binary( + name("recap.engagement.is_relevance_prompt_yes_clicked"), + Set(EngagementsPrivate).asJava) + val IS_RELEVANCE_PROMPT_NO_CLICKED = new Binary( + name("recap.engagement.is_relevance_prompt_no_clicked"), + Set(EngagementsPrivate).asJava) + val IS_RELEVANCE_PROMPT_IMPRESSED = new Binary( + name("recap.engagement.is_relevance_prompt_impressed"), + Set(EngagementsPrivate).asJava) + + // Reciprocal engagements for reply forward engagement + val IS_REPLIED_REPLY_IMPRESSED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_impressed_by_author"), + Set(EngagementsPrivate).asJava) + val IS_REPLIED_REPLY_FAVORITED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_favorited_by_author"), + Set(EngagementsPrivate, EngagementsPublic, PrivateLikes, PublicLikes).asJava) + val IS_REPLIED_REPLY_QUOTED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_quoted_by_author"), + Set(EngagementsPrivate, EngagementsPublic, PrivateRetweets, PublicRetweets).asJava) + val IS_REPLIED_REPLY_REPLIED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_replied_by_author"), + Set(EngagementsPrivate, EngagementsPublic, PrivateReplies, PublicReplies).asJava) + val IS_REPLIED_REPLY_RETWEETED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_retweeted_by_author"), + Set(EngagementsPrivate, EngagementsPublic, PrivateRetweets, PublicRetweets).asJava) + val IS_REPLIED_REPLY_BLOCKED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_blocked_by_author"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + val 
IS_REPLIED_REPLY_FOLLOWED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_followed_by_author"), + Set(EngagementsPrivate, EngagementsPublic, Follow).asJava) + val IS_REPLIED_REPLY_UNFOLLOWED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_unfollowed_by_author"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + val IS_REPLIED_REPLY_MUTED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_muted_by_author"), + Set(EngagementsPrivate).asJava) + val IS_REPLIED_REPLY_REPORTED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_reported_by_author"), + Set(EngagementsPrivate).asJava) + + // This derived label is the logical OR of REPLY_REPLIED, REPLY_FAVORITED, REPLY_RETWEETED + val IS_REPLIED_REPLY_ENGAGED_BY_AUTHOR = new Binary( + name("recap.engagement.is_replied_reply_engaged_by_author"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + + // Reciprocal engagements for fav forward engagement + val IS_FAVORITED_FAV_FAVORITED_BY_AUTHOR = new Binary( + name("recap.engagement.is_favorited_fav_favorited_by_author"), + Set(EngagementsPrivate, EngagementsPublic, PrivateLikes, PublicLikes).asJava + ) + val IS_FAVORITED_FAV_REPLIED_BY_AUTHOR = new Binary( + name("recap.engagement.is_favorited_fav_replied_by_author"), + Set(EngagementsPrivate, EngagementsPublic, PrivateReplies, PublicReplies).asJava + ) + val IS_FAVORITED_FAV_RETWEETED_BY_AUTHOR = new Binary( + name("recap.engagement.is_favorited_fav_retweeted_by_author"), + Set(EngagementsPrivate, EngagementsPublic, PrivateRetweets, PublicRetweets).asJava + ) + val IS_FAVORITED_FAV_FOLLOWED_BY_AUTHOR = new Binary( + name("recap.engagement.is_favorited_fav_followed_by_author"), + Set(EngagementsPrivate, EngagementsPublic, PrivateRetweets, PublicRetweets).asJava + ) + // This derived label is the logical OR of FAV_REPLIED, FAV_FAVORITED, FAV_RETWEETED, FAV_FOLLOWED + val IS_FAVORITED_FAV_ENGAGED_BY_AUTHOR = new Binary( + name("recap.engagement.is_favorited_fav_engaged_by_author"), + Set(EngagementsPrivate, EngagementsPublic).asJava) + + // define good profile click by considering following engagements (follow, fav, reply, retweet, etc.) 
at the profile page
+  val IS_PROFILE_CLICKED_AND_PROFILE_FOLLOW = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_follow"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, Follow).asJava)
+  val IS_PROFILE_CLICKED_AND_PROFILE_FAV = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_fav"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, PrivateLikes, PublicLikes).asJava)
+  val IS_PROFILE_CLICKED_AND_PROFILE_REPLY = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_reply"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, PrivateReplies, PublicReplies).asJava)
+  val IS_PROFILE_CLICKED_AND_PROFILE_RETWEET = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_retweet"),
+    Set(
+      ProfilesViewed,
+      ProfilesClicked,
+      EngagementsPrivate,
+      PrivateRetweets,
+      PublicRetweets).asJava)
+  val IS_PROFILE_CLICKED_AND_PROFILE_TWEET_CLICK = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_tweet_click"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, TweetsClicked).asJava)
+  val IS_PROFILE_CLICKED_AND_PROFILE_SHARE_DM_CLICK = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_share_dm_click"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava)
+  // This derived label is the union of all binary features above
+  val IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_engaged"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate, EngagementsPublic).asJava)
+
+  // define bad profile click by considering following engagements (user report, tweet report, mute, block, etc.) at the profile page
+  val IS_PROFILE_CLICKED_AND_PROFILE_USER_REPORT_CLICK = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_user_report_click"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava)
+  val IS_PROFILE_CLICKED_AND_PROFILE_TWEET_REPORT_CLICK = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_tweet_report_click"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava)
+  val IS_PROFILE_CLICKED_AND_PROFILE_MUTE = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_mute"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava)
+  val IS_PROFILE_CLICKED_AND_PROFILE_BLOCK = new Binary(
+    name("recap.engagement.is_profile_clicked_and_profile_block"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava)
+  // This derived label is the union of bad profile click engagements and existing negative feedback
+  val IS_NEGATIVE_FEEDBACK_V2 = new Binary(
+    name("recap.engagement.is_negative_feedback_v2"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava)
+  val IS_STRONG_NEGATIVE_FEEDBACK = new Binary(
+    name("recap.engagement.is_strong_negative_feedback"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava)
+  val IS_WEAK_NEGATIVE_FEEDBACK = new Binary(
+    name("recap.engagement.is_weak_negative_feedback"),
+    Set(ProfilesViewed, ProfilesClicked, EngagementsPrivate).asJava)
+  // engagement for following a user from any surface area
+  val IS_FOLLOWED_FROM_ANY_SURFACE_AREA = new Binary(
+    "recap.engagement.is_followed_from_any_surface_area",
+    Set(EngagementsPublic, EngagementsPrivate).asJava)
+
+  // Reply downvote engagements
+  val IS_REPLY_DOWNVOTED =
+    new Binary(name("recap.engagement.is_reply_downvoted"), Set(EngagementsPrivate).asJava)
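+  // [Editorial note, not part of the original change] The derived labels
+  // above are plain logical ORs over their component binaries. Sketching the
+  // "good profile click" union with a hypothetical has(feature) accessor on a
+  // labeled record:
+  //
+  //   val profileEngaged =
+  //     has(IS_PROFILE_CLICKED_AND_PROFILE_FOLLOW) ||
+  //     has(IS_PROFILE_CLICKED_AND_PROFILE_FAV) ||
+  //     has(IS_PROFILE_CLICKED_AND_PROFILE_REPLY) ||
+  //     has(IS_PROFILE_CLICKED_AND_PROFILE_RETWEET) ||
+  //     has(IS_PROFILE_CLICKED_AND_PROFILE_TWEET_CLICK) ||
+  //     has(IS_PROFILE_CLICKED_AND_PROFILE_SHARE_DM_CLICK)
+  //
+  // IS_NEGATIVE_FEEDBACK_V2 is the analogous OR over the "bad profile click"
+  // binaries combined with the pre-existing negative feedback signals.
+  val IS_REPLY_DOWNVOTE_REMOVED =
+    new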
Binary(name("recap.engagement.is_reply_downvote_removed"), Set(EngagementsPrivate).asJava) + + // Other engagements + val IS_GOOD_OPEN_LINK = new Binary( + name("recap.engagement.is_good_open_link"), + Set(EngagementsPrivate, LinksClickedOn).asJava) + val IS_ENGAGED = new Binary( + name("recap.engagement.any"), + Set(EngagementsPrivate, EngagementsPublic).asJava + ) // Deprecated - to be removed shortly + val IS_EARLYBIRD_UNIFIED_ENGAGEMENT = new Binary( + name("recap.engagement.is_unified_engagement"), + Set(EngagementsPrivate, EngagementsPublic).asJava + ) // A subset of IS_ENGAGED specifically intended for use in earlybird models + + // features from ThriftTweetFeatures + val PREV_USER_TWEET_ENGAGEMENT = new Continuous( + name("recap.tweetfeature.prev_user_tweet_enagagement"), + Set(EngagementScore, EngagementsPrivate, EngagementsPublic).asJava) + val IS_SENSITIVE = new Binary(name("recap.tweetfeature.is_sensitive")) + val HAS_MULTIPLE_MEDIA = new Binary( + name("recap.tweetfeature.has_multiple_media"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val IS_AUTHOR_PROFILE_EGG = new Binary(name("recap.tweetfeature.is_author_profile_egg")) + val IS_AUTHOR_NEW = + new Binary(name("recap.tweetfeature.is_author_new"), Set(UserState, UserType).asJava) + val NUM_MENTIONS = new Continuous( + name("recap.tweetfeature.num_mentions"), + Set(CountOfPrivateTweetEntitiesAndMetadata, CountOfPublicTweetEntitiesAndMetadata).asJava) + val HAS_MENTION = new Binary(name("recap.tweetfeature.has_mention"), Set(UserVisibleFlag).asJava) + val NUM_HASHTAGS = new Continuous( + name("recap.tweetfeature.num_hashtags"), + Set(CountOfPrivateTweetEntitiesAndMetadata, CountOfPublicTweetEntitiesAndMetadata).asJava) + val HAS_HASHTAG = new Binary( + name("recap.tweetfeature.has_hashtag"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val LINK_LANGUAGE = new Continuous( + name("recap.tweetfeature.link_language"), + Set(ProvidedLanguage, InferredLanguage).asJava) + val IS_AUTHOR_NSFW = + new Binary(name("recap.tweetfeature.is_author_nsfw"), Set(UserSafetyLabels, UserType).asJava) + val IS_AUTHOR_SPAM = + new Binary(name("recap.tweetfeature.is_author_spam"), Set(UserSafetyLabels, UserType).asJava) + val IS_AUTHOR_BOT = + new Binary(name("recap.tweetfeature.is_author_bot"), Set(UserSafetyLabels, UserType).asJava) + val SIGNATURE = + new Discrete(name("recap.tweetfeature.signature"), Set(DigitalSignatureNonrepudiation).asJava) + val LANGUAGE = new Discrete( + name("recap.tweetfeature.language"), + Set(ProvidedLanguage, InferredLanguage).asJava) + val FROM_INACTIVE_USER = + new Binary(name("recap.tweetfeature.from_inactive_user"), Set(UserActiveFlag).asJava) + val PROBABLY_FROM_FOLLOWED_AUTHOR = new Binary(name("recap.v3.tweetfeature.probably_from_follow")) + val FROM_MUTUAL_FOLLOW = new Binary(name("recap.tweetfeature.from_mutual_follow")) + val USER_REP = new Continuous(name("recap.tweetfeature.user_rep")) + val FROM_VERIFIED_ACCOUNT = + new Binary(name("recap.tweetfeature.from_verified_account"), Set(UserVerifiedFlag).asJava) + val IS_BUSINESS_SCORE = new Continuous(name("recap.tweetfeature.is_business_score")) + val HAS_CONSUMER_VIDEO = new Binary( + name("recap.tweetfeature.has_consumer_video"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_PRO_VIDEO = new Binary( + name("recap.tweetfeature.has_pro_video"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_VINE = new 
Binary( + name("recap.tweetfeature.has_vine"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_PERISCOPE = new Binary( + name("recap.tweetfeature.has_periscope"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_NATIVE_VIDEO = new Binary( + name("recap.tweetfeature.has_native_video"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_NATIVE_IMAGE = new Binary( + name("recap.tweetfeature.has_native_image"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_CARD = new Binary( + name("recap.tweetfeature.has_card"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_IMAGE = new Binary( + name("recap.tweetfeature.has_image"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_NEWS = new Binary( + name("recap.tweetfeature.has_news"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_VIDEO = new Binary( + name("recap.tweetfeature.has_video"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_VISIBLE_LINK = new Binary( + name("recap.tweetfeature.has_visible_link"), + Set(UrlFoundFlag, PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val LINK_COUNT = new Continuous( + name("recap.tweetfeature.link_count"), + Set(CountOfPrivateTweetEntitiesAndMetadata, CountOfPublicTweetEntitiesAndMetadata).asJava) + val HAS_LINK = new Binary( + name("recap.tweetfeature.has_link"), + Set(UrlFoundFlag, PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val IS_OFFENSIVE = new Binary(name("recap.tweetfeature.is_offensive")) + val HAS_TREND = new Binary( + name("recap.tweetfeature.has_trend"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val HAS_MULTIPLE_HASHTAGS_OR_TRENDS = new Binary( + name("recap.tweetfeature.has_multiple_hashtag_or_trend"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val URL_DOMAINS = new SparseBinary( + name("recap.tweetfeature.url_domains"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val CONTAINS_MEDIA = new Binary( + name("recap.tweetfeature.contains_media"), + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val RETWEET_SEARCHER = new Binary(name("recap.tweetfeature.retweet_searcher")) + val REPLY_SEARCHER = new Binary(name("recap.tweetfeature.reply_searcher")) + val MENTION_SEARCHER = + new Binary(name("recap.tweetfeature.mention_searcher"), Set(UserVisibleFlag).asJava) + val REPLY_OTHER = + new Binary(name("recap.tweetfeature.reply_other"), Set(PublicReplies, PrivateReplies).asJava) + val RETWEET_OTHER = new Binary( + name("recap.tweetfeature.retweet_other"), + Set(PublicRetweets, PrivateRetweets).asJava) + val IS_REPLY = + new Binary(name("recap.tweetfeature.is_reply"), Set(PublicReplies, PrivateReplies).asJava) + val IS_RETWEET = + new Binary(name("recap.tweetfeature.is_retweet"), Set(PublicRetweets, PrivateRetweets).asJava) + val IS_EXTENDED_REPLY = new Binary( + name("recap.tweetfeature.is_extended_reply"), + Set(PublicReplies, PrivateReplies).asJava) + val MATCH_UI_LANG = new Binary( + name("recap.tweetfeature.match_ui_lang"), + Set(ProvidedLanguage, InferredLanguage).asJava) + val MATCH_SEARCHER_MAIN_LANG = new Binary( + name("recap.tweetfeature.match_searcher_main_lang"), + 
Set(ProvidedLanguage, InferredLanguage).asJava) + val MATCH_SEARCHER_LANGS = new Binary( + name("recap.tweetfeature.match_searcher_langs"), + Set(ProvidedLanguage, InferredLanguage).asJava) + val BIDIRECTIONAL_REPLY_COUNT = new Continuous( + name("recap.tweetfeature.bidirectional_reply_count"), + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + val UNIDIRECTIONAL_REPLY_COUNT = new Continuous( + name("recap.tweetfeature.unidirectional_reply_count"), + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + val BIDIRECTIONAL_RETWEET_COUNT = new Continuous( + name("recap.tweetfeature.bidirectional_retweet_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val UNIDIRECTIONAL_RETWEET_COUNT = new Continuous( + name("recap.tweetfeature.unidirectional_retweet_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val BIDIRECTIONAL_FAV_COUNT = new Continuous( + name("recap.tweetfeature.bidirectional_fav_count"), + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val UNIDIRECTIONAL_FAV_COUNT = new Continuous( + name("recap.tweetfeature.unidirectiona_fav_count"), + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val CONVERSATIONAL_COUNT = new Continuous( + name("recap.tweetfeature.conversational_count"), + Set(CountOfPrivateTweets, CountOfPublicTweets).asJava) + // tweet impressions on an embedded tweet + val EMBEDS_IMPRESSION_COUNT = new Continuous( + name("recap.tweetfeature.embeds_impression_count"), + Set(CountOfImpression).asJava) + // number of URLs that embed the tweet + val EMBEDS_URL_COUNT = new Continuous( + name("recap.tweetfeature.embeds_url_count"), + Set(CountOfPrivateTweetEntitiesAndMetadata, CountOfPublicTweetEntitiesAndMetadata).asJava) + // currently only counts views on Snappy and Amplify pro videos. 
Counts for other videos forthcoming + val VIDEO_VIEW_COUNT = new Continuous( + name("recap.tweetfeature.video_view_count"), + Set( + CountOfTweetEntitiesClicked, + CountOfPrivateTweetEntitiesAndMetadata, + CountOfPublicTweetEntitiesAndMetadata, + EngagementsPrivate, + EngagementsPublic).asJava + ) + val TWEET_COUNT_FROM_USER_IN_SNAPSHOT = new Continuous( + name("recap.tweetfeature.tweet_count_from_user_in_snapshot"), + Set(CountOfPrivateTweets, CountOfPublicTweets).asJava) + val NORMALIZED_PARUS_SCORE = + new Continuous("recap.tweetfeature.normalized_parus_score", Set(EngagementScore).asJava) + val PARUS_SCORE = new Continuous("recap.tweetfeature.parus_score", Set(EngagementScore).asJava) + val REAL_GRAPH_WEIGHT = + new Continuous("recap.tweetfeature.real_graph_weight", Set(UsersRealGraphScore).asJava) + val SARUS_GRAPH_WEIGHT = new Continuous("recap.tweetfeature.sarus_graph_weight") + val TOPIC_SIM_SEARCHER_INTERSTED_IN_AUTHOR_KNOWN_FOR = new Continuous( + "recap.tweetfeature.topic_sim_searcher_interested_in_author_known_for") + val TOPIC_SIM_SEARCHER_AUTHOR_BOTH_INTERESTED_IN = new Continuous( + "recap.tweetfeature.topic_sim_searcher_author_both_interested_in") + val TOPIC_SIM_SEARCHER_AUTHOR_BOTH_KNOWN_FOR = new Continuous( + "recap.tweetfeature.topic_sim_searcher_author_both_known_for") + val TOPIC_SIM_SEARCHER_INTERESTED_IN_TWEET = new Continuous( + "recap.tweetfeature.topic_sim_searcher_interested_in_tweet") + val IS_RETWEETER_PROFILE_EGG = + new Binary(name("recap.v2.tweetfeature.is_retweeter_profile_egg"), Set(UserType).asJava) + val IS_RETWEETER_NEW = + new Binary(name("recap.v2.tweetfeature.is_retweeter_new"), Set(UserType, UserState).asJava) + val IS_RETWEETER_BOT = + new Binary( + name("recap.v2.tweetfeature.is_retweeter_bot"), + Set(UserType, UserSafetyLabels).asJava) + val IS_RETWEETER_NSFW = + new Binary( + name("recap.v2.tweetfeature.is_retweeter_nsfw"), + Set(UserType, UserSafetyLabels).asJava) + val IS_RETWEETER_SPAM = + new Binary( + name("recap.v2.tweetfeature.is_retweeter_spam"), + Set(UserType, UserSafetyLabels).asJava) + val RETWEET_OF_MUTUAL_FOLLOW = new Binary( + name("recap.v2.tweetfeature.retweet_of_mutual_follow"), + Set(PublicRetweets, PrivateRetweets).asJava) + val SOURCE_AUTHOR_REP = new Continuous(name("recap.v2.tweetfeature.source_author_rep")) + val IS_RETWEET_OF_REPLY = new Binary( + name("recap.v2.tweetfeature.is_retweet_of_reply"), + Set(PublicRetweets, PrivateRetweets).asJava) + val RETWEET_DIRECTED_AT_USER_IN_FIRST_DEGREE = new Binary( + name("recap.v2.tweetfeature.is_retweet_directed_at_user_in_first_degree"), + Set(PublicRetweets, PrivateRetweets, Follow).asJava) + val MENTIONED_SCREEN_NAMES = new SparseBinary( + "entities.users.mentioned_screen_names", + Set(DisplayName, UserVisibleFlag).asJava) + val MENTIONED_SCREEN_NAME = new Text( + "entities.users.mentioned_screen_names.member", + Set(DisplayName, UserVisibleFlag).asJava) + val HASHTAGS = new SparseBinary( + "entities.hashtags", + Set(PublicTweetEntitiesAndMetadata, PrivateTweetEntitiesAndMetadata).asJava) + val URL_SLUGS = new SparseBinary(name("recap.linkfeature.url_slugs"), Set(UrlFoundFlag).asJava) + + // features from ThriftSearchResultMetadata + val REPLY_COUNT = new Continuous( + name("recap.searchfeature.reply_count"), + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + val RETWEET_COUNT = new Continuous( + name("recap.searchfeature.retweet_count"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val FAV_COUNT = new Continuous( + 
name("recap.searchfeature.fav_count"), + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val BLENDER_SCORE = new Continuous(name("recap.searchfeature.blender_score")) + val TEXT_SCORE = new Continuous(name("recap.searchfeature.text_score")) + + // features related to content source + val SOURCE_TYPE = new Discrete(name("recap.source.type")) + + // features from addressbook + // the author is in the user's email addressbook + val USER_TO_AUTHOR_EMAIL_REACHABLE = + new Binary(name("recap.addressbook.user_to_author_email_reachable"), Set(AddressBook).asJava) + // the author is in the user's phone addressbook + val USER_TO_AUTHOR_PHONE_REACHABLE = + new Binary(name("recap.addressbook.user_to_author_phone_reachable"), Set(AddressBook).asJava) + // the user is in the author's email addressbook + val AUTHOR_TO_USER_EMAIL_REACHABLE = + new Binary(name("recap.addressbook.author_to_user_email_reachable"), Set(AddressBook).asJava) + // the user is in the user's phone addressbook + val AUTHOR_TO_USER_PHONE_REACHABLE = + new Binary(name("recap.addressbook.author_to_user_phone_reachable"), Set(AddressBook).asJava) + + // predicted engagement (these features are used by prediction service to return the predicted engagement probability) + // these should match the names in engagement_to_score_feature_mapping + val PREDICTED_IS_FAVORITED = + new Continuous(name("recap.engagement_predicted.is_favorited"), Set(EngagementScore).asJava) + val PREDICTED_IS_RETWEETED = + new Continuous(name("recap.engagement_predicted.is_retweeted"), Set(EngagementScore).asJava) + val PREDICTED_IS_QUOTED = + new Continuous(name("recap.engagement_predicted.is_quoted"), Set(EngagementScore).asJava) + val PREDICTED_IS_REPLIED = + new Continuous(name("recap.engagement_predicted.is_replied"), Set(EngagementScore).asJava) + val PREDICTED_IS_GOOD_OPEN_LINK = new Continuous( + name("recap.engagement_predicted.is_good_open_link"), + Set(EngagementScore).asJava) + val PREDICTED_IS_PROFILE_CLICKED = new Continuous( + name("recap.engagement_predicted.is_profile_clicked"), + Set(EngagementScore).asJava) + val PREDICTED_IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED = new Continuous( + name("recap.engagement_predicted.is_profile_clicked_and_profile_engaged"), + Set(EngagementScore).asJava) + val PREDICTED_IS_CLICKED = + new Continuous(name("recap.engagement_predicted.is_clicked"), Set(EngagementScore).asJava) + val PREDICTED_IS_PHOTO_EXPANDED = new Continuous( + name("recap.engagement_predicted.is_photo_expanded"), + Set(EngagementScore).asJava) + val PREDICTED_IS_DONT_LIKE = + new Continuous(name("recap.engagement_predicted.is_dont_like"), Set(EngagementScore).asJava) + val PREDICTED_IS_VIDEO_PLAYBACK_50 = new Continuous( + name("recap.engagement_predicted.is_video_playback_50"), + Set(EngagementScore).asJava) + val PREDICTED_IS_VIDEO_QUALITY_VIEWED = new Continuous( + name("recap.engagement_predicted.is_video_quality_viewed"), + Set(EngagementScore).asJava) + val PREDICTED_IS_BOOKMARKED = + new Continuous(name("recap.engagement_predicted.is_bookmarked"), Set(EngagementScore).asJava) + val PREDICTED_IS_SHARED = + new Continuous(name("recap.engagement_predicted.is_shared"), Set(EngagementScore).asJava) + val PREDICTED_IS_SHARE_MENU_CLICKED = + new Continuous( + name("recap.engagement_predicted.is_share_menu_clicked"), + Set(EngagementScore).asJava) + val PREDICTED_IS_PROFILE_DWELLED_20_SEC = new Continuous( + name("recap.engagement_predicted.is_profile_dwelled_20_sec"), + Set(EngagementScore).asJava) + val 
PREDICTED_IS_FULLSCREEN_VIDEO_DWELLED_5_SEC = new Continuous( + name("recap.engagement_predicted.is_fullscreen_video_dwelled_5_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_FULLSCREEN_VIDEO_DWELLED_10_SEC = new Continuous( + name("recap.engagement_predicted.is_fullscreen_video_dwelled_10_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_FULLSCREEN_VIDEO_DWELLED_20_SEC = new Continuous( + name("recap.engagement_predicted.is_fullscreen_video_dwelled_20_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_FULLSCREEN_VIDEO_DWELLED_30_SEC = new Continuous( + name("recap.engagement_predicted.is_fullscreen_video_dwelled_30_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_UNIFIED_ENGAGEMENT = new Continuous( + name("recap.engagement_predicted.is_unified_engagement"), + Set(EngagementScore).asJava) + val PREDICTED_IS_COMPOSE_TRIGGERED = new Continuous( + name("recap.engagement_predicted.is_compose_triggered"), + Set(EngagementScore).asJava) + val PREDICTED_IS_REPLIED_REPLY_IMPRESSED_BY_AUTHOR = new Continuous( + name("recap.engagement_predicted.is_replied_reply_impressed_by_author"), + Set(EngagementScore).asJava) + val PREDICTED_IS_REPLIED_REPLY_ENGAGED_BY_AUTHOR = new Continuous( + name("recap.engagement_predicted.is_replied_reply_engaged_by_author"), + Set(EngagementScore).asJava) + val PREDICTED_IS_GOOD_CLICKED_V1 = new Continuous( + name("recap.engagement_predicted.is_good_clicked_convo_desc_favorited_or_replied"), + Set(EngagementScore).asJava) + val PREDICTED_IS_GOOD_CLICKED_V2 = new Continuous( + name("recap.engagement_predicted.is_good_clicked_convo_desc_v2"), + Set(EngagementScore).asJava) + val PREDICTED_IS_TWEET_DETAIL_DWELLED_8_SEC = new Continuous( + name("recap.engagement_predicted.is_tweet_detail_dwelled_8_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_TWEET_DETAIL_DWELLED_15_SEC = new Continuous( + name("recap.engagement_predicted.is_tweet_detail_dwelled_15_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_TWEET_DETAIL_DWELLED_25_SEC = new Continuous( + name("recap.engagement_predicted.is_tweet_detail_dwelled_25_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_TWEET_DETAIL_DWELLED_30_SEC = new Continuous( + name("recap.engagement_predicted.is_tweet_detail_dwelled_30_sec"), + Set(EngagementScore).asJava) + val PREDICTED_IS_FAVORITED_FAV_ENGAGED_BY_AUTHOR = new Continuous( + name("recap.engagement_predicted.is_favorited_fav_engaged_by_author"), + Set(EngagementScore).asJava) + val PREDICTED_IS_GOOD_CLICKED_WITH_DWELL_SUM_GTE_60S = new Continuous( + name( + "recap.engagement_predicted.is_good_clicked_convo_desc_favorited_or_replied_or_dwell_sum_gte_60_secs"), + Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_IN_BOUNDS_V1 = new Continuous( + name("recap.engagement_predicted.is_dwelled_in_bounds_v1"), + Set(EngagementScore).asJava) + val PREDICTED_DWELL_NORMALIZED_OVERALL = new Continuous( + name("recap.engagement_predicted.dwell_normalized_overall"), + Set(EngagementScore).asJava) + val PREDICTED_DWELL_CDF = + new Continuous(name("recap.engagement_predicted.dwell_cdf"), Set(EngagementScore).asJava) + val PREDICTED_DWELL_CDF_OVERALL = new Continuous( + name("recap.engagement_predicted.dwell_cdf_overall"), + Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED = + new Continuous(name("recap.engagement_predicted.is_dwelled"), Set(EngagementScore).asJava) + + val PREDICTED_IS_DWELLED_1S = + new Continuous(name("recap.engagement_predicted.is_dwelled_1s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_2S = + new 
Continuous(name("recap.engagement_predicted.is_dwelled_2s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_3S = + new Continuous(name("recap.engagement_predicted.is_dwelled_3s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_4S = + new Continuous(name("recap.engagement_predicted.is_dwelled_4s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_5S = + new Continuous(name("recap.engagement_predicted.is_dwelled_5s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_6S = + new Continuous(name("recap.engagement_predicted.is_dwelled_6s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_7S = + new Continuous(name("recap.engagement_predicted.is_dwelled_7s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_8S = + new Continuous(name("recap.engagement_predicted.is_dwelled_8s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_9S = + new Continuous(name("recap.engagement_predicted.is_dwelled_9s"), Set(EngagementScore).asJava) + val PREDICTED_IS_DWELLED_10S = + new Continuous(name("recap.engagement_predicted.is_dwelled_10s"), Set(EngagementScore).asJava) + + val PREDICTED_IS_SKIPPED_1S = + new Continuous(name("recap.engagement_predicted.is_skipped_1s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_2S = + new Continuous(name("recap.engagement_predicted.is_skipped_2s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_3S = + new Continuous(name("recap.engagement_predicted.is_skipped_3s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_4S = + new Continuous(name("recap.engagement_predicted.is_skipped_4s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_5S = + new Continuous(name("recap.engagement_predicted.is_skipped_5s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_6S = + new Continuous(name("recap.engagement_predicted.is_skipped_6s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_7S = + new Continuous(name("recap.engagement_predicted.is_skipped_7s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_8S = + new Continuous(name("recap.engagement_predicted.is_skipped_8s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_9S = + new Continuous(name("recap.engagement_predicted.is_skipped_9s"), Set(EngagementScore).asJava) + val PREDICTED_IS_SKIPPED_10S = + new Continuous(name("recap.engagement_predicted.is_skipped_10s"), Set(EngagementScore).asJava) + + val PREDICTED_IS_HOME_LATEST_VISITED = new Continuous( + name("recap.engagement_predicted.is_home_latest_visited"), + Set(EngagementScore).asJava) + val PREDICTED_IS_NEGATIVE_FEEDBACK = + new Continuous( + name("recap.engagement_predicted.is_negative_feedback"), + Set(EngagementScore).asJava) + val PREDICTED_IS_NEGATIVE_FEEDBACK_V2 = + new Continuous( + name("recap.engagement_predicted.is_negative_feedback_v2"), + Set(EngagementScore).asJava) + val PREDICTED_IS_WEAK_NEGATIVE_FEEDBACK = + new Continuous( + name("recap.engagement_predicted.is_weak_negative_feedback"), + Set(EngagementScore).asJava) + val PREDICTED_IS_STRONG_NEGATIVE_FEEDBACK = + new Continuous( + name("recap.engagement_predicted.is_strong_negative_feedback"), + Set(EngagementScore).asJava) + val PREDICTED_IS_REPORT_TWEET_CLICKED = + new Continuous( + name("recap.engagement_predicted.is_report_tweet_clicked"), + Set(EngagementScore).asJava) + val PREDICTED_IS_UNFOLLOW_TOPIC = + new Continuous( + name("recap.engagement_predicted.is_unfollow_topic"), + Set(EngagementScore).asJava) + val PREDICTED_IS_RELEVANCE_PROMPT_YES_CLICKED = new Continuous( + 
name("recap.engagement_predicted.is_relevance_prompt_yes_clicked"), + Set(EngagementScore).asJava) + + // engagement for following user from any surface area + val PREDICTED_IS_FOLLOWED_FROM_ANY_SURFACE_AREA = new Continuous( + "recap.engagement_predicted.is_followed_from_any_surface_area", + Set(EngagementScore).asJava) + + + // These are global engagement counts for the Tweets. + val FAV_COUNT_V2 = new Continuous( + name("recap.earlybird.fav_count_v2"), + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava) + val RETWEET_COUNT_V2 = new Continuous( + name("recap.earlybird.retweet_count_v2"), + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava) + val REPLY_COUNT_V2 = new Continuous( + name("recap.earlybird.reply_count_v2"), + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava) + + val HAS_US_POLITICAL_ANNOTATION = new Binary( + name("recap.has_us_political_annotation"), + Set(SemanticcoreClassification).asJava + ) + + val HAS_US_POLITICAL_ALL_GROUPS_ANNOTATION = new Binary( + name("recap.has_us_political_all_groups_annotation"), + Set(SemanticcoreClassification).asJava + ) + + val HAS_US_POLITICAL_ANNOTATION_HIGH_RECALL = new Binary( + name("recap.has_us_political_annotation_high_recall"), + Set(SemanticcoreClassification).asJava + ) + + val HAS_US_POLITICAL_ANNOTATION_HIGH_RECALL_V2 = new Binary( + name("recap.has_us_political_annotation_high_recall_v2"), + Set(SemanticcoreClassification).asJava + ) + + val HAS_US_POLITICAL_ANNOTATION_HIGH_PRECISION_V0 = new Binary( + name("recap.has_us_political_annotation_high_precision_v0"), + Set(SemanticcoreClassification).asJava + ) + + val HAS_US_POLITICAL_ANNOTATION_BALANCED_PRECISION_RECALL_V0 = new Binary( + name("recap.has_us_political_annotation_balanced_precision_recall_v0"), + Set(SemanticcoreClassification).asJava + ) + + val HAS_US_POLITICAL_ANNOTATION_HIGH_RECALL_V3 = new Binary( + name("recap.has_us_political_annotation_high_recall_v3"), + Set(SemanticcoreClassification).asJava + ) + + val HAS_US_POLITICAL_ANNOTATION_HIGH_PRECISION_V3 = new Binary( + name("recap.has_us_political_annotation_high_precision_v3"), + Set(SemanticcoreClassification).asJava + ) + + val HAS_US_POLITICAL_ANNOTATION_BALANCED_V3 = new Binary( + name("recap.has_us_political_annotation_balanced_v3"), + Set(SemanticcoreClassification).asJava + ) + +} diff --git a/src/scala/com/twitter/timelines/prediction/features/recap/RecapFeaturesUtils.scala b/src/scala/com/twitter/timelines/prediction/features/recap/RecapFeaturesUtils.scala new file mode 100644 index 000000000..edf152cda --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/recap/RecapFeaturesUtils.scala @@ -0,0 +1,29 @@ +package com.twitter.timelines.prediction.features.recap + +object RecapFeaturesUtils { + // This needs to be updated if an engagement model is added or removed from prediction service. 
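+ // Example (hypothetical call site, not part of the original change): resolving the score
+ // feature id that prediction service writes back for a given label feature, e.g.
+ //   RecapFeaturesUtils.scoreFeatureIdsMap.get(RecapFeatures.IS_FAVORITED.getFeatureName)
+ //   // => Some(RecapFeatures.PREDICTED_IS_FAVORITED.getFeatureId)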
+ val scoreFeatureIdsMap: Map[String, Long] = Map( + RecapFeatures.IS_FAVORITED.getFeatureName -> RecapFeatures.PREDICTED_IS_FAVORITED.getFeatureId, + RecapFeatures.IS_REPLIED.getFeatureName -> RecapFeatures.PREDICTED_IS_REPLIED.getFeatureId, + RecapFeatures.IS_RETWEETED.getFeatureName -> RecapFeatures.PREDICTED_IS_RETWEETED.getFeatureId, + RecapFeatures.IS_GOOD_CLICKED_CONVO_DESC_V1.getFeatureName -> RecapFeatures.PREDICTED_IS_GOOD_CLICKED_V1.getFeatureId, + RecapFeatures.IS_GOOD_CLICKED_CONVO_DESC_V2.getFeatureName -> RecapFeatures.PREDICTED_IS_GOOD_CLICKED_V2.getFeatureId, +// RecapFeatures.IS_NEGATIVE_FEEDBACK_V2.getFeatureName -> RecapFeatures.PREDICTED_IS_NEGATIVE_FEEDBACK_V2.getFeatureId, + RecapFeatures.IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED.getFeatureName -> RecapFeatures.PREDICTED_IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED.getFeatureId, + RecapFeatures.IS_REPLIED_REPLY_ENGAGED_BY_AUTHOR.getFeatureName -> RecapFeatures.PREDICTED_IS_REPLIED_REPLY_ENGAGED_BY_AUTHOR.getFeatureId + ) + + // This needs to be updated if an engagement model is added or removed from prediction service. + val labelFeatureIdToScoreFeatureIdsMap: Map[Long, Long] = Map( + RecapFeatures.IS_FAVORITED.getFeatureId -> RecapFeatures.PREDICTED_IS_FAVORITED.getFeatureId, + RecapFeatures.IS_REPLIED.getFeatureId -> RecapFeatures.PREDICTED_IS_REPLIED.getFeatureId, + RecapFeatures.IS_RETWEETED.getFeatureId -> RecapFeatures.PREDICTED_IS_RETWEETED.getFeatureId, + RecapFeatures.IS_GOOD_CLICKED_CONVO_DESC_V1.getFeatureId -> RecapFeatures.PREDICTED_IS_GOOD_CLICKED_V1.getFeatureId, + RecapFeatures.IS_GOOD_CLICKED_CONVO_DESC_V2.getFeatureId -> RecapFeatures.PREDICTED_IS_GOOD_CLICKED_V2.getFeatureId, + // RecapFeatures.IS_NEGATIVE_FEEDBACK_V2.getFeatureId -> RecapFeatures.PREDICTED_IS_NEGATIVE_FEEDBACK_V2.getFeatureId, + RecapFeatures.IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED.getFeatureId -> RecapFeatures.PREDICTED_IS_PROFILE_CLICKED_AND_PROFILE_ENGAGED.getFeatureId, + RecapFeatures.IS_REPLIED_REPLY_ENGAGED_BY_AUTHOR.getFeatureId -> RecapFeatures.PREDICTED_IS_REPLIED_REPLY_ENGAGED_BY_AUTHOR.getFeatureId + ) + + val labelFeatureNames: Seq[String] = scoreFeatureIdsMap.keys.toSeq +} diff --git a/src/scala/com/twitter/timelines/prediction/features/request_context/BUILD b/src/scala/com/twitter/timelines/prediction/features/request_context/BUILD new file mode 100644 index 000000000..6fc497bf3 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/request_context/BUILD @@ -0,0 +1,9 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/request_context/RequestContextFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/request_context/RequestContextFeatures.scala new file mode 100644 index 000000000..a7dd28852 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/request_context/RequestContextFeatures.scala @@ -0,0 +1,57 @@ +package com.twitter.timelines.prediction.features.request_context + +import com.twitter.ml.api.FeatureContext +import com.twitter.ml.api.Feature._ +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import scala.collection.JavaConverters._ + +object RequestContextFeatures { + val COUNTRY_CODE = + new Text("request_context.country_code", Set(PrivateCountryOrRegion, InferredCountry).asJava) + val LANGUAGE_CODE = new
Text( + "request_context.language_code", + Set(GeneralSettings, ProvidedLanguage, InferredLanguage).asJava) + val REQUEST_PROVENANCE = new Text("request_context.request_provenance", Set(AppUsage).asJava) + val DISPLAY_WIDTH = new Continuous("request_context.display_width", Set(OtherDeviceInfo).asJava) + val DISPLAY_HEIGHT = new Continuous("request_context.display_height", Set(OtherDeviceInfo).asJava) + val DISPLAY_DPI = new Continuous("request_context.display_dpi", Set(OtherDeviceInfo).asJava) + + // the following features are not Continuous Features because, for example, the continuity + // between hour 23 and hour 0 cannot be handled that way. Instead, we treat each slice of + // hours/days independently, like a set of sparse binary features. + val TIMESTAMP_GMT_HOUR = + new Discrete("request_context.timestamp_gmt_hour", Set(PrivateTimestamp).asJava) + val TIMESTAMP_GMT_DOW = + new Discrete("request_context.timestamp_gmt_dow", Set(PrivateTimestamp).asJava) + + val IS_GET_INITIAL = new Binary("request_context.is_get_initial") + val IS_GET_MIDDLE = new Binary("request_context.is_get_middle") + val IS_GET_NEWER = new Binary("request_context.is_get_newer") + val IS_GET_OLDER = new Binary("request_context.is_get_older") + + // the following features are not Binary Features because the source field is Option[Boolean], + // and we want to distinguish Some(false) from None. None will be converted to -1. + val IS_POLLING = new Discrete("request_context.is_polling") + val IS_SESSION_START = new Discrete("request_context.is_session_start") + + // Helps distinguish requests from "home" vs "home_latest" (reverse chron home view). + val TIMELINE_KIND = new Text("request_context.timeline_kind") + + val featureContext = new FeatureContext( + COUNTRY_CODE, + LANGUAGE_CODE, + REQUEST_PROVENANCE, + DISPLAY_WIDTH, + DISPLAY_HEIGHT, + DISPLAY_DPI, + TIMESTAMP_GMT_HOUR, + TIMESTAMP_GMT_DOW, + IS_GET_INITIAL, + IS_GET_MIDDLE, + IS_GET_NEWER, + IS_GET_OLDER, + IS_POLLING, + IS_SESSION_START, + TIMELINE_KIND + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/simcluster/BUILD b/src/scala/com/twitter/timelines/prediction/features/simcluster/BUILD new file mode 100644 index 000000000..ec194353b --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/simcluster/BUILD @@ -0,0 +1,13 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "src/thrift/com/twitter/timelines/suggests/common:record-scala", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/aggregation_framework/conversion:for-timelines", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclusterFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclusterFeatures.scala new file mode 100644 index 000000000..4d2b4db81 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclusterFeatures.scala @@ -0,0 +1,61 @@ +package com.twitter.timelines.prediction.features.simcluster + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.ml.api.Feature._ +import com.twitter.simclusters_v2.thriftscala.ClustersUserIsInterestedIn +import
com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import scala.collection.JavaConverters._ + +class SimclusterFeaturesHelper(statsReceiver: StatsReceiver) { + import SimclusterFeatures._ + + private[this] val scopedStatsReceiver = statsReceiver.scope(getClass.getSimpleName) + private[this] val invalidSimclusterModelVersion = scopedStatsReceiver + .counter("invalidSimclusterModelVersion") + + def fromUserClusterInterestsPair( + userInterestClustersPair: (Long, ClustersUserIsInterestedIn) + ): Option[SimclusterFeatures] = { + val (userId, userInterestClusters) = userInterestClustersPair + if (userInterestClusters.knownForModelVersion == SIMCLUSTER_MODEL_VERSION) { + val userInterestClustersFavScores = for { + (clusterId, scores) <- userInterestClusters.clusterIdToScores + favScore <- scores.favScore + } yield (clusterId.toString, favScore) + Some( + SimclusterFeatures( + userId, + userInterestClusters.knownForModelVersion, + userInterestClustersFavScores.toMap + ) + ) + } else { + // We maintain this counter to make sure that the hardcoded modelVersion we are using is correct. + invalidSimclusterModelVersion.incr + None + } + } +} + +object SimclusterFeatures { + // Check http://go/simclustersv2runbook for production versions + // Our models are trained for this specific model version only. + val SIMCLUSTER_MODEL_VERSION = "20M_145K_dec11" + val prefix = s"simcluster.v2.$SIMCLUSTER_MODEL_VERSION" + + val SIMCLUSTER_USER_INTEREST_CLUSTER_SCORES = new SparseContinuous( + s"$prefix.user_interest_cluster_scores", + Set(EngagementScore, InferredInterests).asJava + ) + val SIMCLUSTER_USER_INTEREST_CLUSTER_IDS = new SparseBinary( + s"$prefix.user_interest_cluster_ids", + Set(InferredInterests).asJava + ) + val SIMCLUSTER_MODEL_VERSION_METADATA = new Text("meta.simcluster_version") +} + +case class SimclusterFeatures( + userId: Long, + modelVersion: String, + interestClusterScoresMap: Map[String, Double]) diff --git a/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclusterTweetFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclusterTweetFeatures.scala new file mode 100644 index 000000000..355a89c22 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclusterTweetFeatures.scala @@ -0,0 +1,150 @@ +package com.twitter.timelines.prediction.features.simcluster + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.ml.api.{Feature, FeatureContext} +import com.twitter.ml.api.Feature.{Continuous, SparseBinary, SparseContinuous} +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion._ +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import com.twitter.timelines.suggests.common.record.thriftscala.SuggestionRecord +import scala.collection.JavaConverters._ + +class SimclusterTweetFeatures(statsReceiver: StatsReceiver) extends CombineCountsBase { + import SimclusterTweetFeatures._ + + private[this] val scopedStatsReceiver = statsReceiver.scope(getClass.getSimpleName) + private[this] val invalidSimclusterModelVersion = scopedStatsReceiver + .counter("invalidSimclusterModelVersion") + private[this] val getFeaturesFromOverlappingSimclusterIdsCount = scopedStatsReceiver + .counter("getFeaturesFromOverlappingSimclusterIdsCount") + private[this] val emptySimclusterMaps = scopedStatsReceiver + .counter("emptySimclusterMaps") + 
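+ // Note (hypothetical monitoring sketch): these counters only instrument the lookup
+ // paths below; e.g. emptySimclusterMaps / getFeaturesFromOverlappingSimclusterIdsCount
+ // roughly approximates the fraction of calls where the user or the tweet had no
+ // SimClusters embedding.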
private[this] val nonOverlappingSimclusterMaps = scopedStatsReceiver + .counter("nonOverlappingSimclusterMaps") + + // Parameters required by CombineCountsBase + override val topK: Int = 5 + override val hardLimit: Option[Int] = None + override val precomputedCountFeatures: Seq[Feature[_]] = Seq( + SIMCLUSTER_TWEET_TOPK_SORT_BY_TWEET_SCORE, + SIMCLUSTER_TWEET_TOPK_SORT_BY_COMBINED_SCORE + ) + + private def getFeaturesFromOverlappingSimclusterIds( + userSimclustersInterestedInMap: Map[String, Double], + tweetSimclustersTopKMap: Map[String, Double] + ): Map[Feature[_], List[Double]] = { + getFeaturesFromOverlappingSimclusterIdsCount.incr + if (userSimclustersInterestedInMap.isEmpty || tweetSimclustersTopKMap.isEmpty) { + emptySimclusterMaps.incr + Map.empty + } else { + val overlappingSimclusterIds = + userSimclustersInterestedInMap.keySet intersect tweetSimclustersTopKMap.keySet + if (overlappingSimclusterIds.isEmpty) { + nonOverlappingSimclusterMaps.incr + Map.empty + } else { + val (combinedScores, tweetScores) = overlappingSimclusterIds.map { id => + val tweetScore = tweetSimclustersTopKMap.getOrElse(id, 0.0) + val combinedScore = userSimclustersInterestedInMap.getOrElse(id, 0.0) * tweetScore + (combinedScore, tweetScore) + }.unzip + Map( + SIMCLUSTER_TWEET_TOPK_SORT_BY_COMBINED_SCORE -> combinedScores.toList, + SIMCLUSTER_TWEET_TOPK_SORT_BY_TWEET_SCORE -> tweetScores.toList + ) + } + } + } + + def getCountFeaturesValuesMap( + suggestionRecord: SuggestionRecord, + simclustersTweetTopKMap: Map[String, Double] + ): Map[Feature[_], List[Double]] = { + val userSimclustersInterestedInMap = formatUserSimclustersInterestedIn(suggestionRecord) + + val tweetSimclustersTopKMap = formatTweetSimclustersTopK(simclustersTweetTopKMap) + + getFeaturesFromOverlappingSimclusterIds(userSimclustersInterestedInMap, tweetSimclustersTopKMap) + } + + def filterByModelVersion( + simclustersMapOpt: Option[Map[String, Double]] + ): Option[Map[String, Double]] = { + simclustersMapOpt.flatMap { simclustersMap => + val filteredSimclustersMap = simclustersMap.filter { + case (clusterId, score) => + // The clusterId format is ModelVersion.IntegerClusterId.ScoreType as specified at + // com.twitter.ml.featurestore.catalog.features.recommendations.SimClustersV2TweetTopClusters + clusterId.contains(SimclusterFeatures.SIMCLUSTER_MODEL_VERSION) + } + + // The assumption is that the simclustersMap will contain clusterIds with the same modelVersion. + // We maintain this counter to make sure that the hardcoded modelVersion we are using is correct. 
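+ // For example (hypothetical ids): a clusterId of "20M_145K_dec11.347.fav" survives the
+ // filter above, while "20M_145K_updated.347.fav" is dropped and counted below as an
+ // invalid model version.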
+ if (simclustersMap.size > filteredSimclustersMap.size) { + invalidSimclusterModelVersion.incr + } + + if (filteredSimclustersMap.nonEmpty) Some(filteredSimclustersMap) else None + } + } + + val allFeatures: Seq[Feature[_]] = outputFeaturesPostMerge.toSeq ++ Seq( + SIMCLUSTER_TWEET_TOPK_CLUSTER_IDS, + SIMCLUSTER_TWEET_TOPK_CLUSTER_SCORES) + val featureContext = new FeatureContext(allFeatures: _*) +} + +object SimclusterTweetFeatures { + val SIMCLUSTER_TWEET_TOPK_CLUSTER_IDS = new SparseBinary( + s"${SimclusterFeatures.prefix}.tweet_topk_cluster_ids", + Set(InferredInterests).asJava + ) + val SIMCLUSTER_TWEET_TOPK_CLUSTER_SCORES = new SparseContinuous( + s"${SimclusterFeatures.prefix}.tweet_topk_cluster_scores", + Set(EngagementScore, InferredInterests).asJava + ) + + val SIMCLUSTER_TWEET_TOPK_CLUSTER_ID = + TypedAggregateGroup.sparseFeature(SIMCLUSTER_TWEET_TOPK_CLUSTER_IDS) + + val SIMCLUSTER_TWEET_TOPK_SORT_BY_TWEET_SCORE = new Continuous( + s"${SimclusterFeatures.prefix}.tweet_topk_sort_by_tweet_score", + Set(EngagementScore, InferredInterests).asJava + ) + + val SIMCLUSTER_TWEET_TOPK_SORT_BY_COMBINED_SCORE = new Continuous( + s"${SimclusterFeatures.prefix}.tweet_topk_sort_by_combined_score", + Set(EngagementScore, InferredInterests).asJava + ) + + def formatUserSimclustersInterestedIn(suggestionRecord: SuggestionRecord): Map[String, Double] = { + suggestionRecord.userSimclustersInterestedIn + .map { clustersUserIsInterestedIn => + if (clustersUserIsInterestedIn.knownForModelVersion == SimclusterFeatures.SIMCLUSTER_MODEL_VERSION) { + clustersUserIsInterestedIn.clusterIdToScores.collect { + case (clusterId, scores) if scores.favScore.isDefined => + (clusterId.toString, scores.favScore.get) + } + } else Map.empty[String, Double] + }.getOrElse(Map.empty[String, Double]) + .toMap + } + + def formatTweetSimclustersTopK( + simclustersTweetTopKMap: Map[String, Double] + ): Map[String, Double] = { + simclustersTweetTopKMap.collect { + case (clusterId, score) => + // The clusterId format is as specified at + // com.twitter.ml.featurestore.catalog.features.recommendations.SimClustersV2TweetTopClusters + // and we want to extract the IntegerClusterId. + // The split function takes a regex; therefore, we need to escape . and we also need to escape + // \ since they are both special characters. Hence, the double \\. + val clusterIdSplit = clusterId.split("\\.") + val integerClusterId = clusterIdSplit(1) // The IntegerClusterId is at position 1. 
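+ // e.g. (hypothetical id) "20M_145K_dec11.347.fav".split("\\.") yields
+ // Array("20M_145K_dec11", "347", "fav"), so integerClusterId == "347".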
+ (integerClusterId, score) + } + } +} diff --git a/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclustersScoresFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclustersScoresFeatures.scala new file mode 100644 index 000000000..0629636c0 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/simcluster/SimclustersScoresFeatures.scala @@ -0,0 +1,43 @@ +package com.twitter.timelines.prediction.features.simcluster + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType.SemanticcoreClassification +import com.twitter.ml.api.Feature +import com.twitter.ml.api.Feature.Continuous +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion.CombineCountsBase +import scala.collection.JavaConverters._ + +object SimclustersScoresFeatures extends CombineCountsBase { + override def topK: Int = 2 + + override def hardLimit: Option[Int] = Some(20) + + val prefix = s"recommendations.sim_clusters_scores" + val TOPIC_CONSUMER_TWEET_EMBEDDING_Cs = new Continuous( + s"$prefix.localized_topic_consumer_tweet_embedding_cosine_similarity", + Set(SemanticcoreClassification).asJava) + val TOPIC_PRODUCER_TWEET_EMBEDDING_Cs = new Continuous( + s"$prefix.topic_producer_tweet_embedding_cosine_similarity", + Set(SemanticcoreClassification).asJava) + val USER_TOPIC_CONSUMER_TWEET_EMBEDDING_COSINE_SIM = new Continuous( + s"$prefix.user_interested_in_localized_topic_consumer_embedding_cosine_similarity", + Set(SemanticcoreClassification).asJava) + val USER_TOPIC_CONSUMER_TWEET_EMBEDDING_DOT_PRODUCT = new Continuous( + s"$prefix.user_interested_in_localized_topic_consumer_embedding_dot_product", + Set(SemanticcoreClassification).asJava) + val USER_TOPIC_PRODUCER_TWEET_EMBEDDING_COSINE_SIM = new Continuous( + s"$prefix.user_interested_in_localized_topic_producer_embedding_cosine_similarity", + Set(SemanticcoreClassification).asJava) + val USER_TOPIC_PRODUCER_TWEET_EMBEDDING_DOT_PRODUCT = new Continuous( + s"$prefix.user_interested_in_localized_topic_producer_embedding_dot_product", + Set(SemanticcoreClassification).asJava) + + override def precomputedCountFeatures: Seq[Feature[_]] = + Seq( + TOPIC_CONSUMER_TWEET_EMBEDDING_Cs, + TOPIC_PRODUCER_TWEET_EMBEDDING_Cs, + USER_TOPIC_CONSUMER_TWEET_EMBEDDING_COSINE_SIM, + USER_TOPIC_CONSUMER_TWEET_EMBEDDING_DOT_PRODUCT, + USER_TOPIC_PRODUCER_TWEET_EMBEDDING_COSINE_SIM, + USER_TOPIC_PRODUCER_TWEET_EMBEDDING_DOT_PRODUCT + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/socialproof/BUILD b/src/scala/com/twitter/timelines/prediction/features/socialproof/BUILD new file mode 100644 index 000000000..0c00b1e5b --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/socialproof/BUILD @@ -0,0 +1,15 @@ +scala_library( + name = "socialproof_features", + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/ibm/icu:icu4j", + "src/java/com/twitter/ml/api:api-base", + "src/scala/com/twitter/ml/api/util", + "src/scala/com/twitter/timelines/util", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/timelines/socialproof:socialproof-scala", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/socialproof/SocialProofFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/socialproof/SocialProofFeatures.scala new file mode 100644 index 000000000..163ba7efa --- /dev/null +++ 
b/src/scala/com/twitter/timelines/prediction/features/socialproof/SocialProofFeatures.scala @@ -0,0 +1,172 @@ +package com.twitter.timelines.prediction.features.socialproof + +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.Feature.Binary +import com.twitter.ml.api.Feature.Continuous +import com.twitter.ml.api.Feature.SparseBinary +import com.twitter.ml.api.util.FDsl._ +import com.twitter.timelines.prediction.features.socialproof.SocialProofDataRecordFeatures._ +import com.twitter.timelines.socialproof.thriftscala.SocialProof +import com.twitter.timelines.socialproof.v1.thriftscala.SocialProofType +import com.twitter.timelines.util.CommonTypes.UserId +import scala.collection.JavaConverters._ +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ + +abstract class SocialProofUserGroundTruth(userIds: Seq[UserId], count: Int) { + require( + count >= userIds.size, + "count must be equal to or greater than the number of entries in userIds" + ) + // Using Double as the return type to make it more convenient for these values to be used as + // ML feature values. + val displayedUserCount: Double = userIds.size.toDouble + val undisplayedUserCount: Double = count - userIds.size.toDouble + val totalCount: Double = count.toDouble + + def featureDisplayedUsers: SparseBinary + def featureDisplayedUserCount: Continuous + def featureUndisplayedUserCount: Continuous + def featureTotalUserCount: Continuous + + def setFeatures(rec: DataRecord): Unit = { + rec.setFeatureValue(featureDisplayedUsers, toStringSet(userIds)) + rec.setFeatureValue(featureDisplayedUserCount, displayedUserCount) + rec.setFeatureValue(featureUndisplayedUserCount, undisplayedUserCount) + rec.setFeatureValue(featureTotalUserCount, totalCount) + } + protected def toStringSet(value: Seq[Long]): Set[String] = { + value.map(_.toString).toSet + } +} + +case class FavoritedBySocialProofUserGroundTruth(userIds: Seq[UserId] = Seq.empty, count: Int = 0) + extends SocialProofUserGroundTruth(userIds, count) { + + override val featureDisplayedUsers = SocialProofDisplayedFavoritedByUsers + override val featureDisplayedUserCount = SocialProofDisplayedFavoritedByUserCount + override val featureUndisplayedUserCount = SocialProofUndisplayedFavoritedByUserCount + override val featureTotalUserCount = SocialProofTotalFavoritedByUserCount +} + +case class RetweetedBySocialProofUserGroundTruth(userIds: Seq[UserId] = Seq.empty, count: Int = 0) + extends SocialProofUserGroundTruth(userIds, count) { + + override val featureDisplayedUsers = SocialProofDisplayedRetweetedByUsers + override val featureDisplayedUserCount = SocialProofDisplayedRetweetedByUserCount + override val featureUndisplayedUserCount = SocialProofUndisplayedRetweetedByUserCount + override val featureTotalUserCount = SocialProofTotalRetweetedByUserCount +} + +case class RepliedBySocialProofUserGroundTruth(userIds: Seq[UserId] = Seq.empty, count: Int = 0) + extends SocialProofUserGroundTruth(userIds, count) { + + override val featureDisplayedUsers = SocialProofDisplayedRepliedByUsers + override val featureDisplayedUserCount = SocialProofDisplayedRepliedByUserCount + override val featureUndisplayedUserCount = SocialProofUndisplayedRepliedByUserCount + override val featureTotalUserCount = SocialProofTotalRepliedByUserCount +} + +case class SocialProofFeatures( + hasSocialProof: Boolean, + favoritedBy: FavoritedBySocialProofUserGroundTruth = FavoritedBySocialProofUserGroundTruth(), + retweetedBy: RetweetedBySocialProofUserGroundTruth = 
RetweetedBySocialProofUserGroundTruth(), + repliedBy: RepliedBySocialProofUserGroundTruth = RepliedBySocialProofUserGroundTruth()) { + + def setFeatures(dataRecord: DataRecord): Unit = + if (hasSocialProof) { + dataRecord.setFeatureValue(HasSocialProof, hasSocialProof) + favoritedBy.setFeatures(dataRecord) + retweetedBy.setFeatures(dataRecord) + repliedBy.setFeatures(dataRecord) + } +} + +object SocialProofFeatures { + def apply(socialProofs: Seq[SocialProof]): SocialProofFeatures = + socialProofs.foldLeft(SocialProofFeatures(hasSocialProof = socialProofs.nonEmpty))( + (prevFeatures, socialProof) => { + val userIds = socialProof.v1.userIds + val count = socialProof.v1.count + socialProof.v1.socialProofType match { + case SocialProofType.FavoritedBy => + prevFeatures.copy(favoritedBy = FavoritedBySocialProofUserGroundTruth(userIds, count)) + case SocialProofType.RetweetedBy => + prevFeatures.copy(retweetedBy = RetweetedBySocialProofUserGroundTruth(userIds, count)) + case SocialProofType.RepliedBy => + prevFeatures.copy(repliedBy = RepliedBySocialProofUserGroundTruth(userIds, count)) + case _ => + prevFeatures // skip silently instead of breaking jobs, since this isn't used yet + } + }) +} + +object SocialProofDataRecordFeatures { + val HasSocialProof = new Binary("recap.social_proof.has_social_proof") + + val SocialProofDisplayedFavoritedByUsers = new SparseBinary( + "recap.social_proof.list.displayed.favorited_by", + Set(UserId, PublicLikes, PrivateLikes).asJava + ) + val SocialProofDisplayedFavoritedByUserCount = new Continuous( + "recap.social_proof.count.displayed.favorited_by", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val SocialProofUndisplayedFavoritedByUserCount = new Continuous( + "recap.social_proof.count.undisplayed.favorited_by", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val SocialProofTotalFavoritedByUserCount = new Continuous( + "recap.social_proof.count.total.favorited_by", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + + val SocialProofDisplayedRetweetedByUsers = new SparseBinary( + "recap.social_proof.list.displayed.retweeted_by", + Set(UserId, PublicRetweets, PrivateRetweets).asJava + ) + val SocialProofDisplayedRetweetedByUserCount = new Continuous( + "recap.social_proof.count.displayed.retweeted_by", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val SocialProofUndisplayedRetweetedByUserCount = new Continuous( + "recap.social_proof.count.undisplayed.retweeted_by", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val SocialProofTotalRetweetedByUserCount = new Continuous( + "recap.social_proof.count.total.retweeted_by", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + + val SocialProofDisplayedRepliedByUsers = new SparseBinary( + "recap.social_proof.list.displayed.replied_by", + Set(UserId, PublicReplies, PrivateReplies).asJava + ) + val SocialProofDisplayedRepliedByUserCount = new Continuous( + "recap.social_proof.count.displayed.replied_by", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val SocialProofUndisplayedRepliedByUserCount = new Continuous( + "recap.social_proof.count.undisplayed.replied_by", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val SocialProofTotalRepliedByUserCount = new Continuous( + "recap.social_proof.count.total.replied_by", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + + val AllFeatures = Seq( + HasSocialProof, + SocialProofDisplayedFavoritedByUsers, + SocialProofDisplayedFavoritedByUserCount, 
+ SocialProofUndisplayedFavoritedByUserCount, + SocialProofTotalFavoritedByUserCount, + SocialProofDisplayedRetweetedByUsers, + SocialProofDisplayedRetweetedByUserCount, + SocialProofUndisplayedRetweetedByUserCount, + SocialProofTotalRetweetedByUserCount, + SocialProofDisplayedRepliedByUsers, + SocialProofDisplayedRepliedByUserCount, + SocialProofUndisplayedRepliedByUserCount, + SocialProofTotalRepliedByUserCount + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/time_features/BUILD b/src/scala/com/twitter/timelines/prediction/features/time_features/BUILD new file mode 100644 index 000000000..b5c49af36 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/time_features/BUILD @@ -0,0 +1,10 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/timelines/time_features:time_features-scala", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/time_features/TimeDataRecordFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/time_features/TimeDataRecordFeatures.scala new file mode 100644 index 000000000..b398203c3 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/time_features/TimeDataRecordFeatures.scala @@ -0,0 +1,111 @@ +package com.twitter.timelines.prediction.features.time_features + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import com.twitter.ml.api.Feature._ +import scala.collection.JavaConverters._ +import com.twitter.util.Duration +import com.twitter.conversions.DurationOps._ + +object TimeDataRecordFeatures { + val TIME_BETWEEN_NON_POLLING_REQUESTS_AVG = new Continuous( + "time_features.time_between_non_polling_requests_avg", + Set(PrivateTimestamp).asJava + ) + val TIME_SINCE_TWEET_CREATION = new Continuous("time_features.time_since_tweet_creation") + val TIME_SINCE_SOURCE_TWEET_CREATION = new Continuous( + "time_features.time_since_source_tweet_creation" + ) + val TIME_SINCE_LAST_NON_POLLING_REQUEST = new Continuous( + "time_features.time_since_last_non_polling_request", + Set(PrivateTimestamp).asJava + ) + val NON_POLLING_REQUESTS_SINCE_TWEET_CREATION = new Continuous( + "time_features.non_polling_requests_since_tweet_creation", + Set(PrivateTimestamp).asJava + ) + val TWEET_AGE_RATIO = new Continuous("time_features.tweet_age_ratio") + val IS_TWEET_RECYCLED = new Binary("time_features.is_tweet_recycled") + // Last Engagement features + val LAST_FAVORITE_SINCE_CREATION_HRS = new Continuous( + "time_features.earlybird.last_favorite_since_creation_hrs", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val LAST_RETWEET_SINCE_CREATION_HRS = new Continuous( + "time_features.earlybird.last_retweet_since_creation_hrs", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val LAST_REPLY_SINCE_CREATION_HRS = new Continuous( + "time_features.earlybird.last_reply_since_creation_hrs", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val LAST_QUOTE_SINCE_CREATION_HRS = new Continuous( + "time_features.earlybird.last_quote_since_creation_hrs", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val TIME_SINCE_LAST_FAVORITE_HRS = new Continuous( + "time_features.earlybird.time_since_last_favorite", + Set(CountOfPrivateLikes, CountOfPublicLikes).asJava + ) + val TIME_SINCE_LAST_RETWEET_HRS = new Continuous( + 
"time_features.earlybird.time_since_last_retweet", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + val TIME_SINCE_LAST_REPLY_HRS = new Continuous( + "time_features.earlybird.time_since_last_reply", + Set(CountOfPrivateReplies, CountOfPublicReplies).asJava + ) + val TIME_SINCE_LAST_QUOTE_HRS = new Continuous( + "time_features.earlybird.time_since_last_quote", + Set(CountOfPrivateRetweets, CountOfPublicRetweets).asJava + ) + + val TIME_SINCE_VIEWER_ACCOUNT_CREATION_SECS = + new Continuous( + "time_features.time_since_viewer_account_creation_secs", + Set(AccountCreationTime, AgeOfAccount).asJava) + + val USER_ID_IS_SNOWFLAKE_ID = + new Binary("time_features.time_user_id_is_snowflake_id", Set(UserType).asJava) + + val IS_30_DAY_NEW_USER = + new Binary("time_features.is_day_30_new_user", Set(AccountCreationTime, AgeOfAccount).asJava) + val IS_12_MONTH_NEW_USER = + new Binary("time_features.is_month_12_new_user", Set(AccountCreationTime, AgeOfAccount).asJava) + val ACCOUNT_AGE_INTERVAL = + new Discrete("time_features.account_age_interval", Set(AgeOfAccount).asJava) +} + +object AccountAgeInterval extends Enumeration { + val LTE_1_DAY, GT_1_DAY_LTE_5_DAY, GT_5_DAY_LTE_14_DAY, GT_14_DAY_LTE_30_DAY = Value + + def fromDuration(accountAge: Duration): Option[AccountAgeInterval.Value] = { + accountAge match { + case a if (a <= 1.day) => Some(LTE_1_DAY) + case a if (1.day < a && a <= 5.days) => Some(GT_1_DAY_LTE_5_DAY) + case a if (5.days < a && a <= 14.days) => Some(GT_5_DAY_LTE_14_DAY) + case a if (14.days < a && a <= 30.days) => Some(GT_14_DAY_LTE_30_DAY) + case _ => None + } + } +} + +case class TimeFeatures( + isTweetRecycled: Boolean, + timeSinceTweetCreation: Double, + isDay30NewUser: Boolean, + isMonth12NewUser: Boolean, + timeSinceSourceTweetCreation: Double, // same as timeSinceTweetCreation for non-retweets + timeSinceViewerAccountCreationSecs: Option[Double], + timeBetweenNonPollingRequestsAvg: Option[Double] = None, + timeSinceLastNonPollingRequest: Option[Double] = None, + nonPollingRequestsSinceTweetCreation: Option[Double] = None, + tweetAgeRatio: Option[Double] = None, + lastFavSinceCreationHrs: Option[Double] = None, + lastRetweetSinceCreationHrs: Option[Double] = None, + lastReplySinceCreationHrs: Option[Double] = None, + lastQuoteSinceCreationHrs: Option[Double] = None, + timeSinceLastFavoriteHrs: Option[Double] = None, + timeSinceLastRetweetHrs: Option[Double] = None, + timeSinceLastReplyHrs: Option[Double] = None, + timeSinceLastQuoteHrs: Option[Double] = None, + accountAgeInterval: Option[AccountAgeInterval.Value] = None) diff --git a/src/scala/com/twitter/timelines/prediction/features/two_hop_features/BUILD b/src/scala/com/twitter/timelines/prediction/features/two_hop_features/BUILD new file mode 100644 index 000000000..a4ad0eabf --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/two_hop_features/BUILD @@ -0,0 +1,10 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "graph-feature-service/src/main/thrift/com/twitter/graph_feature_service:graph_feature_service_thrift-scala", + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/two_hop_features/TwoHopFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/two_hop_features/TwoHopFeatures.scala new file mode 100644 index 000000000..03a112578 --- /dev/null +++ 
b/src/scala/com/twitter/timelines/prediction/features/two_hop_features/TwoHopFeatures.scala @@ -0,0 +1,93 @@ +package com.twitter.timelines.prediction.features.two_hop_features + +import com.twitter.graph_feature_service.thriftscala.EdgeType +import com.twitter.ml.api.Feature._ +import scala.collection.JavaConverters._ +import TwoHopFeaturesConfig.personalDataTypesMap + +object TwoHopFeaturesDescriptor { + val prefix = "two_hop" + val normalizedPostfix = "normalized" + val leftNodeDegreePostfix = "left_degree" + val rightNodeDegreePostfix = "right_degree" + + type TwoHopFeatureMap = Map[(EdgeType, EdgeType), Continuous] + type TwoHopFeatureNodeDegreeMap = Map[EdgeType, Continuous] + + def apply(edgeTypePairs: Seq[(EdgeType, EdgeType)]): TwoHopFeaturesDescriptor = { + new TwoHopFeaturesDescriptor(edgeTypePairs) + } +} + +class TwoHopFeaturesDescriptor(edgeTypePairs: Seq[(EdgeType, EdgeType)]) { + import TwoHopFeaturesDescriptor._ + + def getLeftEdge(edgeTypePair: (EdgeType, EdgeType)): EdgeType = { + edgeTypePair._1 + } + + def getLeftEdgeName(edgeTypePair: (EdgeType, EdgeType)): String = { + getLeftEdge(edgeTypePair).originalName.toLowerCase + } + + def getRightEdge(edgeTypePair: (EdgeType, EdgeType)): EdgeType = { + edgeTypePair._2 + } + + def getRightEdgeName(edgeTypePair: (EdgeType, EdgeType)): String = { + getRightEdge(edgeTypePair).originalName.toLowerCase + } + + val rawFeaturesMap: TwoHopFeatureMap = edgeTypePairs.map(edgeTypePair => { + val leftEdgeType = getLeftEdge(edgeTypePair) + val leftEdgeName = getLeftEdgeName(edgeTypePair) + val rightEdgeType = getRightEdge(edgeTypePair) + val rightEdgeName = getRightEdgeName(edgeTypePair) + val personalDataTypes = ( + personalDataTypesMap.getOrElse(leftEdgeType, Set.empty) ++ + personalDataTypesMap.getOrElse(rightEdgeType, Set.empty) + ).asJava + val rawFeature = new Continuous(s"$prefix.$leftEdgeName.$rightEdgeName", personalDataTypes) + edgeTypePair -> rawFeature + })(collection.breakOut) + + val leftNodeDegreeFeaturesMap: TwoHopFeatureNodeDegreeMap = edgeTypePairs.map(edgeTypePair => { + val leftEdgeType = getLeftEdge(edgeTypePair) + val leftEdgeName = getLeftEdgeName(edgeTypePair) + val personalDataTypes = personalDataTypesMap.getOrElse(leftEdgeType, Set.empty).asJava + val leftNodeDegreeFeature = + new Continuous(s"$prefix.$leftEdgeName.$leftNodeDegreePostfix", personalDataTypes) + leftEdgeType -> leftNodeDegreeFeature + })(collection.breakOut) + + val rightNodeDegreeFeaturesMap: TwoHopFeatureNodeDegreeMap = edgeTypePairs.map(edgeTypePair => { + val rightEdgeType = getRightEdge(edgeTypePair) + val rightEdgeName = getRightEdgeName(edgeTypePair) + val personalDataTypes = personalDataTypesMap.getOrElse(rightEdgeType, Set.empty).asJava + val rightNodeDegreeFeature = + new Continuous(s"$prefix.$rightEdgeName.$rightNodeDegreePostfix", personalDataTypes) + rightEdgeType -> rightNodeDegreeFeature + })(collection.breakOut) + + val normalizedFeaturesMap: TwoHopFeatureMap = edgeTypePairs.map(edgeTypePair => { + val leftEdgeType = getLeftEdge(edgeTypePair) + val leftEdgeName = getLeftEdgeName(edgeTypePair) + val rightEdgeType = getRightEdge(edgeTypePair) + val rightEdgeName = getRightEdgeName(edgeTypePair) + val personalDataTypes = ( + personalDataTypesMap.getOrElse(leftEdgeType, Set.empty) ++ + personalDataTypesMap.getOrElse(rightEdgeType, Set.empty) + ).asJava + val normalizedFeature = + new Continuous(s"$prefix.$leftEdgeName.$rightEdgeName.$normalizedPostfix", personalDataTypes) + edgeTypePair -> normalizedFeature + })(collection.breakOut) + + 
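+ // Example (hypothetical usage): building a descriptor from the companion config and
+ // enumerating the generated feature names, which follow the "two_hop.<left>.<right>"
+ // pattern (plus the normalized and degree variants defined above):
+ //   val descriptor = TwoHopFeaturesDescriptor(TwoHopFeaturesConfig.edgeTypePairs)
+ //   descriptor.featuresSeq.map(_.getFeatureName)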
private val rawFeaturesSeq: Seq[Continuous] = rawFeaturesMap.values.toSeq + private val leftNodeDegreeFeaturesSeq: Seq[Continuous] = leftNodeDegreeFeaturesMap.values.toSeq + private val rightNodeDegreeFeaturesSeq: Seq[Continuous] = rightNodeDegreeFeaturesMap.values.toSeq + private val normalizedFeaturesSeq: Seq[Continuous] = normalizedFeaturesMap.values.toSeq + + val featuresSeq: Seq[Continuous] = + rawFeaturesSeq ++ leftNodeDegreeFeaturesSeq ++ rightNodeDegreeFeaturesSeq ++ normalizedFeaturesSeq +} diff --git a/src/scala/com/twitter/timelines/prediction/features/two_hop_features/TwoHopFeaturesConfig.scala b/src/scala/com/twitter/timelines/prediction/features/two_hop_features/TwoHopFeaturesConfig.scala new file mode 100644 index 000000000..ece502e30 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/two_hop_features/TwoHopFeaturesConfig.scala @@ -0,0 +1,30 @@ +package com.twitter.timelines.prediction.features.two_hop_features + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType +import com.twitter.graph_feature_service.thriftscala.{EdgeType, FeatureType} + +object TwoHopFeaturesConfig { + val leftEdgeTypes = Seq(EdgeType.Following, EdgeType.Favorite, EdgeType.MutualFollow) + val rightEdgeTypes = Seq( + EdgeType.FollowedBy, + EdgeType.FavoritedBy, + EdgeType.RetweetedBy, + EdgeType.MentionedBy, + EdgeType.MutualFollow) + + val edgeTypePairs: Seq[(EdgeType, EdgeType)] = { + for (leftEdgeType <- leftEdgeTypes; rightEdgeType <- rightEdgeTypes) + yield (leftEdgeType, rightEdgeType) + } + + val featureTypes: Seq[FeatureType] = edgeTypePairs.map(pair => FeatureType(pair._1, pair._2)) + + val personalDataTypesMap: Map[EdgeType, Set[PersonalDataType]] = Map( + EdgeType.Following -> Set(PersonalDataType.CountOfFollowersAndFollowees), + EdgeType.Favorite -> Set( + PersonalDataType.CountOfPrivateLikes, + PersonalDataType.CountOfPublicLikes), + EdgeType.MutualFollow -> Set(PersonalDataType.CountOfFollowersAndFollowees), + EdgeType.FollowedBy -> Set(PersonalDataType.CountOfFollowersAndFollowees) + ) +} diff --git a/src/scala/com/twitter/timelines/prediction/features/user_health/BUILD b/src/scala/com/twitter/timelines/prediction/features/user_health/BUILD new file mode 100644 index 000000000..598e0c066 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/user_health/BUILD @@ -0,0 +1,10 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/timelines/author_features/user_health:thrift-scala", + ], +) diff --git a/src/scala/com/twitter/timelines/prediction/features/user_health/UserHealthFeatures.scala b/src/scala/com/twitter/timelines/prediction/features/user_health/UserHealthFeatures.scala new file mode 100644 index 000000000..7c8c7f8b1 --- /dev/null +++ b/src/scala/com/twitter/timelines/prediction/features/user_health/UserHealthFeatures.scala @@ -0,0 +1,23 @@ +package com.twitter.timelines.prediction.features.user_health + +import com.twitter.ml.api.Feature +import com.twitter.timelines.author_features.user_health.thriftscala.UserState +import com.twitter.dal.personal_data.thriftjava.PersonalDataType.{UserState => UserStatePDT} +import com.twitter.dal.personal_data.thriftjava.PersonalDataType._ +import scala.collection.JavaConverters._ + +object UserHealthFeatures { + val UserState = new Feature.Discrete("user_health.user_state", Set(UserStatePDT, 
UserType).asJava) + val IsLightMinusUser = + new Feature.Binary("user_health.is_light_minus_user", Set(UserStatePDT, UserType).asJava) + val AuthorState = + new Feature.Discrete("user_health.author_state", Set(UserStatePDT, UserType).asJava) + val NumAuthorFollowers = + new Feature.Continuous("author_health.num_followers", Set(CountOfFollowersAndFollowees).asJava) + val NumAuthorConnectDays = new Feature.Continuous("author_health.num_connect_days") + val NumAuthorConnect = new Feature.Continuous("author_health.num_connect") + + val IsUserVerifiedUnion = new Feature.Binary("user_account.is_user_verified_union") +} + +case class UserHealthFeatures(id: Long, userStateOpt: Option[UserState]) diff --git a/timelines/data_processing/ml_util/aggregation_framework/AggregateGroup.scala b/timelines/data_processing/ml_util/aggregation_framework/AggregateGroup.scala new file mode 100644 index 000000000..6797d838a --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/AggregateGroup.scala @@ -0,0 +1,124 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework + +import com.twitter.ml.api._ +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetric +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.EasyMetric +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.MaxMetric +import com.twitter.timelines.data_processing.ml_util.transforms.OneToSomeTransform +import com.twitter.util.Duration +import java.lang.{Boolean => JBoolean} +import java.lang.{Long => JLong} +import scala.language.existentials + +/** + * A wrapper for [[com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup]] + * (see TypedAggregateGroup.scala) with some convenient syntactic sugar that avoids + * the user having to specify different groups for different types of features. + * Gets translated into multiple strongly typed TypedAggregateGroup(s) + * by the buildTypedAggregateGroups() method defined below. + * + * @param inputSource Source to compute this aggregate over + * @param preTransforms Sequence of [[ITransform]] that is applied to + * data records pre-aggregation (e.g. discretization, renaming) + * @param samplingTransformOpt Optional [[OneToSomeTransform]] that samples data record + * @param aggregatePrefix Prefix to use for naming resultant aggregate features + * @param keys Features to group by when computing the aggregates + * (e.g. USER_ID, AUTHOR_ID). These must be either discrete, string or sparse binary. + * Grouping by a sparse binary feature is different than grouping by a discrete or string + * feature. For example, if you have a sparse binary feature WORDS_IN_TWEET which is + * a set of all words in a tweet, then grouping by this feature generates a + * separate aggregate mean/count/etc for each value of the feature (each word), and + * not just a single aggregate count for different "sets of words" + * @param features Features to aggregate (e.g. blender_score or is_photo). + * @param labels Labels to cross the features with to make pair features, if any. + * @param metrics Aggregation metrics to compute (e.g. count, mean) + * @param halfLives Half lives to use for the aggregations, to be crossed with the above. + * use Duration.Top for "forever" aggregations over an infinite time window (no decay). 
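+ * For example (hypothetical configuration): halfLives = Set(1.day, Duration.Top)
+ * crossed with a single count metric yields, per key, one exponentially decayed
+ * count with a one-day half life plus one undecayed all-time count.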
+ * @param outputStore Store to output this aggregate to + * @param includeAnyFeature Aggregate label counts for any feature value + * @param includeAnyLabel Aggregate feature counts for any label value (e.g. all impressions) + * @param includeTimestampFeature Compute max aggregate on the timestamp feature + * @param aggExclusionRegex Sequence of Regexes, which define features to exclude from aggregation + */ +case class AggregateGroup( + inputSource: AggregateSource, + aggregatePrefix: String, + keys: Set[Feature[_]], + features: Set[Feature[_]], + labels: Set[_ <: Feature[JBoolean]], + metrics: Set[EasyMetric], + halfLives: Set[Duration], + outputStore: AggregateStore, + preTransforms: Seq[OneToSomeTransform] = Seq.empty, + includeAnyFeature: Boolean = true, + includeAnyLabel: Boolean = true, + includeTimestampFeature: Boolean = false, + aggExclusionRegex: Seq[String] = Seq.empty) { + + private def toStrongType[T]( + metrics: Set[EasyMetric], + features: Set[Feature[_]], + featureType: FeatureType + ): TypedAggregateGroup[_] = { + val underlyingMetrics: Set[AggregationMetric[T, _]] = + metrics.flatMap(_.forFeatureType[T](featureType)) + val underlyingFeatures: Set[Feature[T]] = features + .map(_.asInstanceOf[Feature[T]]) + + TypedAggregateGroup[T]( + inputSource = inputSource, + aggregatePrefix = aggregatePrefix, + keysToAggregate = keys, + featuresToAggregate = underlyingFeatures, + labels = labels, + metrics = underlyingMetrics, + halfLives = halfLives, + outputStore = outputStore, + preTransforms = preTransforms, + includeAnyFeature, + includeAnyLabel, + aggExclusionRegex + ) + } + + private def timestampTypedAggregateGroup: TypedAggregateGroup[_] = { + val metrics: Set[AggregationMetric[JLong, _]] = + Set(MaxMetric.forFeatureType[JLong](TypedAggregateGroup.timestampFeature.getFeatureType).get) + + TypedAggregateGroup[JLong]( + inputSource = inputSource, + aggregatePrefix = aggregatePrefix, + keysToAggregate = keys, + featuresToAggregate = Set(TypedAggregateGroup.timestampFeature), + labels = Set.empty, + metrics = metrics, + halfLives = Set(Duration.Top), + outputStore = outputStore, + preTransforms = preTransforms, + includeAnyFeature = false, + includeAnyLabel = true, + aggExclusionRegex = Seq.empty + ) + } + + def buildTypedAggregateGroups(): List[TypedAggregateGroup[_]] = { + val typedAggregateGroupsList = { + if (features.isEmpty) { + List(toStrongType(metrics, features, FeatureType.BINARY)) + } else { + features + .groupBy(_.getFeatureType()) + .toList + .map { + case (featureType, features) => + toStrongType(metrics, features, featureType) + } + } + } + + val optionalTimestampTypedAggregateGroup = + if (includeTimestampFeature) List(timestampTypedAggregateGroup) else List() + + typedAggregateGroupsList ++ optionalTimestampTypedAggregateGroup + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/AggregateSource.scala b/timelines/data_processing/ml_util/aggregation_framework/AggregateSource.scala new file mode 100644 index 000000000..7fb239c65 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/AggregateSource.scala @@ -0,0 +1,9 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework + +import com.twitter.ml.api.Feature +import java.lang.{Long => JLong} + +trait AggregateSource extends Serializable { + def name: String + def timestampFeature: Feature[JLong] +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/AggregateStore.scala b/timelines/data_processing/ml_util/aggregation_framework/AggregateStore.scala new file
mode 100644 index 000000000..1c09b33f0 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/AggregateStore.scala @@ -0,0 +1,5 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework + +trait AggregateStore extends Serializable { + def name: String +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/AggregationConfig.scala b/timelines/data_processing/ml_util/aggregation_framework/AggregationConfig.scala new file mode 100644 index 000000000..2b117ddbd --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/AggregationConfig.scala @@ -0,0 +1,5 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework + +trait AggregationConfig { + def aggregatesToCompute: Set[TypedAggregateGroup[_]] +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/AggregationKey.scala b/timelines/data_processing/ml_util/aggregation_framework/AggregationKey.scala new file mode 100644 index 000000000..c3aafef69 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/AggregationKey.scala @@ -0,0 +1,50 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework + +import com.twitter.bijection.Bufferable +import com.twitter.bijection.Injection +import scala.util.Try + +/** + * Case class that represents the "grouping" key for any aggregate feature. + * Used by Summingbird to output aggregates to the key-value "store" using sumByKey() + * + * @discreteFeaturesById All discrete featureids (+ values) that are part of this key + * @textFeaturesById All string featureids (+ values) that are part of this key + * + * Example 1: the user aggregate features in aggregatesv1 all group by USER_ID, + * which is a discrete feature. When storing these features, the key would be: + * + * discreteFeaturesById = Map(hash(USER_ID) -> ), textFeaturesById = Map() + * + * Ex 2: If aggregating grouped by USER_ID, AUTHOR_ID, tweet link url, the key would be: + * + * discreteFeaturesById = Map(hash(USER_ID) -> , hash(AUTHOR_ID) -> ), + * textFeaturesById = Map(hash(URL_FEATURE) -> ) + * + * I could have just used a DataRecord for the key, but I wanted to make it strongly typed + * and only support grouping by discrete and string features, so using a case class instead. + * + * Re: efficiency, storing the hash of the feature in addition to just the feature value + * is somewhat more inefficient than only storing the feature value in the key, but it + * adds flexibility to group multiple types of aggregates in the same output store. If we + * decide this isn't a good tradeoff to make later, we can reverse/refactor this decision. + */ +case class AggregationKey( + discreteFeaturesById: Map[Long, Long], + textFeaturesById: Map[Long, String]) + +/** + * A custom injection for the above case class, + * so that Summingbird knows how to store it in Manhattan. 
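+ *
+ * A round-trip sketch (the key's id/value pairs below are hypothetical):
+ * {{{
+ * val key = AggregationKey(Map(123L -> 456L), Map.empty)
+ * val bytes: Array[Byte] = AggregationKeyInjection(key)
+ * AggregationKeyInjection.invert(bytes) // Success(AggregationKey(Map(123 -> 456), Map()))
+ * }}}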
+ */ +object AggregationKeyInjection extends Injection[AggregationKey, Array[Byte]] { + /* Injection from tuple representation of AggregationKey to Array[Byte] */ + val featureMapsInjection: Injection[(Map[Long, Long], Map[Long, String]), Array[Byte]] = + Bufferable.injectionOf[(Map[Long, Long], Map[Long, String])] + + def apply(aggregationKey: AggregationKey): Array[Byte] = + featureMapsInjection(AggregationKey.unapply(aggregationKey).get) + + def invert(ab: Array[Byte]): Try[AggregationKey] = + featureMapsInjection.invert(ab).map(AggregationKey.tupled(_)) +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/BUILD b/timelines/data_processing/ml_util/aggregation_framework/BUILD new file mode 100644 index 000000000..aff488116 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/BUILD @@ -0,0 +1,101 @@ +scala_library( + name = "common_types", + sources = ["*.scala"], + platform = "java8", + strict_deps = True, + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/google/guava", + "3rdparty/jvm/com/twitter/algebird:bijection", + "3rdparty/jvm/com/twitter/algebird:core", + "3rdparty/jvm/com/twitter/algebird:util", + "3rdparty/jvm/com/twitter/bijection:core", + "3rdparty/jvm/com/twitter/bijection:json", + "3rdparty/jvm/com/twitter/bijection:macros", + "3rdparty/jvm/com/twitter/bijection:netty", + "3rdparty/jvm/com/twitter/bijection:scrooge", + "3rdparty/jvm/com/twitter/bijection:thrift", + "3rdparty/jvm/com/twitter/bijection:util", + "3rdparty/jvm/org/apache/thrift:libthrift", + "3rdparty/src/jvm/com/twitter/scalding:date", + "3rdparty/src/jvm/com/twitter/summingbird:batch", + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/scala/com/twitter/dal/client/dataset", + "src/scala/com/twitter/ml/api/util:datarecord", + "src/scala/com/twitter/scalding_internal/dalv2/vkvs", + "src/scala/com/twitter/scalding_internal/multiformat/format/keyval", + "src/scala/com/twitter/storehaus_internal/manhattan/config", + "src/scala/com/twitter/storehaus_internal/offline", + "src/scala/com/twitter/storehaus_internal/util", + "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits", + "src/scala/com/twitter/summingbird_internal/runner/store_config", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/dal/personal_data:personal_data-scala", + "src/thrift/com/twitter/ml/api:data-java", + "timelines/data_processing/ml_util/aggregation_framework/metrics", + "timelines/data_processing/ml_util/transforms", + "util/util-core:util-core-util", + ], +) + +target( + name = "common_online_stores", + dependencies = [ + "src/scala/com/twitter/storehaus_internal/memcache", + ], +) + +target( + name = "common_offline_stores", + dependencies = [ + "src/scala/com/twitter/storehaus_internal/manhattan", + ], +) + +target( + name = "user_job", + dependencies = [ + "timelines/data_processing/ml_util/aggregation_framework/job", + ], +) + +target( + name = "scalding", + dependencies = [ + "timelines/data_processing/ml_util/aggregation_framework/scalding", + ], +) + +target( + name = "conversion", + dependencies = [ + "timelines/data_processing/ml_util/aggregation_framework/conversion", + ], +) + +target( + name = "query", + dependencies = [ + "timelines/data_processing/ml_util/aggregation_framework/query", + ], +) + +target( + name = "heron", + dependencies = [ + "timelines/data_processing/ml_util/aggregation_framework/heron", + ], +) + +target( + dependencies = [ + 
":common_offline_stores", + ":common_online_stores", + ":common_types", + ":conversion", + ":heron", + ":query", + ":scalding", + ], +) diff --git a/timelines/data_processing/ml_util/aggregation_framework/DataRecordAggregationMonoid.scala b/timelines/data_processing/ml_util/aggregation_framework/DataRecordAggregationMonoid.scala new file mode 100644 index 000000000..bc37c8e05 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/DataRecordAggregationMonoid.scala @@ -0,0 +1,92 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework + +import com.twitter.algebird.Monoid +import com.twitter.ml.api._ +import com.twitter.ml.api.constant.SharedFeatures +import com.twitter.ml.api.util.SRichDataRecord +import scala.collection.mutable +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon._ + +/** + * Monoid to aggregate over DataRecord objects. + * + * @param aggregates Set of ''TypedAggregateGroup'' case classes* + * to compute using this monoid (see TypedAggregateGroup.scala) + */ +trait DataRecordMonoid extends Monoid[DataRecord] { + + val aggregates: Set[TypedAggregateGroup[_]] + + def zero(): DataRecord = new DataRecord + + /* + * Add two datarecords using this monoid. + * + * @param left Left datarecord to add + * @param right Right datarecord to add + * @return Sum of the two datarecords as a DataRecord + */ + def plus(left: DataRecord, right: DataRecord): DataRecord = { + val result = zero() + aggregates.foreach(_.mutatePlus(result, left, right)) + val leftTimestamp = getTimestamp(left) + val rightTimestamp = getTimestamp(right) + SRichDataRecord(result).setFeatureValue( + SharedFeatures.TIMESTAMP, + leftTimestamp.max(rightTimestamp) + ) + result + } +} + +case class DataRecordAggregationMonoid(aggregates: Set[TypedAggregateGroup[_]]) + extends DataRecordMonoid { + + private def sumBuffer(buffer: mutable.ArrayBuffer[DataRecord]): Unit = { + val bufferSum = zero() + buffer.toIterator.foreach { value => + val leftTimestamp = getTimestamp(bufferSum) + val rightTimestamp = getTimestamp(value) + aggregates.foreach(_.mutatePlus(bufferSum, bufferSum, value)) + SRichDataRecord(bufferSum).setFeatureValue( + SharedFeatures.TIMESTAMP, + leftTimestamp.max(rightTimestamp) + ) + } + + buffer.clear() + buffer += bufferSum + } + + /* + * Efficient batched aggregation of datarecords using + * this monoid + a buffer, for performance. + * + * @param dataRecordIter An iterator of datarecords to sum + * @return A datarecord option containing the sum + */ + override def sumOption(dataRecordIter: TraversableOnce[DataRecord]): Option[DataRecord] = { + if (dataRecordIter.isEmpty) { + None + } else { + var buffer = mutable.ArrayBuffer[DataRecord]() + val BatchSize = 1000 + + dataRecordIter.foreach { u => + if (buffer.size > BatchSize) sumBuffer(buffer) + buffer += u + } + + if (buffer.size > 1) sumBuffer(buffer) + Some(buffer(0)) + } + } +} + +/* + * This class is used when there is no need to use sumBuffer functionality, as in the case of + * online aggregation of datarecords where using a buffer on a small number of datarecords + * would add some performance overhead. 
+ */
+case class DataRecordAggregationMonoidNoBuffer(aggregates: Set[TypedAggregateGroup[_]])
+    extends DataRecordMonoid {}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/KeyedRecord.scala b/timelines/data_processing/ml_util/aggregation_framework/KeyedRecord.scala
new file mode 100644
index 000000000..bb3096767
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/KeyedRecord.scala
@@ -0,0 +1,27 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework
+
+import com.twitter.ml.api.DataRecord
+
+/**
+ * Keyed record that is used to represent the aggregation type and its corresponding data record.
+ *
+ * @constructor creates a new keyed record.
+ *
+ * @param aggregateType the aggregate type
+ * @param record the data record associated with the key
+ **/
+case class KeyedRecord(aggregateType: AggregateType.Value, record: DataRecord)
+
+/**
+ * Keyed record map with multiple data records.
+ *
+ * @constructor creates a new keyed record map.
+ *
+ * @param aggregateType the aggregate type
+ * @param recordMap a map with keys of type Long and values of type DataRecord,
+ * where the key indicates the index and the value indicates the record
+ *
+ **/
+case class KeyedRecordMap(
+  aggregateType: AggregateType.Value,
+  recordMap: scala.collection.Map[Long, DataRecord])
diff --git a/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateInjections.scala b/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateInjections.scala
new file mode 100644
index 000000000..7ab1233c1
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateInjections.scala
@@ -0,0 +1,46 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework
+
+import com.twitter.dal.personal_data.thriftscala.PersonalDataType
+import com.twitter.ml.api.DataRecord
+import com.twitter.ml.api.Feature
+import com.twitter.scalding_internal.multiformat.format.keyval.KeyValInjection
+import com.twitter.scalding_internal.multiformat.format.keyval.KeyValInjection.Batched
+import com.twitter.scalding_internal.multiformat.format.keyval.KeyValInjection.JavaCompactThrift
+import com.twitter.scalding_internal.multiformat.format.keyval.KeyValInjection.genericInjection
+import com.twitter.summingbird.batch.BatchID
+import scala.collection.JavaConverters._
+
+object OfflineAggregateInjections {
+  val offlineDataRecordAggregateInjection: KeyValInjection[AggregationKey, (BatchID, DataRecord)] =
+    KeyValInjection(
+      genericInjection(AggregationKeyInjection),
+      Batched(JavaCompactThrift[DataRecord])
+    )
+
+  private[aggregation_framework] def getPdts[T](
+    aggregateGroups: Iterable[T],
+    featureExtractor: T => Iterable[Feature[_]]
+  ): Option[Set[PersonalDataType]] = {
+    val pdts: Set[PersonalDataType] = for {
+      group <- aggregateGroups.toSet[T]
+      feature <- featureExtractor(group)
+      pdtSet <- feature.getPersonalDataTypes.asSet().asScala
+      javaPdt <- pdtSet.asScala
+      scalaPdt <- PersonalDataType.get(javaPdt.getValue)
+    } yield {
+      scalaPdt
+    }
+    if (pdts.nonEmpty) Some(pdts) else None
+  }
+
+  def getInjection(
+    aggregateGroups: Set[TypedAggregateGroup[_]]
+  ): KeyValInjection[AggregationKey, (BatchID, DataRecord)] = {
+    val keyPdts = getPdts[TypedAggregateGroup[_]](aggregateGroups, _.allOutputKeys)
+    val valuePdts = getPdts[TypedAggregateGroup[_]](aggregateGroups, _.allOutputFeatures)
+    KeyValInjection(
+      genericInjection(AggregationKeyInjection, keyPdts),
+      genericInjection(Batched(JavaCompactThrift[DataRecord]), valuePdts)
+    )
+  }
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateSource.scala b/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateSource.scala
new file mode 100644
index 000000000..116f553c4
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateSource.scala
@@ -0,0 +1,21 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework
+
+import com.twitter.dal.client.dataset.TimePartitionedDALDataset
+import com.twitter.ml.api.DataRecord
+import com.twitter.ml.api.Feature
+import java.lang.{Long => JLong}
+
+case class OfflineAggregateSource(
+  override val name: String,
+  override val timestampFeature: Feature[JLong],
+  scaldingHdfsPath: Option[String] = None,
+  scaldingSuffixType: Option[String] = None,
+  dalDataSet: Option[TimePartitionedDALDataset[DataRecord]] = None,
+  withValidation: Boolean = true) // context: https://jira.twitter.biz/browse/TQ-10618
+    extends AggregateSource {
+  /*
+   * To help transition callers to use DAL.read, we check that either the HDFS
+   * path is defined, or the dalDataset. Both options cannot be set at the same time.
+   */
+  assert(!(scaldingHdfsPath.isDefined && dalDataSet.isDefined))
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateStore.scala b/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateStore.scala
new file mode 100644
index 000000000..0bba08a94
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/OfflineAggregateStore.scala
@@ -0,0 +1,128 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework
+
+import com.twitter.dal.client.dataset.KeyValDALDataset
+import com.twitter.ml.api.DataRecord
+import com.twitter.scalding.DateParser
+import com.twitter.scalding.RichDate
+import com.twitter.scalding_internal.multiformat.format.keyval.KeyVal
+import com.twitter.storehaus_internal.manhattan._
+import com.twitter.storehaus_internal.util.ApplicationID
+import com.twitter.storehaus_internal.util.DatasetName
+import com.twitter.storehaus_internal.util.HDFSPath
+import com.twitter.summingbird.batch.BatchID
+import com.twitter.summingbird.batch.Batcher
+import com.twitter.summingbird_internal.runner.store_config._
+import java.util.TimeZone
+import com.twitter.summingbird.batch.MillisecondBatcher
+
+/*
+ * Configuration common to all offline aggregate stores
+ *
+ * @param outputHdfsPathPrefix HDFS prefix to store all output aggregate types offline
+ * @param dummyAppId Dummy manhattan app id required by summingbird (unused)
+ * @param dummyDatasetPrefix Dummy manhattan dataset prefix required by summingbird (unused)
+ * @param startDate Start date for summingbird job to begin computing aggregates
+ */
+case class OfflineAggregateStoreCommonConfig(
+  outputHdfsPathPrefix: String,
+  dummyAppId: String,
+  dummyDatasetPrefix: String,
+  startDate: String)
+
+/**
+ * A trait inherited by any object that defines
+ * an HDFS prefix to write output data to. E.g. timelines has its own
+ * output prefix to write aggregates_v2 results; your team can create
+ * its own.
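+ *
+ * A hypothetical implementation might look like (all values are illustrative):
+ * {{{
+ * object MyTeamOfflineStoreCommonConfig extends OfflineStoreCommonConfig {
+ *   override def apply(startDate: String) =
+ *     OfflineAggregateStoreCommonConfig(
+ *       outputHdfsPathPrefix = "/dfs/my_team/aggregates_v2",
+ *       dummyAppId = "my_team_aggregates",
+ *       dummyDatasetPrefix = "my_team_aggregates",
+ *       startDate = startDate
+ *     )
+ * }
+ * }}}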
+ */ +trait OfflineStoreCommonConfig extends Serializable { + /* + * @param startDate Date to create config for + * @return OfflineAggregateStoreCommonConfig object with all config details for output populated + */ + def apply(startDate: String): OfflineAggregateStoreCommonConfig +} + +/** + * @param name Uniquely identifiable human-readable name for this output store + * @param startDate Start date for this output store from which aggregates should be computed + * @param commonConfig Provider of other common configuration details + * @param batchesToKeep Retention policy on output (number of batches to keep) + */ +abstract class OfflineAggregateStoreBase + extends OfflineStoreOnlyConfig[ManhattanROConfig] + with AggregateStore { + + override def name: String + def startDate: String + def commonConfig: OfflineStoreCommonConfig + def batchesToKeep: Int + def maxKvSourceFailures: Int + + val datedCommonConfig: OfflineAggregateStoreCommonConfig = commonConfig.apply(startDate) + val manhattan: ManhattanROConfig = ManhattanROConfig( + /* This is a sample config, will be replaced with production config later */ + HDFSPath(s"${datedCommonConfig.outputHdfsPathPrefix}/${name}"), + ApplicationID(datedCommonConfig.dummyAppId), + DatasetName(s"${datedCommonConfig.dummyDatasetPrefix}_${name}_1"), + com.twitter.storehaus_internal.manhattan.Adama + ) + + val batcherSize = 24 + val batcher: MillisecondBatcher = Batcher.ofHours(batcherSize) + + val startTime: RichDate = + RichDate(datedCommonConfig.startDate)(TimeZone.getTimeZone("UTC"), DateParser.default) + + val offline: ManhattanROConfig = manhattan +} + +/** + * Defines an aggregates store which is composed of DataRecords + * @param name Uniquely identifiable human-readable name for this output store + * @param startDate Start date for this output store from which aggregates should be computed + * @param commonConfig Provider of other common configuration details + * @param batchesToKeep Retention policy on output (number of batches to keep) + */ +case class OfflineAggregateDataRecordStore( + override val name: String, + override val startDate: String, + override val commonConfig: OfflineStoreCommonConfig, + override val batchesToKeep: Int = 7, + override val maxKvSourceFailures: Int = 0) + extends OfflineAggregateStoreBase { + + def toOfflineAggregateDataRecordStoreWithDAL( + dalDataset: KeyValDALDataset[KeyVal[AggregationKey, (BatchID, DataRecord)]] + ): OfflineAggregateDataRecordStoreWithDAL = + OfflineAggregateDataRecordStoreWithDAL( + name = name, + startDate = startDate, + commonConfig = commonConfig, + dalDataset = dalDataset, + maxKvSourceFailures = maxKvSourceFailures + ) +} + +trait withDALDataset { + def dalDataset: KeyValDALDataset[KeyVal[AggregationKey, (BatchID, DataRecord)]] +} + +/** + * Defines an aggregates store which is composed of DataRecords and writes using DAL. + * @param name Uniquely identifiable human-readable name for this output store + * @param startDate Start date for this output store from which aggregates should be computed + * @param commonConfig Provider of other common configuration details + * @param dalDataset The KeyValDALDataset for this output store + * @param batchesToKeep Unused, kept for interface compatibility. You must define a separate Oxpecker + * retention policy to maintain the desired number of versions. 
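+ *
+ * Construction sketch (the dataset value and names are hypothetical):
+ * {{{
+ * val store = OfflineAggregateDataRecordStoreWithDAL(
+ *   name = "user_aggregates",
+ *   startDate = "2023-01-01",
+ *   commonConfig = MyTeamOfflineStoreCommonConfig,
+ *   dalDataset = UserAggregatesKeyValDALDataset
+ * )
+ * }}}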
+ */
+case class OfflineAggregateDataRecordStoreWithDAL(
+  override val name: String,
+  override val startDate: String,
+  override val commonConfig: OfflineStoreCommonConfig,
+  override val dalDataset: KeyValDALDataset[KeyVal[AggregationKey, (BatchID, DataRecord)]],
+  override val batchesToKeep: Int = -1,
+  override val maxKvSourceFailures: Int = 0)
+    extends OfflineAggregateStoreBase
+    with withDALDataset
diff --git a/timelines/data_processing/ml_util/aggregation_framework/README.md b/timelines/data_processing/ml_util/aggregation_framework/README.md
new file mode 100644
index 000000000..ea9a4b446
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/README.md
@@ -0,0 +1,39 @@
+Overview
+========
+
+The **aggregation framework** is a set of libraries and utilities that allows teams to flexibly
+compute aggregate (counting) features both in batch and in real time. Aggregate features can capture
+historical interactions between arbitrary entities (and sets thereof), conditional on provided features
+and labels.
+
+These types of engineered aggregate features have proven to be highly impactful across different teams at Twitter.
+
+
+What are some features we can compute?
+--------------------------------------
+
+The framework supports computing aggregate features on provided grouping keys. The only constraint is that these keys are discrete, string, or sparse binary features (or sets thereof).
+
+For example, a common use case is to calculate a user's past engagement history with various types of tweets (photo, video, retweets, etc.), specific authors, specific in-network engagers or any other entity the user has interacted with and that could provide signal. In this case, the underlying aggregation keys are `userId`, `(userId, authorId)` or `(userId, engagerId)`.
+
+In Timelines and MagicRecs, we also compute custom aggregate engagement counts on every `tweetId`. Similarly, other aggregations are possible, perhaps on `advertiserId` or `mediaId`, as long as the grouping key is of a supported type.
+
+
+What implementations are supported?
+-----------------------------------
+
+Offline, we support the daily batch processing of DataRecords containing all required input features to generate
+aggregate features. These are then uploaded to Manhattan for online hydration.
+
+Online, we support the real-time aggregation of DataRecords through Storm with a backing memcache that can be queried
+for the real-time aggregate features.
+
+Additional documentation exists in the [docs folder](docs).
+
+
+Where is this used?
+--------------------
+
+The Home Timeline heavy ranker uses a variety of both [batch and real time features](../../../../src/scala/com/twitter/timelines/prediction/common/aggregates/README.md) generated by this framework.
+These features are also used for email and other recommendations.
\ No newline at end of file
diff --git a/timelines/data_processing/ml_util/aggregation_framework/StoreConfig.scala b/timelines/data_processing/ml_util/aggregation_framework/StoreConfig.scala
new file mode 100644
index 000000000..703d5893c
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/StoreConfig.scala
@@ -0,0 +1,68 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework
+
+import com.twitter.ml.api.constant.SharedFeatures
+import com.twitter.ml.api.Feature
+import com.twitter.ml.api.FeatureType
+
+/**
+ * Convenience class to describe the stores that make up a particular type of aggregate.
+ *
+ * For example, as of 2018/07, user aggregates are generated by merging the individual
+ * "user_aggregates", "rectweet_user_aggregates", and "twitter_wide_user_aggregates".
+ *
+ * @param storeNames Names of the stores.
+ * @param aggregateType Type of aggregate, usually differentiated by the aggregation key.
+ * @param shouldHash Used at TimelineRankingAggregatesUtil.extractSecondary when extracting the
+ * secondary key value.
+ */
+case class StoreConfig[T](
+  storeNames: Set[String],
+  aggregateType: AggregateType.Value,
+  shouldHash: Boolean = false
+)(
+  implicit storeMerger: StoreMerger) {
+  require(storeMerger.isValidToMerge(storeNames))
+
+  private val representativeStore = storeNames.head
+
+  val aggregationKeyIds: Set[Long] = storeMerger.getAggregateKeys(representativeStore)
+  val aggregationKeyFeatures: Set[Feature[_]] =
+    storeMerger.getAggregateKeyFeatures(representativeStore)
+  val secondaryKeyFeatureOpt: Option[Feature[_]] = storeMerger.getSecondaryKey(representativeStore)
+}
+
+trait StoreMerger {
+  def aggregationConfig: AggregationConfig
+
+  def getAggregateKeyFeatures(storeName: String): Set[Feature[_]] =
+    aggregationConfig.aggregatesToCompute
+      .filter(_.outputStore.name == storeName)
+      .flatMap(_.keysToAggregate)
+
+  def getAggregateKeys(storeName: String): Set[Long] =
+    TypedAggregateGroup.getKeyFeatureIds(getAggregateKeyFeatures(storeName))
+
+  def getSecondaryKey(storeName: String): Option[Feature[_]] = {
+    val keys = getAggregateKeyFeatures(storeName)
+    require(keys.size <= 2, "Only singleton or binary aggregation keys are supported.")
+    require(keys.contains(SharedFeatures.USER_ID), "USER_ID must be one of the aggregation keys.")
+    keys
+      .filterNot(_ == SharedFeatures.USER_ID)
+      .headOption
+      .map { possiblySparseKey =>
+        if (possiblySparseKey.getFeatureType != FeatureType.SPARSE_BINARY) {
+          possiblySparseKey
+        } else {
+          TypedAggregateGroup.sparseFeature(possiblySparseKey)
+        }
+      }
+  }
+
+  /**
+   * Stores may only be merged if they have the same aggregation key.
+   */
+  def isValidToMerge(storeNames: Set[String]): Boolean = {
+    val expectedKeyOpt = storeNames.headOption.map(getAggregateKeys)
+    storeNames.forall(v => getAggregateKeys(v) == expectedKeyOpt.get)
+  }
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/StoreRegister.scala b/timelines/data_processing/ml_util/aggregation_framework/StoreRegister.scala
new file mode 100644
index 000000000..a7e9cd535
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/StoreRegister.scala
@@ -0,0 +1,13 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework
+
+trait StoreRegister {
+  def allStores: Set[StoreConfig[_]]
+
+  lazy val storeMap: Map[AggregateType.Value, StoreConfig[_]] = allStores
+    .map(store => (store.aggregateType, store))
+    .toMap
+
+  lazy val storeNameToTypeMap: Map[String, AggregateType.Value] = allStores
+    .flatMap(store => store.storeNames.map(name => (name, store.aggregateType)))
+    .toMap
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/TypedAggregateGroup.scala b/timelines/data_processing/ml_util/aggregation_framework/TypedAggregateGroup.scala
new file mode 100644
index 000000000..92afc4137
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/TypedAggregateGroup.scala
@@ -0,0 +1,486 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework
+
+import com.twitter.ml.api._
+import com.twitter.ml.api.constant.SharedFeatures
+import com.twitter.ml.api.util.SRichDataRecord
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregateFeature
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetric
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon._
+import com.twitter.timelines.data_processing.ml_util.transforms.OneToSomeTransform
+import com.twitter.util.Duration
+import com.twitter.util.Try
+import java.lang.{Boolean => JBoolean}
+import java.lang.{Double => JDouble}
+import java.lang.{Long => JLong}
+import java.util.{Set => JSet}
+import scala.annotation.tailrec
+import scala.language.existentials
+import scala.collection.JavaConverters._
+import scala.util.matching.Regex
+
+/**
+ * A case class containing precomputed data useful to quickly
+ * process operations over an aggregate.
+ *
+ * @param query The underlying feature being aggregated
+ * @param metric The aggregation metric
+ * @param outputFeatures The output features that aggregation will produce
+ * @param outputFeatureIds The precomputed hashes of the above outputFeatures
+ */
+case class PrecomputedAggregateDescriptor[T](
+  query: AggregateFeature[T],
+  metric: AggregationMetric[T, _],
+  outputFeatures: List[Feature[_]],
+  outputFeatureIds: List[JLong])
+
+object TypedAggregateGroup {
+
+  /**
+   * Recursive function that generates all combinations of value
+   * assignments for a collection of sparse binary features.
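+   *
+   * For example (hypothetical ids), an input of
+   * List((1L, Set("a", "b")), (2L, Set("x"))) yields
+   * Set(Map(1L -> "a", 2L -> "x"), Map(1L -> "b", 2L -> "x")).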
+ * + * @param sparseBinaryIdValues list of sparse binary feature ids and possible values they can take + * @return A set of maps, where each map represents one possible assignment of values to ids + */ + def sparseBinaryPermutations( + sparseBinaryIdValues: List[(Long, Set[String])] + ): Set[Map[Long, String]] = sparseBinaryIdValues match { + case (id, values) +: rest => + tailRecSparseBinaryPermutations( + existingPermutations = values.map(value => Map(id -> value)), + remainingIdValues = rest + ) + case Nil => Set.empty + } + + @tailrec private[this] def tailRecSparseBinaryPermutations( + existingPermutations: Set[Map[Long, String]], + remainingIdValues: List[(Long, Set[String])] + ): Set[Map[Long, String]] = remainingIdValues match { + case Nil => existingPermutations + case (id, values) +: rest => + tailRecSparseBinaryPermutations( + existingPermutations.flatMap { existingIdValueMap => + values.map(value => existingIdValueMap ++ Map(id -> value)) + }, + rest + ) + } + + val SparseFeatureSuffix = ".member" + def sparseFeature(sparseBinaryFeature: Feature[_]): Feature[String] = + new Feature.Text( + sparseBinaryFeature.getDenseFeatureName + SparseFeatureSuffix, + AggregationMetricCommon.derivePersonalDataTypes(Some(sparseBinaryFeature))) + + /* Throws exception if obj not an instance of U */ + private[this] def validate[U](obj: Any): U = { + require(obj.isInstanceOf[U]) + obj.asInstanceOf[U] + } + + private[this] def getFeatureOpt[U](dataRecord: DataRecord, feature: Feature[U]): Option[U] = + Option(SRichDataRecord(dataRecord).getFeatureValue(feature)).map(validate[U](_)) + + /** + * Get a mapping from feature ids + * (including individual sparse elements of a sparse feature) to values + * from the given data record, for a given feature type. + * + * @param dataRecord Data record to get features from + * @param keysToAggregate key features to get id-value mappings for + * @param featureType Feature type to get id-value maps for + */ + def getKeyFeatureIdValues[U]( + dataRecord: DataRecord, + keysToAggregate: Set[Feature[_]], + featureType: FeatureType + ): Set[(Long, Option[U])] = { + val featuresOfThisType: Set[Feature[U]] = keysToAggregate + .filter(_.getFeatureType == featureType) + .map(validate[Feature[U]]) + + featuresOfThisType + .map { feature: Feature[U] => + val featureId: Long = getDenseFeatureId(feature) + val featureOpt: Option[U] = getFeatureOpt(dataRecord, feature) + (featureId, featureOpt) + } + } + + // TypedAggregateGroup may transform the aggregate keys for internal use. This method generates + // denseFeatureIds for the transformed feature. + def getDenseFeatureId(feature: Feature[_]): Long = + if (feature.getFeatureType != FeatureType.SPARSE_BINARY) { + feature.getDenseFeatureId + } else { + sparseFeature(feature).getDenseFeatureId + } + + /** + * Return denseFeatureIds for the input features after applying the custom transformation that + * TypedAggregateGroup applies to its keysToAggregate. 
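+ *
+ * For example, a SPARSE_BINARY key such as "words.in.tweet" contributes the id of its
+ * derived text feature "words.in.tweet.member" rather than the id of the raw sparse feature.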
+ * + * @param keysToAggregate key features to get id for + */ + def getKeyFeatureIds(keysToAggregate: Set[Feature[_]]): Set[Long] = + keysToAggregate.map(getDenseFeatureId) + + def checkIfAllKeysExist[U](featureIdValueMap: Map[Long, Option[U]]): Boolean = + featureIdValueMap.forall { case (_, valueOpt) => valueOpt.isDefined } + + def liftOptions[U](featureIdValueMap: Map[Long, Option[U]]): Map[Long, U] = + featureIdValueMap + .flatMap { + case (id, valueOpt) => + valueOpt.map { value => (id, value) } + } + + val timestampFeature: Feature[JLong] = SharedFeatures.TIMESTAMP + + /** + * Builds all valid aggregation keys (for the output store) from + * a datarecord and a spec listing the keys to aggregate. There + * can be multiple aggregation keys generated from a single data + * record when grouping by sparse binary features, for which multiple + * values can be set within the data record. + * + * @param dataRecord Data record to read values for key features from + * @return A set of AggregationKeys encoding the values of all keys + */ + def buildAggregationKeys( + dataRecord: DataRecord, + keysToAggregate: Set[Feature[_]] + ): Set[AggregationKey] = { + val discreteAggregationKeys = getKeyFeatureIdValues[Long]( + dataRecord, + keysToAggregate, + FeatureType.DISCRETE + ).toMap + + val textAggregationKeys = getKeyFeatureIdValues[String]( + dataRecord, + keysToAggregate, + FeatureType.STRING + ).toMap + + val sparseBinaryIdValues = getKeyFeatureIdValues[JSet[String]]( + dataRecord, + keysToAggregate, + FeatureType.SPARSE_BINARY + ).map { + case (id, values) => + ( + id, + values + .map(_.asScala.toSet) + .getOrElse(Set.empty[String]) + ) + }.toList + + if (checkIfAllKeysExist(discreteAggregationKeys) && + checkIfAllKeysExist(textAggregationKeys)) { + if (sparseBinaryIdValues.nonEmpty) { + sparseBinaryPermutations(sparseBinaryIdValues).map { sparseBinaryTextKeys => + AggregationKey( + discreteFeaturesById = liftOptions(discreteAggregationKeys), + textFeaturesById = liftOptions(textAggregationKeys) ++ sparseBinaryTextKeys + ) + } + } else { + Set( + AggregationKey( + discreteFeaturesById = liftOptions(discreteAggregationKeys), + textFeaturesById = liftOptions(textAggregationKeys) + ) + ) + } + } else Set.empty[AggregationKey] + } + +} + +/** + * Specifies one or more related aggregate(s) to compute in the summingbird job. + * + * @param inputSource Source to compute this aggregate over + * @param preTransforms Sequence of [[com.twitter.ml.api.RichITransform]] that transform + * data records pre-aggregation (e.g. discretization, renaming) + * @param samplingTransformOpt Optional [[OneToSomeTransform]] that transform data + * record to optional data record (e.g. for sampling) before aggregation + * @param aggregatePrefix Prefix to use for naming resultant aggregate features + * @param keysToAggregate Features to group by when computing the aggregates + * (e.g. USER_ID, AUTHOR_ID) + * @param featuresToAggregate Features to aggregate (e.g. blender_score or is_photo) + * @param labels Labels to cross the features with to make pair features, if any. + * use Label.All if you don't want to cross with a label. + * @param metrics Aggregation metrics to compute (e.g. count, mean) + * @param halfLives Half lives to use for the aggregations, to be crossed with the above. + * use Duration.Top for "forever" aggregations over an infinite time window (no decay). 
+ * @param outputStore Store to output this aggregate to
+ * @param includeAnyFeature Aggregate label counts for any feature value
+ * @param includeAnyLabel Aggregate feature counts for any label value (e.g. all impressions)
+ *
+ * The overall config for the summingbird job consists of a list of "AggregateGroup"
+ * case class objects, which get translated into strongly typed "TypedAggregateGroup"
+ * case class objects. A single TypedAggregateGroup always groups input data records from
+ * ''inputSource'' by a single set of aggregation keys (''keysToAggregate'').
+ * Within these groups, we perform a comprehensive cross of:
+ *
+ * ''featuresToAggregate'' x ''labels'' x ''metrics'' x ''halfLives''
+ *
+ * All the resultant aggregate features are assigned a human-readable feature name
+ * beginning with ''aggregatePrefix'', and are written to DataRecords that get
+ * aggregated and written to the store specified by ''outputStore''.
+ *
+ * Illustrative example. Suppose we define our spec as follows:
+ *
+ * TypedAggregateGroup(
+ *   inputSource = "timelines_recap_daily",
+ *   aggregatePrefix = "user_author_aggregate",
+ *   keysToAggregate = Set(USER_ID, AUTHOR_ID),
+ *   featuresToAggregate = Set(RecapFeatures.TEXT_SCORE, RecapFeatures.BLENDER_SCORE),
+ *   labels = Set(RecapFeatures.IS_FAVORITED, RecapFeatures.IS_REPLIED),
+ *   metrics = Set(CountMetric, MeanMetric),
+ *   halfLives = Set(7.Days, 30.Days),
+ *   outputStore = "user_author_aggregate_store"
+ * )
+ *
+ * This will process data records from the source named "timelines_recap_daily"
+ * (see AggregateSource.scala for more details on how to add your own source).
+ * It will produce a total of 2x2x2x2 = 16 aggregation features, named like:
+ *
+ * user_author_aggregate.pair.recap.engagement.is_favorited.recap.searchfeature.blender_score.count.7days
+ * user_author_aggregate.pair.recap.engagement.is_favorited.recap.searchfeature.blender_score.count.30days
+ * user_author_aggregate.pair.recap.engagement.is_favorited.recap.searchfeature.blender_score.mean.7days
+ *
+ * ... (and so on)
+ *
+ * and all the result features will be stored in DataRecords, summed up, and written
+ * to the output store defined by the name "user_author_aggregate_store"
+ * (see AggregateStore.scala for details on how to add your own store).
+ *
+ * If you do not want a full cross, split up your config into multiple TypedAggregateGroup
+ * objects. Splitting is strongly advised to avoid blowing up and creating invalid
+ * or unnecessary combinations of aggregate features (note that some combinations
+ * are useless or invalid, e.g. computing the mean of a binary feature). Splitting
+ * also does not cost anything in terms of real-time performance, because all
+ * Aggregate objects in the master spec that share the same ''keysToAggregate'', the
+ * same ''inputSource'' and the same ''outputStore'' are grouped by the summingbird
+ * job logic and stored into a single DataRecord in the output store. Overlapping
+ * aggregates will also automatically be deduplicated so don't worry about overlaps.
+ */
+case class TypedAggregateGroup[T](
+  inputSource: AggregateSource,
+  aggregatePrefix: String,
+  keysToAggregate: Set[Feature[_]],
+  featuresToAggregate: Set[Feature[T]],
+  labels: Set[_ <: Feature[JBoolean]],
+  metrics: Set[AggregationMetric[T, _]],
+  halfLives: Set[Duration],
+  outputStore: AggregateStore,
+  preTransforms: Seq[OneToSomeTransform] = Seq.empty,
+  includeAnyFeature: Boolean = true,
+  includeAnyLabel: Boolean = true,
+  aggExclusionRegex: Seq[String] = Seq.empty) {
+  import TypedAggregateGroup._
+
+  val compiledRegexes = aggExclusionRegex.map(new Regex(_))
+
+  // true if should drop, false if should keep
+  def filterOutAggregateFeature(
+    feature: PrecomputedAggregateDescriptor[_],
+    regexes: Seq[Regex]
+  ): Boolean = {
+    if (regexes.nonEmpty)
+      feature.outputFeatures.exists { feature =>
+        regexes.exists { re => re.findFirstMatchIn(feature.getDenseFeatureName).nonEmpty }
+      }
+    else false
+  }
+
+  def buildAggregationKeys(
+    dataRecord: DataRecord
+  ): Set[AggregationKey] = {
+    TypedAggregateGroup.buildAggregationKeys(dataRecord, keysToAggregate)
+  }
+
+  /**
+   * This val precomputes descriptors for all individual aggregates in this group
+   * (of type ''AggregateFeature''). Also precompute hashes of all aggregation
+   * "output" features generated by these operators for faster
+   * run-time performance (this turns out to be a primary CPU bottleneck).
+   * Ex: for the mean operator, "sum" and "count" are output features
+   */
+  val individualAggregateDescriptors: Set[PrecomputedAggregateDescriptor[T]] = {
+    /*
+     * By default, in addition to all feature-label crosses, also
+     * compute aggregates over each feature and label without crossing
+     */
+    val labelOptions = labels.map(Option(_)) ++
+      (if (includeAnyLabel) Set(None) else Set.empty)
+    val featureOptions = featuresToAggregate.map(Option(_)) ++
+      (if (includeAnyFeature) Set(None) else Set.empty)
+    for {
+      feature <- featureOptions
+      label <- labelOptions
+      metric <- metrics
+      halfLife <- halfLives
+    } yield {
+      val query = AggregateFeature[T](aggregatePrefix, feature, label, halfLife)
+
+      val aggregateOutputFeatures = metric.getOutputFeatures(query)
+      val aggregateOutputFeatureIds = metric.getOutputFeatureIds(query)
+      PrecomputedAggregateDescriptor(
+        query,
+        metric,
+        aggregateOutputFeatures,
+        aggregateOutputFeatureIds
+      )
+    }
+  }.filterNot(filterOutAggregateFeature(_, compiledRegexes))
+
+  /* Precomputes a map from all generated aggregate feature ids to their half lives. */
+  val continuousFeatureIdsToHalfLives: Map[Long, Duration] =
+    individualAggregateDescriptors.flatMap { descriptor =>
+      descriptor.outputFeatures
+        .flatMap { feature =>
+          if (feature.getFeatureType() == FeatureType.CONTINUOUS) {
+            Try(feature.asInstanceOf[Feature[JDouble]]).toOption
+              .map(feature => (feature.getFeatureId(), descriptor.query.halfLife))
+          } else None
+        }
+    }.toMap
+
+  /*
+   * Sparse binary keys become individual string keys in the output.
+   * e.g. group by "words.in.tweet", output key: "words.in.tweet.member"
+   */
+  val allOutputKeys: Set[Feature[_]] = keysToAggregate.map { key =>
+    if (key.getFeatureType == FeatureType.SPARSE_BINARY) sparseFeature(key)
+    else key
+  }
+
+  val allOutputFeatures: Set[Feature[_]] = individualAggregateDescriptors.flatMap {
+    case PrecomputedAggregateDescriptor(
+          query,
+          metric,
+          outputFeatures,
+          outputFeatureIds
+        ) =>
+      outputFeatures
+  }
+
+  val aggregateContext: FeatureContext = new FeatureContext(allOutputFeatures.toList.asJava)
+
+  /**
+   * Adds all aggregates in this group found in the two input data records
+   * into a result, mutating the result. Uses a while loop for an
+   * approximately 10% gain in speed over a for comprehension.
+   *
+   * WARNING: mutates ''result''
+   *
+   * @param result The output data record to mutate
+   * @param left The left data record to add
+   * @param right The right data record to add
+   */
+  def mutatePlus(result: DataRecord, left: DataRecord, right: DataRecord): Unit = {
+    val featureIterator = individualAggregateDescriptors.iterator
+    while (featureIterator.hasNext) {
+      val descriptor = featureIterator.next
+      descriptor.metric.mutatePlus(
+        result,
+        left,
+        right,
+        descriptor.query,
+        Some(descriptor.outputFeatureIds)
+      )
+    }
+  }
+
+  /**
+   * Apply preTransforms sequentially. If any transform results in a dropped (None)
+   * DataRecord, then the entire transform sequence will result in a dropped DataRecord.
+   * Note that preTransforms are order-dependent.
+   */
+  private[this] def sequentiallyTransform(dataRecord: DataRecord): Option[DataRecord] = {
+    val recordOpt = Option(new DataRecord(dataRecord))
+    preTransforms.foldLeft(recordOpt) {
+      case (Some(previousRecord), preTransform) =>
+        preTransform(previousRecord)
+      case _ => Option.empty[DataRecord]
+    }
+  }
+
+  /**
+   * Given a data record, apply transforms and fetch the incremental contributions to
+   * each configured aggregate from this data record, and store these in an output data record.
+   *
+   * @param dataRecord Input data record to aggregate.
+ * @return A set of tuples (AggregationKey, DataRecord) whose first entry is an + * AggregationKey indicating what keys we're grouping by, and whose second entry + * is an output data record with incremental contributions to the aggregate value(s) + */ + def computeAggregateKVPairs(dataRecord: DataRecord): Set[(AggregationKey, DataRecord)] = { + sequentiallyTransform(dataRecord) + .flatMap { dataRecord => + val aggregationKeys = buildAggregationKeys(dataRecord) + val increment = new DataRecord + + val isNonEmptyIncrement = individualAggregateDescriptors + .map { descriptor => + descriptor.metric.setIncrement( + output = increment, + input = dataRecord, + query = descriptor.query, + timestampFeature = inputSource.timestampFeature, + aggregateOutputs = Some(descriptor.outputFeatureIds) + ) + } + .exists(identity) + + if (isNonEmptyIncrement) { + SRichDataRecord(increment).setFeatureValue( + timestampFeature, + getTimestamp(dataRecord, inputSource.timestampFeature) + ) + Some(aggregationKeys.map(key => (key, increment))) + } else { + None + } + } + .getOrElse(Set.empty[(AggregationKey, DataRecord)]) + } + + def outputFeaturesToRenamedOutputFeatures(prefix: String): Map[Feature[_], Feature[_]] = { + require(prefix.nonEmpty) + + allOutputFeatures.map { feature => + if (feature.isSetFeatureName) { + val renamedFeatureName = prefix + feature.getDenseFeatureName + val personalDataTypes = + if (feature.getPersonalDataTypes.isPresent) feature.getPersonalDataTypes.get() + else null + + val renamedFeature = feature.getFeatureType match { + case FeatureType.BINARY => + new Feature.Binary(renamedFeatureName, personalDataTypes) + case FeatureType.DISCRETE => + new Feature.Discrete(renamedFeatureName, personalDataTypes) + case FeatureType.STRING => + new Feature.Text(renamedFeatureName, personalDataTypes) + case FeatureType.CONTINUOUS => + new Feature.Continuous(renamedFeatureName, personalDataTypes) + case FeatureType.SPARSE_BINARY => + new Feature.SparseBinary(renamedFeatureName, personalDataTypes) + case FeatureType.SPARSE_CONTINUOUS => + new Feature.SparseContinuous(renamedFeatureName, personalDataTypes) + } + feature -> renamedFeature + } else { + feature -> feature + } + }.toMap + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/Utils.scala b/timelines/data_processing/ml_util/aggregation_framework/Utils.scala new file mode 100644 index 000000000..60196fc62 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/Utils.scala @@ -0,0 +1,122 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework + +import com.twitter.algebird.ScMapMonoid +import com.twitter.algebird.Semigroup +import com.twitter.ml.api._ +import com.twitter.ml.api.constant.SharedFeatures +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.Feature +import com.twitter.ml.api.FeatureType +import com.twitter.ml.api.util.SRichDataRecord +import java.lang.{Long => JLong} +import scala.collection.{Map => ScMap} + +object Utils { + val dataRecordMerger: DataRecordMerger = new DataRecordMerger + def EmptyDataRecord: DataRecord = new DataRecord() + + private val random = scala.util.Random + private val keyedDataRecordMapMonoid = { + val dataRecordMergerSg = new Semigroup[DataRecord] { + override def plus(x: DataRecord, y: DataRecord): DataRecord = { + dataRecordMerger.merge(x, y) + x + } + } + new ScMapMonoid[Long, DataRecord]()(dataRecordMergerSg) + } + + def keyFromLong(record: DataRecord, feature: Feature[JLong]): Long = + 
SRichDataRecord(record).getFeatureValue(feature).longValue + + def keyFromString(record: DataRecord, feature: Feature[String]): Long = + try { + SRichDataRecord(record).getFeatureValue(feature).toLong + } catch { + case _: NumberFormatException => 0L + } + + def keyFromHash(record: DataRecord, feature: Feature[String]): Long = + SRichDataRecord(record).getFeatureValue(feature).hashCode.toLong + + def extractSecondary[T]( + record: DataRecord, + secondaryKey: Feature[T], + shouldHash: Boolean = false + ): Long = secondaryKey.getFeatureType match { + case FeatureType.STRING => + if (shouldHash) keyFromHash(record, secondaryKey.asInstanceOf[Feature[String]]) + else keyFromString(record, secondaryKey.asInstanceOf[Feature[String]]) + case FeatureType.DISCRETE => keyFromLong(record, secondaryKey.asInstanceOf[Feature[JLong]]) + case f => throw new IllegalArgumentException(s"Feature type $f is not supported.") + } + + def mergeKeyedRecordOpts(args: Option[KeyedRecord]*): Option[KeyedRecord] = { + val keyedRecords = args.flatten + if (keyedRecords.isEmpty) { + None + } else { + val keys = keyedRecords.map(_.aggregateType) + require(keys.toSet.size == 1, "All merged records must have the same aggregate key.") + val mergedRecord = mergeRecords(keyedRecords.map(_.record): _*) + Some(KeyedRecord(keys.head, mergedRecord)) + } + } + + private def mergeRecords(args: DataRecord*): DataRecord = + if (args.isEmpty) EmptyDataRecord + else { + // can just do foldLeft(new DataRecord) for both cases, but try reusing the EmptyDataRecord singleton as much as possible + args.tail.foldLeft(args.head) { (merged, record) => + dataRecordMerger.merge(merged, record) + merged + } + } + + def mergeKeyedRecordMapOpts( + opt1: Option[KeyedRecordMap], + opt2: Option[KeyedRecordMap], + maxSize: Int = Int.MaxValue + ): Option[KeyedRecordMap] = { + if (opt1.isEmpty && opt2.isEmpty) { + None + } else { + val keys = Seq(opt1, opt2).flatten.map(_.aggregateType) + require(keys.toSet.size == 1, "All merged records must have the same aggregate key.") + val mergedRecordMap = mergeMapOpts(opt1.map(_.recordMap), opt2.map(_.recordMap), maxSize) + Some(KeyedRecordMap(keys.head, mergedRecordMap)) + } + } + + private def mergeMapOpts( + opt1: Option[ScMap[Long, DataRecord]], + opt2: Option[ScMap[Long, DataRecord]], + maxSize: Int = Int.MaxValue + ): ScMap[Long, DataRecord] = { + require(maxSize >= 0) + val keySet = opt1.map(_.keySet).getOrElse(Set.empty) ++ opt2.map(_.keySet).getOrElse(Set.empty) + val totalSize = keySet.size + val rate = if (totalSize <= maxSize) 1.0 else maxSize.toDouble / totalSize + val prunedOpt1 = opt1.map(downsample(_, rate)) + val prunedOpt2 = opt2.map(downsample(_, rate)) + Seq(prunedOpt1, prunedOpt2).flatten + .foldLeft(keyedDataRecordMapMonoid.zero)(keyedDataRecordMapMonoid.plus) + } + + def downsample[K, T](m: ScMap[K, T], samplingRate: Double): ScMap[K, T] = { + if (samplingRate >= 1.0) { + m + } else if (samplingRate <= 0) { + Map.empty + } else { + m.filter { + case (key, _) => + // It is important that the same user with the same sampling rate be deterministically + // selected or rejected. Otherwise, mergeMapOpts will choose different keys for the + // two input maps and their union will be larger than the limit we want. 
+          random.setSeed((key.hashCode, samplingRate.hashCode).hashCode)
+          random.nextDouble < samplingRate
+      }
+    }
+  }
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/AggregatesV2Adapter.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/AggregatesV2Adapter.scala
new file mode 100644
index 000000000..f5b7d1814
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/AggregatesV2Adapter.scala
@@ -0,0 +1,165 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion
+
+import com.twitter.algebird.DecayedValue
+import com.twitter.algebird.DecayedValueMonoid
+import com.twitter.algebird.Monoid
+import com.twitter.ml.api._
+import com.twitter.ml.api.constant.SharedFeatures
+import com.twitter.ml.api.util.FDsl._
+import com.twitter.ml.api.util.SRichDataRecord
+import com.twitter.summingbird.batch.BatchID
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregateFeature
+import com.twitter.util.Duration
+import java.lang.{Double => JDouble}
+import java.lang.{Long => JLong}
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import java.{util => ju}
+
+object AggregatesV2Adapter {
+  type AggregatesV2Tuple = (AggregationKey, (BatchID, DataRecord))
+
+  val Epsilon: Double = 1e-6
+  val decayedValueMonoid: Monoid[DecayedValue] = DecayedValueMonoid(Epsilon)
+
+  /*
+   * Decays the storedValue from timestamp -> sourceVersion
+   *
+   * @param storedValue value read from the aggregates v2 output store
+   * @param timestamp timestamp corresponding to store value
+   * @param sourceVersion timestamp of version to decay all values to uniformly
+   * @param halfLife Half life duration to use for applying decay
+   *
+   * By applying this function, the feature values for all users are decayed
+   * to sourceVersion. This is important to ensure that a user whose aggregates
+   * were updated long in the past does not have an artificially inflated count
+   * compared to one whose aggregates were updated (and hence decayed) more recently.
+   */
+  def decayValueToSourceVersion(
+    storedValue: Double,
+    timestamp: Long,
+    sourceVersion: Long,
+    halfLife: Duration
+  ): Double =
+    if (timestamp > sourceVersion) {
+      storedValue
+    } else {
+      decayedValueMonoid
+        .plus(
+          DecayedValue.build(storedValue, timestamp, halfLife.inMilliseconds),
+          DecayedValue.build(0, sourceVersion, halfLife.inMilliseconds)
+        )
+        .value
+    }
+
+  /*
+   * Decays all the aggregate features occurring in the ''inputRecord''
+   * to a given timestamp, and mutates the ''outputRecord'' accordingly.
+   * Note that inputRecord and outputRecord can be the same if you want
+   * to mutate the input in place; the function handles this correctly.
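+   *
+   * For a single continuous feature, the decay above is approximately equivalent to
+   * storedValue * math.pow(0.5, (decayTo - timestamp).toDouble / halfLife.inMilliseconds)
+   * when decayTo >= timestamp (modulo the Epsilon cutoff in decayedValueMonoid).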
+ * + * @param inputRecord Input record to get features from + * @param aggregates Aggregates to decay + * @param decayTo Timestamp to decay to + * @param trimThreshold Drop features below this trim threshold + * @param outputRecord Output record to mutate + * @return the mutated outputRecord + */ + def mutateDecay( + inputRecord: DataRecord, + aggregateFeaturesAndHalfLives: List[(Feature[_], Duration)], + decayTo: Long, + trimThreshold: Double, + outputRecord: DataRecord + ): DataRecord = { + val timestamp = inputRecord.getFeatureValue(SharedFeatures.TIMESTAMP).toLong + + aggregateFeaturesAndHalfLives.foreach { + case (aggregateFeature: Feature[_], halfLife: Duration) => + if (aggregateFeature.getFeatureType() == FeatureType.CONTINUOUS) { + val continuousFeature = aggregateFeature.asInstanceOf[Feature[JDouble]] + if (inputRecord.hasFeature(continuousFeature)) { + val storedValue = inputRecord.getFeatureValue(continuousFeature).toDouble + val decayedValue = decayValueToSourceVersion(storedValue, timestamp, decayTo, halfLife) + if (math.abs(decayedValue) > trimThreshold) { + outputRecord.setFeatureValue(continuousFeature, decayedValue) + } + } + } + } + + /* Update timestamp to version (now that we've decayed all aggregates) */ + outputRecord.setFeatureValue(SharedFeatures.TIMESTAMP, decayTo) + + outputRecord + } +} + +class AggregatesV2Adapter( + aggregates: Set[TypedAggregateGroup[_]], + sourceVersion: Long, + trimThreshold: Double) + extends IRecordOneToManyAdapter[AggregatesV2Adapter.AggregatesV2Tuple] { + + import AggregatesV2Adapter._ + + val keyFeatures: List[Feature[_]] = aggregates.flatMap(_.allOutputKeys).toList + val aggregateFeatures: List[Feature[_]] = aggregates.flatMap(_.allOutputFeatures).toList + val timestampFeatures: List[Feature[JLong]] = List(SharedFeatures.TIMESTAMP) + val allFeatures: List[Feature[_]] = keyFeatures ++ aggregateFeatures ++ timestampFeatures + + val featureContext: FeatureContext = new FeatureContext(allFeatures.asJava) + + override def getFeatureContext: FeatureContext = featureContext + + val aggregateFeaturesAndHalfLives: List[(Feature[_$3], Duration) forSome { type _$3 }] = + aggregateFeatures.map { aggregateFeature: Feature[_] => + val halfLife = AggregateFeature.parseHalfLife(aggregateFeature) + (aggregateFeature, halfLife) + } + + override def adaptToDataRecords(tuple: AggregatesV2Tuple): ju.List[DataRecord] = tuple match { + case (key: AggregationKey, (batchId: BatchID, record: DataRecord)) => { + val resultRecord = new SRichDataRecord(new DataRecord, featureContext) + + val itr = resultRecord.continuousFeaturesIterator() + val featuresToClear = mutable.Set[Feature[JDouble]]() + while (itr.moveNext()) { + val nextFeature = itr.getFeature + if (!aggregateFeatures.contains(nextFeature)) { + featuresToClear += nextFeature + } + } + + featuresToClear.foreach(resultRecord.clearFeature) + + keyFeatures.foreach { keyFeature: Feature[_] => + if (keyFeature.getFeatureType == FeatureType.DISCRETE) { + resultRecord.setFeatureValue( + keyFeature.asInstanceOf[Feature[JLong]], + key.discreteFeaturesById(keyFeature.getDenseFeatureId) + ) + } else if (keyFeature.getFeatureType == FeatureType.STRING) { + resultRecord.setFeatureValue( + keyFeature.asInstanceOf[Feature[String]], + key.textFeaturesById(keyFeature.getDenseFeatureId) + ) + } + } + + if (record.hasFeature(SharedFeatures.TIMESTAMP)) { + mutateDecay( + record, + aggregateFeaturesAndHalfLives, + sourceVersion, + trimThreshold, + resultRecord) + List(resultRecord.getRecord).asJava + } else { + 
List.empty[DataRecord].asJava + } + } + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/AggregatesV2FeatureSource.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/AggregatesV2FeatureSource.scala new file mode 100644 index 000000000..5e196a43e --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/AggregatesV2FeatureSource.scala @@ -0,0 +1,171 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion + +import com.twitter.bijection.Injection +import com.twitter.bijection.thrift.CompactThriftCodec +import com.twitter.ml.api.AdaptedFeatureSource +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.IRecordOneToManyAdapter +import com.twitter.ml.api.TypedFeatureSource +import com.twitter.scalding.DateRange +import com.twitter.scalding.RichDate +import com.twitter.scalding.TypedPipe +import com.twitter.scalding.commons.source.VersionedKeyValSource +import com.twitter.scalding.commons.tap.VersionedTap.TapMode +import com.twitter.summingbird.batch.BatchID +import com.twitter.summingbird_internal.bijection.BatchPairImplicits +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKeyInjection +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import org.apache.hadoop.mapred.JobConf +import scala.collection.JavaConverters._ +import AggregatesV2Adapter._ + +object AggregatesV2AdaptedSource { + val DefaultTrimThreshold = 0 +} + +trait AggregatesV2AdaptedSource extends AggregatesV2AdaptedSourceBase[DataRecord] { + override def storageFormatCodec: Injection[DataRecord, Array[Byte]] = + CompactThriftCodec[DataRecord] + override def toDataRecord(v: DataRecord): DataRecord = v +} + +trait AggregatesV2AdaptedSourceBase[StorageFormat] + extends TypedFeatureSource[AggregatesV2Tuple] + with AdaptedFeatureSource[AggregatesV2Tuple] + with BatchPairImplicits { + + /* Output root path of aggregates v2 job, excluding store name and version */ + def rootPath: String + + /* Name of store under root path to read */ + def storeName: String + + // max bijection failures + def maxFailures: Int = 0 + + /* Aggregate config used to generate above output */ + def aggregates: Set[TypedAggregateGroup[_]] + + /* trimThreshold Trim all aggregates below a certain threshold to save memory */ + def trimThreshold: Double + + def toDataRecord(v: StorageFormat): DataRecord + + def sourceVersionOpt: Option[Long] + + def enableMostRecentBeforeSourceVersion: Boolean = false + + implicit private val aggregationKeyInjection: Injection[AggregationKey, Array[Byte]] = + AggregationKeyInjection + implicit def storageFormatCodec: Injection[StorageFormat, Array[Byte]] + + private def filteredAggregates = aggregates.filter(_.outputStore.name == storeName) + def storePath: String = List(rootPath, storeName).mkString("/") + + def mostRecentVkvs: VersionedKeyValSource[_, _] = { + VersionedKeyValSource[AggregationKey, (BatchID, StorageFormat)]( + path = storePath, + sourceVersion = None, + maxFailures = maxFailures + ) + } + + private def availableVersions: Seq[Long] = + mostRecentVkvs + .getTap(TapMode.SOURCE) + .getStore(new JobConf(true)) + .getAllVersions() + .asScala + .map(_.toLong) + + private def mostRecentVersion: Long = { + require(!availableVersions.isEmpty, s"$storeName has no available versions") + availableVersions.max + } + + def 
versionToUse: Long = + if (enableMostRecentBeforeSourceVersion) { + sourceVersionOpt + .map(sourceVersion => + availableVersions.filter(_ <= sourceVersion) match { + case Seq() => + throw new IllegalArgumentException( + "No version older than version: %s, available versions: %s" + .format(sourceVersion, availableVersions) + ) + case versionList => versionList.max + }) + .getOrElse(mostRecentVersion) + } else { + sourceVersionOpt.getOrElse(mostRecentVersion) + } + + override lazy val adapter: IRecordOneToManyAdapter[AggregatesV2Tuple] = + new AggregatesV2Adapter(filteredAggregates, versionToUse, trimThreshold) + + override def getData: TypedPipe[AggregatesV2Tuple] = { + val vkvsToUse: VersionedKeyValSource[AggregationKey, (BatchID, StorageFormat)] = { + VersionedKeyValSource[AggregationKey, (BatchID, StorageFormat)]( + path = storePath, + sourceVersion = Some(versionToUse), + maxFailures = maxFailures + ) + } + TypedPipe.from(vkvsToUse).map { + case (key, (batch, value)) => (key, (batch, toDataRecord(value))) + } + } +} + +/* + * Adapted data record feature source from aggregates v2 manhattan output + * Params documented in parent trait. + */ +case class AggregatesV2FeatureSource( + override val rootPath: String, + override val storeName: String, + override val aggregates: Set[TypedAggregateGroup[_]], + override val trimThreshold: Double = 0, + override val maxFailures: Int = 0, +)( + implicit val dateRange: DateRange) + extends AggregatesV2AdaptedSource { + + // Increment end date by 1 millisec since summingbird output for date D is stored at (D+1)T00 + override val sourceVersionOpt: Some[Long] = Some(dateRange.end.timestamp + 1) +} + +/* + * Reads most recent available AggregatesV2FeatureSource. + * There is no constraint on recency. + * Params documented in parent trait. + */ +case class AggregatesV2MostRecentFeatureSource( + override val rootPath: String, + override val storeName: String, + override val aggregates: Set[TypedAggregateGroup[_]], + override val trimThreshold: Double = AggregatesV2AdaptedSource.DefaultTrimThreshold, + override val maxFailures: Int = 0) + extends AggregatesV2AdaptedSource { + + override val sourceVersionOpt: None.type = None +} + +/* + * Reads most recent available AggregatesV2FeatureSource + * on or before the specified beforeDate. + * Params documented in parent trait. 
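+ *
+ * A hypothetical invocation (paths, store name, config and date are
+ * illustrative only; RichDate parsing needs an implicit TimeZone and
+ * DateParser in scope):
+ *
+ *   AggregatesV2MostRecentFeatureSourceBeforeDate(
+ *     rootPath = "/user/timelines/processed/aggregates_v2",
+ *     storeName = "user_aggregates",
+ *     aggregates = MyAggregationConfig.aggregatesToCompute,
+ *     beforeDate = RichDate("2023-01-01")
+ *   )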
+ */ +case class AggregatesV2MostRecentFeatureSourceBeforeDate( + override val rootPath: String, + override val storeName: String, + override val aggregates: Set[TypedAggregateGroup[_]], + override val trimThreshold: Double = AggregatesV2AdaptedSource.DefaultTrimThreshold, + beforeDate: RichDate, + override val maxFailures: Int = 0) + extends AggregatesV2AdaptedSource { + + override val enableMostRecentBeforeSourceVersion = true + override val sourceVersionOpt: Some[Long] = Some(beforeDate.timestamp + 1) +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/BUILD b/timelines/data_processing/ml_util/aggregation_framework/conversion/BUILD new file mode 100644 index 000000000..d6c86cc12 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/BUILD @@ -0,0 +1,71 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/twitter/algebird:core", + "3rdparty/jvm/com/twitter/algebird:util", + "3rdparty/jvm/com/twitter/bijection:core", + "3rdparty/jvm/com/twitter/bijection:json", + "3rdparty/jvm/com/twitter/bijection:netty", + "3rdparty/jvm/com/twitter/bijection:scrooge", + "3rdparty/jvm/com/twitter/bijection:thrift", + "3rdparty/jvm/com/twitter/bijection:util", + "3rdparty/jvm/com/twitter/storehaus:algebra", + "3rdparty/jvm/com/twitter/storehaus:core", + "3rdparty/src/jvm/com/twitter/scalding:commons", + "3rdparty/src/jvm/com/twitter/scalding:core", + "3rdparty/src/jvm/com/twitter/scalding:date", + "3rdparty/src/jvm/com/twitter/summingbird:batch", + "3rdparty/src/jvm/com/twitter/summingbird:core", + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/scala/com/twitter/ml/api:api-base", + "src/scala/com/twitter/ml/api/util", + "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/ml/api:interpretable-model-java", + "src/thrift/com/twitter/summingbird", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/aggregation_framework/metrics", + "util/util-core:scala", + ], +) + +scala_library( + name = "for-timelines", + sources = [ + "CombineCountsPolicy.scala", + "SparseBinaryMergePolicy.scala", + ], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/twitter/algebird:core", + "3rdparty/jvm/com/twitter/algebird:util", + "3rdparty/jvm/com/twitter/bijection:core", + "3rdparty/jvm/com/twitter/bijection:json", + "3rdparty/jvm/com/twitter/bijection:netty", + "3rdparty/jvm/com/twitter/bijection:scrooge", + "3rdparty/jvm/com/twitter/bijection:thrift", + "3rdparty/jvm/com/twitter/bijection:util", + "3rdparty/jvm/com/twitter/storehaus:algebra", + "3rdparty/jvm/com/twitter/storehaus:core", + "3rdparty/src/jvm/com/twitter/scalding:commons", + "3rdparty/src/jvm/com/twitter/scalding:core", + "3rdparty/src/jvm/com/twitter/scalding:date", + "3rdparty/src/jvm/com/twitter/summingbird:batch", + "3rdparty/src/jvm/com/twitter/summingbird:core", + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/ml/api:interpretable-model-java", + 
"src/thrift/com/twitter/summingbird", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/aggregation_framework/metrics", + "util/util-core:scala", + ], +) diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/CombineCountsPolicy.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/CombineCountsPolicy.scala new file mode 100644 index 000000000..eb1690231 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/CombineCountsPolicy.scala @@ -0,0 +1,223 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion + +import com.google.common.annotations.VisibleForTesting +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.ml.api.FeatureContext +import com.twitter.ml.api._ +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.TypedCountMetric +import java.lang.{Double => JDouble} +import scala.collection.JavaConverters._ + +case class CombinedFeatures( + sum: Feature[JDouble], + nonzero: Feature[JDouble], + mean: Feature[JDouble], + topK: Seq[Feature[JDouble]]) + +trait CombineCountsBase { + val SparseSum = "sparse_sum" + val SparseNonzero = "sparse_nonzero" + val SparseMean = "sparse_mean" + val SparseTop = "sparse_top" + + def topK: Int + def hardLimit: Option[Int] + def precomputedCountFeatures: Seq[Feature[_]] + + lazy val precomputedFeaturesMap: Map[Feature[_], CombinedFeatures] = + precomputedCountFeatures.map { countFeature => + val derivedPersonalDataTypes = + AggregationMetricCommon.derivePersonalDataTypes(Some(countFeature)) + val sum = new Feature.Continuous( + countFeature.getDenseFeatureName + "." + SparseSum, + derivedPersonalDataTypes) + val nonzero = new Feature.Continuous( + countFeature.getDenseFeatureName + "." + SparseNonzero, + derivedPersonalDataTypes) + val mean = new Feature.Continuous( + countFeature.getDenseFeatureName + "." + SparseMean, + derivedPersonalDataTypes) + val topKFeatures = (1 to topK).map { k => + new Feature.Continuous( + countFeature.getDenseFeatureName + "." 
+ SparseTop + k,
+          derivedPersonalDataTypes)
+      }
+      (countFeature, CombinedFeatures(sum, nonzero, mean, topKFeatures))
+    }.toMap
+
+  lazy val outputFeaturesPostMerge: Set[Feature[JDouble]] =
+    precomputedFeaturesMap.values.flatMap { combinedFeatures: CombinedFeatures =>
+      Seq(
+        combinedFeatures.sum,
+        combinedFeatures.nonzero,
+        combinedFeatures.mean
+      ) ++ combinedFeatures.topK
+    }.toSet
+
+  private case class ComputedStats(sum: Double, nonzero: Double, mean: Double)
+
+  private def preComputeStats(featureValues: Seq[Double]): ComputedStats = {
+    val (sum, nonzero) = featureValues.foldLeft((0.0, 0.0)) {
+      case ((accSum, accNonzero), value) =>
+        (accSum + value, if (value > 0.0) accNonzero + 1.0 else accNonzero)
+    }
+    ComputedStats(sum, nonzero, if (nonzero > 0.0) sum / nonzero else 0.0)
+  }
+
+  private def computeSortedFeatureValues(featureValues: List[Double]): List[Double] =
+    featureValues.sortBy(-_)
+
+  private def extractKth(sortedFeatureValues: Seq[Double], k: Int): Double =
+    sortedFeatureValues
+      .lift(k - 1)
+      .getOrElse(0.0)
+
+  private def setContinuousFeatureIfNonZero(
+    record: SRichDataRecord,
+    feature: Feature[JDouble],
+    value: Double
+  ): Unit =
+    if (value != 0.0) {
+      record.setFeatureValue(feature, value)
+    }
+
+  def hydrateCountFeatures(
+    richRecord: SRichDataRecord,
+    features: Seq[Feature[_]],
+    featureValuesMap: Map[Feature[_], List[Double]]
+  ): Unit =
+    for {
+      feature <- features
+      featureValues <- featureValuesMap.get(feature)
+    } {
+      mergeRecordFromCountFeature(
+        countFeature = feature,
+        featureValues = featureValues,
+        richInputRecord = richRecord
+      )
+    }
+
+  def mergeRecordFromCountFeature(
+    richInputRecord: SRichDataRecord,
+    countFeature: Feature[_],
+    featureValues: List[Double]
+  ): Unit = {
+    // In the majority of calls to this method from the timeline scorer,
+    // the featureValues list is empty. While each operation on an empty
+    // list is individually cheap, these small costs add up. Stopping early
+    // here avoids sorting an empty list, allocating several Options and
+    // making multiple function calls; it also avoids iterating over [1, topK].
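+    //
+    // Worked example (hypothetical values): with featureValues = List(5.0, 3.0, 1.0)
+    // and topK = 2, the derived features below become sum = 9.0, nonzero = 3.0,
+    // mean = 3.0, top1 = 5.0 and top2 = 3.0.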
+    if (featureValues.nonEmpty) {
+      val sortedFeatureValues = hardLimit
+        .map { limit =>
+          computeSortedFeatureValues(featureValues).take(limit)
+        }.getOrElse(computeSortedFeatureValues(featureValues)).toIndexedSeq
+      val computed = preComputeStats(sortedFeatureValues)
+
+      val combinedFeatures = precomputedFeaturesMap(countFeature)
+      setContinuousFeatureIfNonZero(
+        richInputRecord,
+        combinedFeatures.sum,
+        computed.sum
+      )
+      setContinuousFeatureIfNonZero(
+        richInputRecord,
+        combinedFeatures.nonzero,
+        computed.nonzero
+      )
+      setContinuousFeatureIfNonZero(
+        richInputRecord,
+        combinedFeatures.mean,
+        computed.mean
+      )
+      (1 to topK).foreach { k =>
+        setContinuousFeatureIfNonZero(
+          richInputRecord,
+          combinedFeatures.topK(k - 1),
+          extractKth(sortedFeatureValues, k)
+        )
+      }
+    }
+  }
+}
+
+object CombineCountsPolicy {
+  def getCountFeatures(aggregateContext: FeatureContext): Seq[Feature[_]] =
+    aggregateContext.getAllFeatures.asScala.toSeq
+      .filter { feature =>
+        feature.getFeatureType == FeatureType.CONTINUOUS &&
+        feature.getDenseFeatureName.endsWith(TypedCountMetric[JDouble]().operatorName)
+      }
+
+  @VisibleForTesting
+  private[conversion] def getFeatureValues(
+    dataRecordsWithCounts: List[DataRecord],
+    countFeature: Feature[_]
+  ): List[Double] =
+    dataRecordsWithCounts.map(new SRichDataRecord(_)).flatMap { record =>
+      Option(record.getFeatureValue(countFeature)).map(_.asInstanceOf[JDouble].toDouble)
+    }
+}
+
+/**
+ * A merge policy that works whenever all aggregate features are
+ * counts (computed using CountMetric), and typically represent
+ * either impressions or engagements. For each such input count
+ * feature, the policy outputs the following (3+topK) derived features
+ * into the output data record:
+ *
+ * Sum of the feature's value across all aggregate records
+ * Number of aggregate records that have the feature set to non-zero
+ * Mean of the feature's value across all aggregate records
+ * topK values of the feature across all aggregate records
+ *
+ * @param topK topK values to compute
+ * @param hardLimit when set and the number of records exceeds this limit, records
+ *        are sorted and only the top `hardLimit` values are used for aggregation.
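+ *
+ * A hypothetical construction (`ctx` is assumed to be a FeatureContext holding
+ * the precomputed count features):
+ * {{{
+ * val policy = CombineCountsPolicy(
+ *   topK = 2,
+ *   aggregateContextToPrecompute = ctx,
+ *   hardLimit = Some(100)
+ * )
+ * }}}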
+ */ +case class CombineCountsPolicy( + override val topK: Int, + aggregateContextToPrecompute: FeatureContext, + override val hardLimit: Option[Int] = None) + extends SparseBinaryMergePolicy + with CombineCountsBase { + import CombineCountsPolicy._ + override val precomputedCountFeatures: Seq[Feature[_]] = getCountFeatures( + aggregateContextToPrecompute) + + override def mergeRecord( + mutableInputRecord: DataRecord, + aggregateRecords: List[DataRecord], + aggregateContext: FeatureContext + ): Unit = { + // Assumes aggregateContext === aggregateContextToPrecompute + mergeRecordFromCountFeatures(mutableInputRecord, aggregateRecords, precomputedCountFeatures) + } + + def defaultMergeRecord( + mutableInputRecord: DataRecord, + aggregateRecords: List[DataRecord] + ): Unit = { + mergeRecordFromCountFeatures(mutableInputRecord, aggregateRecords, precomputedCountFeatures) + } + + def mergeRecordFromCountFeatures( + mutableInputRecord: DataRecord, + aggregateRecords: List[DataRecord], + countFeatures: Seq[Feature[_]] + ): Unit = { + val richInputRecord = new SRichDataRecord(mutableInputRecord) + countFeatures.foreach { countFeature => + mergeRecordFromCountFeature( + richInputRecord = richInputRecord, + countFeature = countFeature, + featureValues = getFeatureValues(aggregateRecords, countFeature) + ) + } + } + + override def aggregateFeaturesPostMerge(aggregateContext: FeatureContext): Set[Feature[_]] = + outputFeaturesPostMerge.map(_.asInstanceOf[Feature[_]]) +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/DataSetPipeSketchJoin.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/DataSetPipeSketchJoin.scala new file mode 100644 index 000000000..8d3dd58bb --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/DataSetPipeSketchJoin.scala @@ -0,0 +1,46 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion + +import com.twitter.bijection.Injection +import com.twitter.ml.api._ +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.scalding.TypedPipe + +object DataSetPipeSketchJoin { + val DefaultSketchNumReducers = 500 + val dataRecordMerger: DataRecordMerger = new DataRecordMerger + implicit val str2Byte: String => Array[Byte] = + implicitly[Injection[String, Array[Byte]]].toFunction + + /* Computes a left sketch join on a set of skewed keys. 
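+   * Sketch joins let scalding estimate key frequencies first, so records with
+   * hot keys can be spread across several reducers instead of piling onto one.
+   * A hypothetical invocation (feature names are illustrative only):
+   *
+   *   DataSetPipeSketchJoin(
+   *     inputDataSet = trainingExamples,
+   *     skewedJoinKeys = Tuple1(USER_ID),
+   *     joinFeaturesDataSet = userAggregates
+   *   )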
*/ + def apply( + inputDataSet: DataSetPipe, + skewedJoinKeys: Product, + joinFeaturesDataSet: DataSetPipe, + sketchNumReducers: Int = DefaultSketchNumReducers + ): DataSetPipe = { + val joinKeyList = skewedJoinKeys.productIterator.toList.asInstanceOf[List[Feature[_]]] + + def makeKey(record: DataRecord): String = + joinKeyList + .map(SRichDataRecord(record).getFeatureValue(_)) + .toString + + def byKey(pipe: DataSetPipe): TypedPipe[(String, DataRecord)] = + pipe.records.map(record => (makeKey(record), record)) + + val joinedRecords = byKey(inputDataSet) + .sketch(sketchNumReducers) + .leftJoin(byKey(joinFeaturesDataSet)) + .values + .map { + case (inputRecord, joinFeaturesOpt) => + joinFeaturesOpt.foreach { joinRecord => dataRecordMerger.merge(inputRecord, joinRecord) } + inputRecord + } + + DataSetPipe( + joinedRecords, + FeatureContext.merge(inputDataSet.featureContext, joinFeaturesDataSet.featureContext) + ) + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/PickFirstRecordPolicy.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/PickFirstRecordPolicy.scala new file mode 100644 index 000000000..b022d35b0 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/PickFirstRecordPolicy.scala @@ -0,0 +1,26 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion + +import com.twitter.ml.api._ +import com.twitter.ml.api.FeatureContext +import scala.collection.JavaConverters._ + +/* + * A really bad default merge policy that picks all the aggregate + * features corresponding to the first sparse key value in the list. + * Does not rename any of the aggregate features for simplicity. + * Avoid using this merge policy if at all possible. 
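+ * For example, if a training record's sparse key takes the values (1, 2, 3),
+ * only the aggregate record for whichever value happens to come first in the
+ * list is merged in; the others are silently dropped.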
+ */ +object PickFirstRecordPolicy extends SparseBinaryMergePolicy { + val dataRecordMerger: DataRecordMerger = new DataRecordMerger + + override def mergeRecord( + mutableInputRecord: DataRecord, + aggregateRecords: List[DataRecord], + aggregateContext: FeatureContext + ): Unit = + aggregateRecords.headOption + .foreach(aggregateRecord => dataRecordMerger.merge(mutableInputRecord, aggregateRecord)) + + override def aggregateFeaturesPostMerge(aggregateContext: FeatureContext): Set[Feature[_]] = + aggregateContext.getAllFeatures.asScala.toSet +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/PickTopCtrPolicy.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/PickTopCtrPolicy.scala new file mode 100644 index 000000000..94d3ac126 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/PickTopCtrPolicy.scala @@ -0,0 +1,226 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion + +import com.twitter.ml.api._ +import com.twitter.ml.api.FeatureContext +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon +import java.lang.{Boolean => JBoolean} +import java.lang.{Double => JDouble} + +case class CtrDescriptor( + engagementFeature: Feature[JDouble], + impressionFeature: Feature[JDouble], + outputFeature: Feature[JDouble]) + +object PickTopCtrBuilderHelper { + + def createCtrDescriptors( + aggregatePrefix: String, + engagementLabels: Set[Feature[JBoolean]], + aggregatesToCompute: Set[TypedAggregateGroup[_]], + outputSuffix: String + ): Set[CtrDescriptor] = { + val aggregateFeatures = aggregatesToCompute + .filter(_.aggregatePrefix == aggregatePrefix) + + val impressionFeature = aggregateFeatures + .flatMap { group => + group.individualAggregateDescriptors + .filter(_.query.feature == None) + .filter(_.query.label == None) + .flatMap(_.outputFeatures) + } + .head + .asInstanceOf[Feature[JDouble]] + + val aggregateEngagementFeatures = + aggregateFeatures + .flatMap { group => + group.individualAggregateDescriptors + .filter(_.query.feature == None) + .filter { descriptor => + //TODO: we should remove the need to pass around engagementLabels and just use all the labels available. + descriptor.query.label.exists(engagementLabels.contains(_)) + } + .flatMap(_.outputFeatures) + } + .map(_.asInstanceOf[Feature[JDouble]]) + + aggregateEngagementFeatures + .map { aggregateEngagementFeature => + CtrDescriptor( + engagementFeature = aggregateEngagementFeature, + impressionFeature = impressionFeature, + outputFeature = new Feature.Continuous( + aggregateEngagementFeature.getDenseFeatureName + "." 
+ outputSuffix, + AggregationMetricCommon.derivePersonalDataTypes( + Some(aggregateEngagementFeature), + Some(impressionFeature) + ) + ) + ) + } + } +} + +object PickTopCtrPolicy { + def build( + aggregatePrefix: String, + engagementLabels: Set[Feature[JBoolean]], + aggregatesToCompute: Set[TypedAggregateGroup[_]], + smoothing: Double = 1.0, + outputSuffix: String = "ratio" + ): PickTopCtrPolicy = { + val ctrDescriptors = PickTopCtrBuilderHelper.createCtrDescriptors( + aggregatePrefix = aggregatePrefix, + engagementLabels = engagementLabels, + aggregatesToCompute = aggregatesToCompute, + outputSuffix = outputSuffix + ) + PickTopCtrPolicy( + ctrDescriptors = ctrDescriptors, + smoothing = smoothing + ) + } +} + +object CombinedTopNCtrsByWilsonConfidenceIntervalPolicy { + def build( + aggregatePrefix: String, + engagementLabels: Set[Feature[JBoolean]], + aggregatesToCompute: Set[TypedAggregateGroup[_]], + outputSuffix: String = "ratioWithWCI", + z: Double = 1.96, + topN: Int = 1 + ): CombinedTopNCtrsByWilsonConfidenceIntervalPolicy = { + val ctrDescriptors = PickTopCtrBuilderHelper.createCtrDescriptors( + aggregatePrefix = aggregatePrefix, + engagementLabels = engagementLabels, + aggregatesToCompute = aggregatesToCompute, + outputSuffix = outputSuffix + ) + CombinedTopNCtrsByWilsonConfidenceIntervalPolicy( + ctrDescriptors = ctrDescriptors, + z = z, + topN = topN + ) + } +} + +/* + * A merge policy that picks the aggregate features corresponding to + * the sparse key value with the highest engagement rate (defined + * as the ratio of two specified features, representing engagements + * and impressions). Also outputs the engagement rate to the specified + * outputFeature. + * + * This is an abstract class. We can make variants of this policy by overriding + * the calculateCtr method. + */ + +abstract class PickTopCtrPolicyBase(ctrDescriptors: Set[CtrDescriptor]) + extends SparseBinaryMergePolicy { + + private def getContinuousFeature( + aggregateRecord: DataRecord, + feature: Feature[JDouble] + ): Double = { + Option(SRichDataRecord(aggregateRecord).getFeatureValue(feature)) + .map(_.asInstanceOf[JDouble].toDouble) + .getOrElse(0.0) + } + + /** + * For every provided descriptor, compute the corresponding CTR feature + * and only hydrate this result to the provided input record. 
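+   * Concretely: each aggregate record is scored with calculateCtr, the scores
+   * are sorted in descending order, and combineTopNCtrsToSingleScore condenses
+   * them into the single value written to outputFeature.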
+   */
+  override def mergeRecord(
+    mutableInputRecord: DataRecord,
+    aggregateRecords: List[DataRecord],
+    aggregateContext: FeatureContext
+  ): Unit = {
+    ctrDescriptors
+      .foreach {
+        case CtrDescriptor(engagementFeature, impressionFeature, outputFeature) =>
+          val sortedCtrs =
+            aggregateRecords
+              .map { aggregateRecord =>
+                val impressions = getContinuousFeature(aggregateRecord, impressionFeature)
+                val engagements = getContinuousFeature(aggregateRecord, engagementFeature)
+                calculateCtr(impressions, engagements)
+              }
+              .sortBy { ctr => -ctr }
+          combineTopNCtrsToSingleScore(sortedCtrs)
+            .foreach { score =>
+              SRichDataRecord(mutableInputRecord).setFeatureValue(outputFeature, score)
+            }
+      }
+  }
+
+  protected def calculateCtr(impressions: Double, engagements: Double): Double
+
+  protected def combineTopNCtrsToSingleScore(sortedCtrs: Seq[Double]): Option[Double]
+
+  override def aggregateFeaturesPostMerge(aggregateContext: FeatureContext): Set[Feature[_]] =
+    ctrDescriptors
+      .map(_.outputFeature)
+      .toSet
+}
+
+case class PickTopCtrPolicy(ctrDescriptors: Set[CtrDescriptor], smoothing: Double = 1.0)
+    extends PickTopCtrPolicyBase(ctrDescriptors) {
+  require(smoothing > 0.0)
+
+  override def calculateCtr(impressions: Double, engagements: Double): Double =
+    (1.0 * engagements) / (smoothing + impressions)
+
+  override def combineTopNCtrsToSingleScore(sortedCtrs: Seq[Double]): Option[Double] =
+    sortedCtrs.headOption
+}
+
+case class CombinedTopNCtrsByWilsonConfidenceIntervalPolicy(
+  ctrDescriptors: Set[CtrDescriptor],
+  z: Double = 1.96,
+  topN: Int = 1)
+    extends PickTopCtrPolicyBase(ctrDescriptors) {
+
+  private val zSquared = z * z
+  private val zSquaredDiv2 = zSquared / 2.0
+  private val zSquaredDiv4 = zSquared / 4.0
+
+  /**
+   * Calculates the lower bound of the Wilson score interval, which roughly says
+   * "the actual engagement rate is at least this value" with confidence
+   * designated by the z-score:
+   * https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval
+   */
+  override def calculateCtr(rawImpressions: Double, engagements: Double): Double = {
+    // just in case engagements happens to be more than impressions...
+    val impressions = Math.max(rawImpressions, engagements)
+
+    if (impressions > 0.0) {
+      val p = engagements / impressions
+      (p
+        + zSquaredDiv2 / impressions
+        - z * Math.sqrt(
+          (p * (1.0 - p) + zSquaredDiv4 / impressions) / impressions)) / (1.0 + zSquared / impressions)
+
+    } else 0.0
+  }
+
+  /**
+   * Takes the topN engagement rates and returns the joint probability as {1.0 - Π(1.0 - p)}.
+   *
+   * e.g. let's say you have a 0.6 chance of clicking on a tweet shared by user A,
+   * and a 0.3 chance of clicking on a tweet shared by user B.
+   * Seeing a tweet shared by both A and B does not give you a 0.9 chance of
+   * clicking on it, but you can say that you have a 0.4 * 0.7 chance of NOT
+   * clicking on that tweet.
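+   * The combined top-2 score in that example is therefore
+   * 1.0 - (0.4 * 0.7) = 1.0 - 0.28 = 0.72.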
+   */
+  override def combineTopNCtrsToSingleScore(sortedCtrs: Seq[Double]): Option[Double] =
+    if (sortedCtrs.nonEmpty) {
+      val inverseLogP = sortedCtrs
+        .take(topN).map { p => Math.log(1.0 - p) }.sum
+      Some(1.0 - Math.exp(inverseLogP))
+    } else None
+
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryAggregateJoin.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryAggregateJoin.scala
new file mode 100644
index 000000000..10c6a9096
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryAggregateJoin.scala
@@ -0,0 +1,199 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion
+
+import com.twitter.ml.api._
+import com.twitter.ml.api.Feature
+import com.twitter.ml.api.util.SRichDataRecord
+import com.twitter.scalding.typed.TypedPipe
+import com.twitter.scalding.typed.UnsortedGrouped
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup
+import java.util.{Set => JSet}
+import scala.collection.JavaConverters._
+
+object SparseBinaryAggregateJoin {
+  import TypedAggregateGroup._
+
+  def makeKey(record: DataRecord, joinKeyList: List[Feature[_]]): String = {
+    joinKeyList.map {
+      case sparseKey: Feature.SparseBinary =>
+        SRichDataRecord(record).getFeatureValue(sparseFeature(sparseKey))
+      case nonSparseKey: Feature[_] =>
+        SRichDataRecord(record).getFeatureValue(nonSparseKey)
+    }.toString
+  }
+
+  /**
+   * @param record Data record to get all possible sparse aggregate keys from
+   * @param joinKeyList List of join key features (some can be sparse and some non-sparse)
+   * @return A list of string keys to use for joining
+   */
+  def makeKeyPermutations(record: DataRecord, joinKeyList: List[Feature[_]]): List[String] = {
+    val allIdValues = joinKeyList.flatMap {
+      case sparseKey: Feature.SparseBinary => {
+        val id = sparseKey.getDenseFeatureId
+        val valuesOpt = Option(SRichDataRecord(record).getFeatureValue(sparseKey))
+          .map(_.asInstanceOf[JSet[String]].asScala.toSet)
+        valuesOpt.map { (id, _) }
+      }
+      case nonSparseKey: Feature[_] => {
+        val id = nonSparseKey.getDenseFeatureId
+        Option(SRichDataRecord(record).getFeatureValue(nonSparseKey)).map { value =>
+          (id, Set(value.toString))
+        }
+      }
+    }
+    sparseBinaryPermutations(allIdValues).toList.map { idValues =>
+      joinKeyList.map { key => idValues.getOrElse(key.getDenseFeatureId, "") }.toString
+    }
+  }
+
+  private[this] def mkKeyIndexedAggregates(
+    joinFeaturesDataSet: DataSetPipe,
+    joinKeyList: List[Feature[_]]
+  ): TypedPipe[(String, DataRecord)] =
+    joinFeaturesDataSet.records
+      .map { record => (makeKey(record, joinKeyList), record) }
+
+  private[this] def mkKeyIndexedInput(
+    inputDataSet: DataSetPipe,
+    joinKeyList: List[Feature[_]]
+  ): TypedPipe[(String, DataRecord)] =
+    inputDataSet.records
+      .flatMap { record =>
+        for {
+          key <- makeKeyPermutations(record, joinKeyList)
+        } yield { (key, record) }
+      }
+
+  private[this] def mkKeyIndexedInputWithUniqueId(
+    inputDataSet: DataSetPipe,
+    joinKeyList: List[Feature[_]],
+    uniqueIdFeatureList: List[Feature[_]]
+  ): TypedPipe[(String, String)] =
+    inputDataSet.records
+      .flatMap { record =>
+        for {
+          key <- makeKeyPermutations(record, joinKeyList)
+        } yield { (key, makeKey(record, uniqueIdFeatureList)) }
+      }
+
+  private[this] def mkRecordIndexedAggregates(
+    keyIndexedInput: TypedPipe[(String, DataRecord)],
+    keyIndexedAggregates: TypedPipe[(String, DataRecord)]
+  ): UnsortedGrouped[DataRecord, List[DataRecord]] =
keyIndexedInput + .join(keyIndexedAggregates) + .map { case (_, (inputRecord, aggregateRecord)) => (inputRecord, aggregateRecord) } + .group + .toList + + private[this] def mkRecordIndexedAggregatesWithUniqueId( + keyIndexedInput: TypedPipe[(String, String)], + keyIndexedAggregates: TypedPipe[(String, DataRecord)] + ): UnsortedGrouped[String, List[DataRecord]] = + keyIndexedInput + .join(keyIndexedAggregates) + .map { case (_, (inputId, aggregateRecord)) => (inputId, aggregateRecord) } + .group + .toList + + def mkJoinedDataSet( + inputDataSet: DataSetPipe, + joinFeaturesDataSet: DataSetPipe, + recordIndexedAggregates: UnsortedGrouped[DataRecord, List[DataRecord]], + mergePolicy: SparseBinaryMergePolicy + ): TypedPipe[DataRecord] = + inputDataSet.records + .map(record => (record, ())) + .leftJoin(recordIndexedAggregates) + .map { + case (inputRecord, (_, aggregateRecordsOpt)) => + aggregateRecordsOpt + .map { aggregateRecords => + mergePolicy.mergeRecord( + inputRecord, + aggregateRecords, + joinFeaturesDataSet.featureContext + ) + inputRecord + } + .getOrElse(inputRecord) + } + + def mkJoinedDataSetWithUniqueId( + inputDataSet: DataSetPipe, + joinFeaturesDataSet: DataSetPipe, + recordIndexedAggregates: UnsortedGrouped[String, List[DataRecord]], + mergePolicy: SparseBinaryMergePolicy, + uniqueIdFeatureList: List[Feature[_]] + ): TypedPipe[DataRecord] = + inputDataSet.records + .map(record => (makeKey(record, uniqueIdFeatureList), record)) + .leftJoin(recordIndexedAggregates) + .map { + case (_, (inputRecord, aggregateRecordsOpt)) => + aggregateRecordsOpt + .map { aggregateRecords => + mergePolicy.mergeRecord( + inputRecord, + aggregateRecords, + joinFeaturesDataSet.featureContext + ) + inputRecord + } + .getOrElse(inputRecord) + } + + /** + * If uniqueIdFeatures is non-empty and the join keys include a sparse binary + * key, the join will use this set of keys as a unique id to reduce + * memory consumption. You should need this option only for + * memory-intensive joins to avoid OOM errors. 
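+   *
+   * A hypothetical invocation (feature names are illustrative only):
+   * {{{
+   * SparseBinaryAggregateJoin(
+   *   inputDataSet = trainingExamples,
+   *   joinKeys = (USER_ID, AUTHOR_INTEREST_IDS),
+   *   joinFeaturesDataSet = aggregates,
+   *   uniqueIdFeaturesOpt = Some(Tuple1(RECORD_ID))
+   * )
+   * }}}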
+ */ + def apply( + inputDataSet: DataSetPipe, + joinKeys: Product, + joinFeaturesDataSet: DataSetPipe, + mergePolicy: SparseBinaryMergePolicy = PickFirstRecordPolicy, + uniqueIdFeaturesOpt: Option[Product] = None + ): DataSetPipe = { + val joinKeyList = joinKeys.productIterator.toList.asInstanceOf[List[Feature[_]]] + val sparseBinaryJoinKeySet = + joinKeyList.toSet.filter(_.getFeatureType() == FeatureType.SPARSE_BINARY) + val containsSparseBinaryKey = !sparseBinaryJoinKeySet.isEmpty + if (containsSparseBinaryKey) { + val uniqueIdFeatureList = uniqueIdFeaturesOpt + .map(uniqueIdFeatures => + uniqueIdFeatures.productIterator.toList.asInstanceOf[List[Feature[_]]]) + .getOrElse(List.empty[Feature[_]]) + val keyIndexedAggregates = mkKeyIndexedAggregates(joinFeaturesDataSet, joinKeyList) + val joinedDataSet = if (uniqueIdFeatureList.isEmpty) { + val keyIndexedInput = mkKeyIndexedInput(inputDataSet, joinKeyList) + val recordIndexedAggregates = + mkRecordIndexedAggregates(keyIndexedInput, keyIndexedAggregates) + mkJoinedDataSet(inputDataSet, joinFeaturesDataSet, recordIndexedAggregates, mergePolicy) + } else { + val keyIndexedInput = + mkKeyIndexedInputWithUniqueId(inputDataSet, joinKeyList, uniqueIdFeatureList) + val recordIndexedAggregates = + mkRecordIndexedAggregatesWithUniqueId(keyIndexedInput, keyIndexedAggregates) + mkJoinedDataSetWithUniqueId( + inputDataSet, + joinFeaturesDataSet, + recordIndexedAggregates, + mergePolicy, + uniqueIdFeatureList + ) + } + + DataSetPipe( + joinedDataSet, + mergePolicy.mergeContext( + inputDataSet.featureContext, + joinFeaturesDataSet.featureContext + ) + ) + } else { + inputDataSet.joinWithSmaller(joinKeys, joinFeaturesDataSet) { _.pass } + } + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryMergePolicy.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryMergePolicy.scala new file mode 100644 index 000000000..7201e39a2 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryMergePolicy.scala @@ -0,0 +1,81 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion + +import com.twitter.ml.api._ +import com.twitter.ml.api.FeatureContext +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import scala.collection.JavaConverters._ + +/** + * When using the aggregates framework to group by sparse binary keys, + * we generate different aggregate feature values for each possible + * value of the sparse key. Hence, when joining back the aggregate + * features with a training data set, each individual training record + * has multiple aggregate features to choose from, for each value taken + * by the sparse key(s) in the training record. The merge policy trait + * below specifies how to condense/combine this variable number of + * aggregate features into a constant number of features for training. + * Some simple policies might be: pick the first feature set (randomly), + * pick the top sorted by some attribute, or take some average. + * + * Example: suppose we group by (ADVERTISER_ID, INTEREST_ID) where INTEREST_ID + * is the sparse key, and compute a "CTR" aggregate feature for each such + * pair measuring the click through rate on ads with (ADVERTISER_ID, INTEREST_ID). 
+ * Say we have the following aggregate records: + * + * (ADVERTISER_ID = 1, INTEREST_ID = 1, CTR = 5%) + * (ADVERTISER_ID = 1, INTEREST_ID = 2, CTR = 15%) + * (ADVERTISER_ID = 2, INTEREST_ID = 1, CTR = 1%) + * (ADVERTISER_ID = 2, INTEREST_ID = 2, CTR = 10%) + * ... + * At training time, each training record has one value for ADVERTISER_ID, but it + * has multiple values for INTEREST_ID e.g. + * + * (ADVERTISER_ID = 1, INTEREST_IDS = (1,2)) + * + * There are multiple potential CTRs we can get when joining in the aggregate features: + * in this case 2 values (5% and 15%) but in general it could be many depending on how + * many interests the user has. When joining back the CTR features, the merge policy says how to + * combine all these CTRs to engineer features. + * + * "Pick first" would say - pick some random CTR (whatever is first in the list, maybe 5%) + * for training (probably not a good policy). "Sort by CTR" could be a policy + * that just picks the top CTR and uses it as a feature (here 15%). Similarly, you could + * imagine "Top K sorted by CTR" (use both 5 and 15%) or "Avg CTR" (10%) or other policies, + * all of which are defined as objects/case classes that override this trait. + */ +trait SparseBinaryMergePolicy { + + /** + * @param mutableInputRecord Input record to add aggregates to + * @param aggregateRecords Aggregate feature records + * @param aggregateContext Context for aggregate records + */ + def mergeRecord( + mutableInputRecord: DataRecord, + aggregateRecords: List[DataRecord], + aggregateContext: FeatureContext + ): Unit + + def aggregateFeaturesPostMerge(aggregateContext: FeatureContext): Set[Feature[_]] + + /** + * @param inputContext Context for input record + * @param aggregateContext Context for aggregate records + * @return Context for record returned by mergeRecord() + */ + def mergeContext( + inputContext: FeatureContext, + aggregateContext: FeatureContext + ): FeatureContext = new FeatureContext( + (inputContext.getAllFeatures.asScala.toSet ++ aggregateFeaturesPostMerge( + aggregateContext)).toSeq.asJava + ) + + def allOutputFeaturesPostMergePolicy[T](config: TypedAggregateGroup[T]): Set[Feature[_]] = { + val containsSparseBinary = config.keysToAggregate + .exists(_.getFeatureType == FeatureType.SPARSE_BINARY) + + if (!containsSparseBinary) config.allOutputFeatures + else aggregateFeaturesPostMerge(new FeatureContext(config.allOutputFeatures.toSeq.asJava)) + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryMultipleAggregateJoin.scala b/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryMultipleAggregateJoin.scala new file mode 100644 index 000000000..d0aff7e34 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/conversion/SparseBinaryMultipleAggregateJoin.scala @@ -0,0 +1,109 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion + +import com.twitter.bijection.Injection +import com.twitter.ml.api._ +import com.twitter.ml.api.Feature +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.scalding.typed.TypedPipe +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup.sparseFeature +import scala.collection.JavaConverters._ + +case class SparseJoinConfig( + aggregates: DataSetPipe, + sparseKey: Feature.SparseBinary, + mergePolicies: SparseBinaryMergePolicy*) + +object SparseBinaryMultipleAggregateJoin { + type CommonMap = (String, ((Feature.SparseBinary, String), 
DataRecord)) + + def apply( + source: DataSetPipe, + commonKey: Feature[_], + joinConfigs: Set[SparseJoinConfig], + rightJoin: Boolean = false, + isSketchJoin: Boolean = false, + numSketchJoinReducers: Int = 0 + ): DataSetPipe = { + val emptyPipe: TypedPipe[CommonMap] = TypedPipe.empty + val aggregateMaps: Set[TypedPipe[CommonMap]] = joinConfigs.map { joinConfig => + joinConfig.aggregates.records.map { record => + val sparseKeyValue = + SRichDataRecord(record).getFeatureValue(sparseFeature(joinConfig.sparseKey)).toString + val commonKeyValue = SRichDataRecord(record).getFeatureValue(commonKey).toString + (commonKeyValue, ((joinConfig.sparseKey, sparseKeyValue), record)) + } + } + + val commonKeyToAggregateMap = aggregateMaps + .foldLeft(emptyPipe) { + case (union: TypedPipe[CommonMap], next: TypedPipe[CommonMap]) => + union ++ next + } + .group + .toList + .map { + case (commonKeyValue, aggregateTuples) => + (commonKeyValue, aggregateTuples.toMap) + } + + val commonKeyToRecordMap = source.records + .map { record => + val commonKeyValue = SRichDataRecord(record).getFeatureValue(commonKey).toString + (commonKeyValue, record) + } + + // rightJoin is not supported by Sketched, so rightJoin will be ignored if isSketchJoin is set + implicit val string2Byte = (value: String) => Injection[String, Array[Byte]](value) + val intermediateRecords = if (isSketchJoin) { + commonKeyToRecordMap.group + .sketch(numSketchJoinReducers) + .leftJoin(commonKeyToAggregateMap) + .toTypedPipe + } else if (rightJoin) { + commonKeyToAggregateMap + .rightJoin(commonKeyToRecordMap) + .mapValues(_.swap) + .toTypedPipe + } else { + commonKeyToRecordMap.leftJoin(commonKeyToAggregateMap).toTypedPipe + } + + val joinedRecords = intermediateRecords + .map { + case (commonKeyValue, (inputRecord, aggregateTupleMapOpt)) => + aggregateTupleMapOpt.foreach { aggregateTupleMap => + joinConfigs.foreach { joinConfig => + val sparseKeyValues = Option( + SRichDataRecord(inputRecord) + .getFeatureValue(joinConfig.sparseKey) + ).map(_.asScala.toList) + .getOrElse(List.empty[String]) + + val aggregateRecords = sparseKeyValues.flatMap { sparseKeyValue => + aggregateTupleMap.get((joinConfig.sparseKey, sparseKeyValue)) + } + + joinConfig.mergePolicies.foreach { mergePolicy => + mergePolicy.mergeRecord( + inputRecord, + aggregateRecords, + joinConfig.aggregates.featureContext + ) + } + } + } + inputRecord + } + + val joinedFeatureContext = joinConfigs + .foldLeft(source.featureContext) { + case (left, joinConfig) => + joinConfig.mergePolicies.foldLeft(left) { + case (soFar, mergePolicy) => + mergePolicy.mergeContext(soFar, joinConfig.aggregates.featureContext) + } + } + + DataSetPipe(joinedRecords, joinedFeatureContext) + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/docs/AUTOMATED_COMMIT_FILES b/timelines/data_processing/ml_util/aggregation_framework/docs/AUTOMATED_COMMIT_FILES new file mode 100644 index 000000000..80aaae8d9 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/docs/AUTOMATED_COMMIT_FILES @@ -0,0 +1,5 @@ +aggregation.rst +batch.rst +index.rst +real-time.rst +troubleshooting.rst diff --git a/timelines/data_processing/ml_util/aggregation_framework/docs/aggregation.rst b/timelines/data_processing/ml_util/aggregation_framework/docs/aggregation.rst new file mode 100644 index 000000000..fddd926b4 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/docs/aggregation.rst @@ -0,0 +1,167 @@ +.. 
+.. _aggregation:
+
+Core Concepts
+=============
+
+This page provides an overview of the aggregation framework and goes through examples of how to define aggregate features. In general, we can think of an aggregate feature as a grouped set of records, on which we incrementally update the aggregate feature values, crossed by the provided features and conditional on the provided labels.
+
+AggregateGroup
+--------------
+
+An `AggregateGroup` defines a single unit of aggregate computation, similar to a SQL query. These are executed by the underlying jobs (internally, a `DataRecordAggregationMonoid` is applied to `DataRecords` that contain the features to aggregate). Many of these groups can exist to define different types of aggregate features.
+
+Let's start with the following example of an `AggregateGroup` to discuss the meaning of each of its constructor arguments:
+
+.. code-block:: scala
+
+  val UserAggregateStore = "user_aggregates"
+  val aggregatesToCompute: Set[TypedAggregateGroup[_]] = Set(
+    AggregateGroup(
+      inputSource = timelinesDailyRecapSource,
+      aggregatePrefix = "user_aggregate_v2",
+      preTransformOpt = Some(RemoveUserIdZero),
+      keys = Set(USER_ID),
+      features = Set(HAS_PHOTO),
+      labels = Set(IS_FAVORITED),
+      metrics = Set(CountMetric, SumMetric),
+      halfLives = Set(50.days),
+      outputStore = OfflineAggregateStore(
+        name = UserAggregateStore,
+        startDate = "2016-07-15 00:00",
+        commonConfig = timelinesDailyAggregateSink,
+        batchesToKeep = 5
+      )
+    )
+  ).flatMap(_.buildTypedAggregateGroups)
+
+This `AggregateGroup` computes the number of times each user has faved a tweet with a photo. The aggregate count is decayed with a 50-day halflife.
+
+Naming and preprocessing
+------------------------
+
+`UserAggregateStore` is a string val that names the store, under a "root path", to which this group of aggregate features will be written. The root path is provided separately by the implementing job.
+
+`inputSource` defines the input source of `DataRecords` that we aggregate on. These records contain the relevant features required for aggregation.
+
+`aggregatePrefix` tells the framework what prefix to use for the aggregate features it generates. A descriptive naming scheme with versioning makes it easier to maintain features as you add or remove them over the long term.
+
+`preTransforms` is a `Seq[com.twitter.ml.api.ITransform]` that can be applied to the data records read from the input source before they are fed into the `AggregateGroup` to apply aggregation. These transforms are optional but can be useful for certain preprocessing operations on a group's raw input features.
+
+.. admonition:: Examples
+
+  You can downsample input data records by providing `preTransforms`. In addition, you could also join different input labels (e.g. "is_push_opened" and "is_push_favorited") and transform them into a combined label that is their union ("is_push_engaged"), on which aggregate counts will be calculated.
+
+
+Keys
+----
+
+`keys` is a crucial field in the config. It defines a `Set[com.twitter.ml.api.Feature]` which specifies a set of grouping keys to use for this `AggregateGroup`.
+
+Keys can currently only be of 3 supported types: `DISCRETE`, `STRING` and `SPARSE_BINARY`. Using a discrete or a string/text feature as a key specifies the unit to group records by before applying counting/aggregation operators.
+
+
+.. admonition:: Examples
+
+  .. cssclass:: shortlist
+
+  #. If the key is `USER_ID`, this tells the framework to group all records by `USER_ID`, and then apply aggregations (sum/count/etc.) within each user's data to generate aggregate features for each user.
+
+  #. If the key is `(USER_ID, AUTHOR_ID)`, then the `AggregateGroup` will output features for each unique user-author pair in the input data.
+
+  #. Finally, using a sparse binary feature as a key has special "flattening" or "flatMap"-like semantics. For example, consider grouping by `(USER_ID, AUTHOR_INTEREST_IDS)` where `AUTHOR_INTEREST_IDS` is a sparse binary feature which represents a set of topic IDs the author may be tweeting about. This creates one record for each `(user_id, interest_id)` pair - so each record with multiple author interests is flattened before feeding it to the aggregation.
+
+Features
+--------
+
+`features` specifies a `Set[com.twitter.ml.api.Feature]` to aggregate within each group (defined by the keys specified earlier).
+
+We support 2 types of `features`: `BINARY` and `CONTINUOUS`.
+
+The aggregation semantics differ slightly based on the type of the "feature" and on the "metric" (or aggregation operation):
+
+.. cssclass:: shortlist
+
+#. Binary Feature, Count Metric: Suppose we have a binary feature `HAS_PHOTO` in this set, and are applying the "Count" metric (see below for more details on the metrics), with key `USER_ID`. The semantics is that this computes a feature which measures the count of records with `HAS_PHOTO` set to true for each user.
+
+#. Binary Feature, Sum Metric - Does not apply. No feature will be computed.
+
+#. Continuous Feature, Count Metric - The count metric treats all features as binary features, ignoring their value. For example, suppose we have a continuous feature `NUM_CHARACTERS_IN_TWEET`, and key `USER_ID`. This measures the count of records that have the feature `NUM_CHARACTERS_IN_TWEET` present.
+
+#. Continuous Feature, Sum Metric - In the above example, the feature measures the sum of `NUM_CHARACTERS_IN_TWEET` over all a user's records. Dividing this sum feature by the count feature would give the average number of characters in all tweets.
+
+.. admonition:: Unsupported feature types
+
+  `DISCRETE` and `SPARSE` features are not supported by the Sum Metric, because there is no meaning in summing a discrete feature or a sparse feature. You can use them with the CountMetric, but they may not do what you would expect since they will be treated as binary features, losing all the information within the feature. The best way to use these is as "keys" and not as "features".
+
+.. admonition:: Setting includeAnyFeature
+
+  If constructor argument `includeAnyFeature` is set, the framework will append a feature with scope `any_feature` to the set of all features you define. This additional feature simply measures the total count of records. So if you set your features to be equal to Set.empty, this will measure the count of records for a given `USER_ID`.
+
+Labels
+------
+
+`labels` specifies a set of `BINARY` features that you can cross with, prior to applying aggregations on the `features`. This essentially restricts the aggregate computation to a subset of the records within a particular key.
+
+We typically use this to represent engagement labels in an ML model, in this case, `IS_FAVORITED`.
+
+In this example, we are grouping by `USER_ID`, the feature is `HAS_PHOTO`, the label is `IS_FAVORITED`, and we are computing `CountMetric`. The system will output a feature for each user that represents the number of favorites on tweets having photos by this `userId`.
+
+.. admonition:: Setting includeAnyLabel
+
+  If constructor argument `includeAnyLabel` is set (as it is by default), then similar to `any_feature`, the framework automatically appends a label of type `any_label` to the set of all labels you define, which represents not applying any filter or cross.
+
+In this example, `any_label` and `any_feature` are set by default and the system would actually output 4 features for each `user_id`:
+
+.. cssclass:: shortlist
+
+#. The number of `IS_FAVORITED` (favorites) on tweet impressions having `HAS_PHOTO=true`
+
+#. The number of `IS_FAVORITED` (favorites) on all tweet impressions (`any_feature` aggregate)
+
+#. The number of tweet impressions having `HAS_PHOTO=true` (`any_label` aggregate)
+
+#. The total number of tweet impressions for this user id (`any_feature.any_label` aggregate)
+
+.. admonition:: Disabling includeAnyLabel
+
+  To disable this automatically generated feature you can use `includeAnyLabel = false` in your config. This will remove some useful features (particularly for counterfactual signal), but it can greatly save on space since it does not store every possible impressed set of keys in the output store. So use this if you are short on space, but not otherwise.
+
+Metrics
+-------
+
+`metrics` specifies the aggregate operators to apply. The most commonly used are `Count`, `Sum` and `SumSq`.
+
+As mentioned before, `Count` can be applied to all types of features, but treats every feature as binary and ignores the value of the feature. `Sum` and `SumSq` can only be applied to Continuous features - they will ignore all other features you specify. By combining Sum, SumSq and Count, you can produce powerful "z-score" features or other distributional features using a post-transform.
+
+It is also possible to add your own aggregate operators (e.g. `LastResetMetric`) to the framework with some additional work.
+
+HalfLives
+---------
+
+`halfLives` specifies how fast aggregate features should be decayed. It is important to note that the framework works on an incremental basis: in the batch implementation, the summingbird-scalding job takes in the most recently computed aggregate features, processed on data until day `N-1`, then reads new data records for day `N` and computes updated values of the aggregate features. Similarly, the decay of real-time aggregate features takes the actual time delta between the current time and the last time the aggregate feature value was updated.
+
+The halflife `H` specifies how fast to decay old sums/counts to simulate a sliding window of counts. The implementation is such that it will take `H` amount of time to decay an aggregate feature to half its initial value. New observed values of sums/counts are added to the aggregate feature value.
+
+.. admonition:: Batch and real-time
+
+  In the batch use case where aggregate features are recomputed on a daily basis, we typically take halflives on the order of weeks or longer (in Timelines, 50 days). In the real-time use case, shorter halflives are appropriate (hours) since they are updated as client engagements are received by the summingbird job.
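+
+For intuition, here is a sketch of the decay arithmetic (hypothetical values; this assumes the pure exponential half-life decay described above, not the framework's exact implementation):
+
+.. code-block:: scala
+
+  // A value decays by a factor of 0.5 for every halfLife that elapses.
+  def decay(value: Double, elapsedMs: Long, halfLifeMs: Long): Double =
+    value * math.pow(0.5, elapsedMs.toDouble / halfLifeMs)
+
+  val fiftyDaysMs = 50L * 24 * 60 * 60 * 1000
+  decay(100.0, fiftyDaysMs, fiftyDaysMs)     // = 50.0 after one half-life
+  decay(100.0, 2 * fiftyDaysMs, fiftyDaysMs) // = 25.0 after two half-lives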
+
+SQL Equivalent
+--------------
+Conceptually, you can also think of it as:
+
+.. code-block:: sql
+
+  INSERT INTO <outputStore>.<aggregatePrefix>
+  SELECT AGG(<features>) /* AGG is <metrics>, which is an exponentially decaying SUM or COUNT etc. based on the halfLives */
+  FROM (
+    SELECT preTransformOpt(*) FROM <inputSource>
+  )
+  GROUP BY <keys>
+  WHERE <labels> = True
+
+any_features is AGG(*).
+
+any_labels removes the WHERE clause.
\ No newline at end of file
diff --git a/timelines/data_processing/ml_util/aggregation_framework/docs/batch.rst b/timelines/data_processing/ml_util/aggregation_framework/docs/batch.rst
new file mode 100644
index 000000000..f3b6ac9a5
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/docs/batch.rst
@@ -0,0 +1,215 @@
+.. _batch:
+
+Batch aggregate feature jobs
+============================
+
+In the previous section, we went over the core concepts of the aggregation framework and discussed how you can set up your own `AggregateGroups` to compute aggregate features.
+
+Given these groups, this section will discuss how you can set up offline batch jobs to produce the corresponding aggregate features, updated daily. To accomplish this, we need to set up a summingbird-scalding job pointed at the input data records containing the features and labels to be aggregated.
+
+Input Data
+----------
+
+In order to generate aggregate features, the relevant input features need to be available offline as a daily scalding source in `DataRecord` format (typically `DailySuffixFeatureSource`; `HourlySuffixFeatureSource` may also be usable, but we have not tested this).
+
+.. admonition:: Note
+
+  The input data source should contain the keys, features and labels you want to use in your `AggregateGroups`.
+
+Aggregation Config
+------------------
+
+Now that we have a daily data source with input features and labels, we need to set up the `AggregateGroup` config itself. This contains all aggregation groups that you would like to compute, and we will go through the implementation step-by-step.
+
+.. admonition:: Example: Timelines Quality config
+
+  `TimelinesAggregationConfig` imports the configured `AggregationGroups` from `TimelinesAggregationConfigDetails`. The config is then referenced by the implementing summingbird-scalding job, which we will set up below.
+
+OfflineAggregateSource
+----------------------
+
+Each `AggregateGroup` will need to define a (daily) source of input features. We use `OfflineAggregateSource` for this to tell the aggregation framework where the input data set is and the required timestamp feature that the framework uses to decay aggregate feature values:
+
+.. code-block:: scala
+
+  val timelinesDailyRecapSource = OfflineAggregateSource(
+    name = "timelines_daily_recap",
+    timestampFeature = TIMESTAMP,
+    scaldingHdfsPath = Some("/user/timelines/processed/suggests/recap/data_records"),
+    scaldingSuffixType = Some("daily"),
+    withValidation = true
+  )
+
+.. admonition:: Note
+
+  .. cssclass:: shortlist
+
+  #. The name is not important as long as it is unique.
+
+  #. `timestampFeature` must be a discrete feature of type `com.twitter.ml.api.Feature[Long]` and represents the "time" of a given training record in milliseconds - for example, the time at which an engagement, push open event, or abuse event took place that you are trying to train on. If you do not already have such a feature in your daily training data, you need to add one.
+
+  #. `scaldingSuffixType` can be "hourly" or "daily" depending on the type of source (`HourlySuffixFeatureSource` vs `DailySuffixFeatureSource`).
+
+  #. Set `withValidation` to true to validate the presence of the _SUCCESS file.
+     Context: https://jira.twitter.biz/browse/TQ-10618
+
+Output HDFS store
+-----------------
+
+The output HDFS store is where the computed aggregate features are stored. This store contains all computed aggregate feature values and is incrementally updated by the aggregates job every day.
+
+.. code-block:: scala
+
+  val outputHdfsPath = "/user/timelines/processed/aggregates_v2"
+  val timelinesOfflineAggregateSink = new OfflineStoreCommonConfig {
+    override def apply(startDate: String) = new OfflineAggregateStoreCommonConfig(
+      outputHdfsPathPrefix = outputHdfsPath,
+      dummyAppId = "timelines_aggregates_v2_ro", // unused - can be arbitrary
+      dummyDatasetPrefix = "timelines_aggregates_v2_ro", // unused - can be arbitrary
+      startDate = startDate
+    )
+  }
+
+Note: `dummyAppId` and `dummyDatasetPrefix` are unused, so they can be set to any arbitrary value. They should be removed on the framework side.
+
+The `outputHdfsPathPrefix` is the only field that matters, and should be set to the HDFS path where you want to store the aggregate features. Make sure you have a lot of quota available at that path.
+
+Setting Up Aggregates Job
+-------------------------
+
+Once you have defined a config file with the aggregates you would like to compute, the next step is to create the aggregates scalding job using the config. This is very concise and requires only a few lines of code:
+
+.. code-block:: scala
+
+  object TimelinesAggregationScaldingJob extends AggregatesV2ScaldingJob {
+    override val aggregatesToCompute = TimelinesAggregationConfig.aggregatesToCompute
+  }
+
+Now that the scalding job is implemented with the aggregation config, we need to set up a capesos config similar to https://cgit.twitter.biz/source/tree/science/scalding/mesos/timelines/prod.yml:
+
+.. code-block:: yaml
+
+  # Common configuration shared by all aggregates v2 jobs
+  __aggregates_v2_common__: &__aggregates_v2_common__
+    class: HadoopSummingbirdProducer
+    bundle: offline_aggregation-deploy.tar.gz
+    mainjar: offline_aggregation-deploy.jar
+    pants_target: "bundle timelines/data_processing/ad_hoc/aggregate_interactions/v2/offline_aggregation:bin"
+    cron_collision_policy: CANCEL_NEW
+    use_libjar_wild_card: true
+
+.. code-block:: yaml
+
+  # Specific job computing user aggregates
+  user_aggregates_v2:
+    <<: *__aggregates_v2_common__
+    cron_schedule: "25 * * * *"
+    arguments: --batches 1 --output_stores user_aggregates --job_name timelines_user_aggregates_v2
+
+.. admonition:: Important
+
+  Each AggregateGroup in your config should have its own associated offline job which specifies `output_stores` pointing to the output store name you defined in your config.
+
+Running The Job
+---------------
+
+When you run the batch job for the first time, you need to add a temporary entry to your capesos yml file that looks like this:
+
+.. code-block:: yaml
+
+  user_aggregates_v2_initial_run:
+    <<: *__aggregates_v2_common__
+    cron_schedule: "25 * * * *"
+    arguments: --batches 1 --start-time "2017-03-03 00:00:00" --output_stores user_aggregates --job_name timelines_user_aggregates_v2
+
+.. admonition:: Start Time
+
+  The additional `--start-time` argument should match the `startDate` in your config for that AggregateGroup, but in the format `yyyy-mm-dd hh:mm:ss`.
+
+To invoke the initial run via capesos, we would do the following (in the Timelines case):
+
+
+   CAPESOSPY_ENV=prod capesospy-v2 update --build_locally --start_cron user_aggregates_v2_initial_run science/scalding/mesos/timelines/prod.yml
+
+Once it is running smoothly, you can deschedule the initial run job and delete the temporary entry from your production yml config:
+
+.. code-block:: bash
+
+   aurora cron deschedule atla/timelines/prod/user_aggregates_v2_initial_run
+
+Note: deschedule the initial run preemptively to avoid repeatedly overwriting the same initial results.
+
+Then schedule the production job from jenkins using something like this:
+
+.. code-block:: bash
+
+   CAPESOSPY_ENV=prod capesospy-v2 update user_aggregates_v2 science/scalding/mesos/timelines/prod.yml
+
+All future runs (from the 2nd onwards) will use the permanent entry in the capesos yml config that does not have the `start-time` specified.
+
+.. admonition:: Job name has to match
+
+   It's important that the production run share the same `--job_name` with the initial run, so that eagleeye/statebird knows how to keep track of it correctly.
+
+Output Aggregate Features
+-------------------------
+
+This scalding job, using the example config from the earlier section, would output a `VersionedKeyValSource` to `/user/timelines/processed/aggregates_v2/user_aggregates` on HDFS.
+
+Note that `/user/timelines/processed/aggregates_v2` is the explicitly defined root path, while `user_aggregates` is the output directory of the example `AggregateGroup` defined earlier. The latter can be different for different `AggregateGroups` defined in your config.
+
+The `VersionedKeyValSource` is difficult to use directly in your jobs/offline trainings, but we provide an adapted source `AggregatesV2FeatureSource` that makes it easy to join and use in your jobs:
+
+.. code-block:: scala
+
+   import com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion._
+
+   val pipe: DataSetPipe = AggregatesV2FeatureSource(
+     rootPath = "/user/timelines/processed/aggregates_v2",
+     storeName = "user_aggregates",
+     aggregates = TimelinesAggregationConfig.aggregatesToCompute,
+     trimThreshold = 0
+   )(dateRange).read
+
+Simply replace the `rootPath`, `storeName` and `aggregates` object with whatever you defined. The `trimThreshold` tells the framework to trim all features below a certain cutoff: 0 is a safe default to begin with.
+
+.. admonition:: Usage
+
+   This can now be used like any other `DataSetPipe` in offline ML jobs. You can write out the features to a `DailySuffixFeatureSource`, you can join them with your data offline for trainings, or you can write them to a Manhattan store for serving online.
+
+Aggregate Features Example
+--------------------------
+
+Here is a sample of the aggregate features we just computed:
+
+.. code-block:: text
+
+   user_aggregate_v2.pair.any_label.any_feature.50.days.count: 100.0
+   user_aggregate_v2.pair.any_label.tweetsource.is_quote.50.days.count: 30.0
+   user_aggregate_v2.pair.is_favorited.any_feature.50.days.count: 10.0
+   user_aggregate_v2.pair.is_favorited.tweetsource.is_quote.50.days.count: 6.0
+   meta.user_id: 123456789
+
+Aggregate feature names match a `prefix.pair.label.feature.half_life.metric` schema and correspond to what was defined in the aggregation config for each of these fields.
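+As a concrete reading of that schema, the first feature name above breaks down as follows (all names come from the example config; the annotation layout is just illustrative):
+
+.. code-block:: text
+
+   user_aggregate_v2 . pair . any_label . any_feature . 50.days     . count
+   (prefix)            (pair) (label)     (feature)     (half life)   (metric)
+
+`any_label` and `any_feature` are the special "match everything" values (cf. the `any_features` / `any_labels` notes at the end of the previous section), which is what makes the "tweet impressions" denominator in the example below possible.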
+.. admonition:: Example
+
+   In this example, the above features are capturing that userId 123456789L has:
+
+   + A 50-day decayed count of 100 training records with any label or feature ("tweet impressions")
+   + A 50-day decayed count of 30 records that are "quote tweets" (tweetsource.is_quote = true)
+   + A 50-day decayed count of 10 records that are favorites on any type of tweet (is_favorited = true)
+   + A 50-day decayed count of 6 records that are "favorites" on "quote tweets" (both of the above are true)
+
+By combining the above, a model might infer that for this specific user, quote tweets comprise 30% of all impressions and have a favorite rate of 6/30 = 20%, compared to a favorite rate of 10/100 = 10% on the total population of tweets.
+
+Therefore, being a quote tweet makes this specific user `123456789L` approximately twice as likely to favorite the tweet. This is useful for prediction and could result in the ML model giving higher scores to quote tweets and ranking them higher, in a personalized fashion, for this user.
+
+Tests for Feature Names
+-----------------------
+When you change or add an `AggregateGroup`, feature names might change, and the Feature Store provides a testing mechanism to assert that the feature names change as you expect. See `tests for feature names `_.
diff --git a/timelines/data_processing/ml_util/aggregation_framework/docs/conf.py b/timelines/data_processing/ml_util/aggregation_framework/docs/conf.py
new file mode 100644
index 000000000..03996dfd7
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/docs/conf.py
@@ -0,0 +1,59 @@
+# -*- coding: utf-8 -*-
+#
+# docbird documentation build configuration file
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+
+from os.path import abspath, dirname, isfile, join
+
+
+extensions = [
+    "sphinx.ext.autodoc",
+    "sphinx.ext.intersphinx",
+    "sphinx.ext.ifconfig",
+    "sphinx.ext.graphviz",
+    "twitter.docbird.ext.thriftlexer",
+    "twitter.docbird.ext.toctree_default_caption",
+    "sphinxcontrib.httpdomain",
+]
+
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ["_templates"]
+
+# The suffix of source filenames.
+source_suffix = ".rst"
+
+# The master toctree document.
+master_doc = "index"
+
+# General information about the project.
+project = u"""Aggregation Framework"""
+description = u""""""
+
+# The short X.Y version.
+version = u"""1.0"""
+# The full version, including alpha/beta/rc tags.
+release = u"""1.0"""
+
+exclude_patterns = ["_build"]
+
+pygments_style = "sphinx"
+
+html_theme = "default"
+
+html_static_path = ["_static"]
+
+html_logo = u""""""
+
+# Automagically add project logo, if it exists
+# (checks on any build, not just init)
+# Scan for some common defaults (png or svg format,
+# called "logo" or project name, in docs folder)
+if not html_logo:
+    location = dirname(abspath(__file__))
+    for logo_file in ["logo.png", "logo.svg", ("%s.png" % project), ("%s.svg" % project)]:
+        html_logo = logo_file if isfile(join(location, logo_file)) else html_logo
+
+graphviz_output_format = "svg"
diff --git a/timelines/data_processing/ml_util/aggregation_framework/docs/index.rst b/timelines/data_processing/ml_util/aggregation_framework/docs/index.rst
new file mode 100644
index 000000000..af703c688
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/docs/index.rst
@@ -0,0 +1,11 @@
+.. markdowninclude:: ../README.md
+
+.. toctree::
+   :maxdepth: 2
+   :hidden:
+
+   aggregation
+   batch
+   real-time
+   joining
+   troubleshooting
diff --git a/timelines/data_processing/ml_util/aggregation_framework/docs/joining.rst b/timelines/data_processing/ml_util/aggregation_framework/docs/joining.rst
new file mode 100644
index 000000000..2ecdf7612
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/docs/joining.rst
@@ -0,0 +1,72 @@
+.. _joining:
+
+Joining aggregate features to records
+=====================================
+
+After setting up either offline batch jobs or online real-time summingbird jobs to produce
+aggregate features and querying them, we are left with data records containing aggregate features.
+This page will go over how to join them with other data records to produce offline training data.
+
+(To discuss: joining aggregates to records online)
+
+Joining Aggregates on Discrete/String Keys
+------------------------------------------
+
+Joining aggregate features keyed on discrete or text features to your training data is very easy -
+you can use the built-in methods provided by `DataSetPipe`. For example, suppose you have aggregates
+keyed by `(USER_ID, AUTHOR_ID)`:
+
+.. code-block:: scala
+
+   val userAuthorAggregates: DataSetPipe = AggregatesV2FeatureSource(
+     rootPath = "/path/to/my/aggregates",
+     storeName = "user_author_aggregates",
+     aggregates = MyConfig.aggregatesToCompute,
+     trimThreshold = 0
+   )(dateRange).read
+
+Offline, you can then join with your training data set as follows:
+
+.. code-block:: scala
+
+   val myTrainingData: DataSetPipe = ...
+   val joinedData = myTrainingData.joinWithLarger((USER_ID, AUTHOR_ID), userAuthorAggregates)
+
+You can read from `AggregatesV2MostRecentFeatureSourceBeforeDate` in order to read the most recent aggregates
+before a provided date `beforeDate`. Just note that `beforeDate` must be aligned with the date boundary, so if
+you're passing in a `dateRange`, use `dateRange.end`.
+
+Joining Aggregates on Sparse Binary Keys
+----------------------------------------
+
+When joining on sparse binary keys, there can be multiple aggregate records to join to each training record in
+your training data set. For example, suppose you have set up an aggregate group that is keyed on `(INTEREST_ID, AUTHOR_ID)`,
+capturing engagement counts of users interested in a particular `INTEREST_ID` for specific authors provided by `AUTHOR_ID`.
+
+Suppose now that you have a training data record representing a specific user action. This training data record contains
+a sparse binary feature `INTEREST_IDS` representing all the "interests" of that user - e.g. music, sports, and so on. Each `interest_id`
+translates to a different set of counting features found in your aggregates data. Therefore, we need a way to merge all of
+these different sets of counting features to produce a more compact, fixed-size set of features.
+
+.. admonition:: Merge policies
+
+   To do this, the aggregation framework provides a trait `SparseBinaryMergePolicy `_. Classes overriding this trait define policies
+   that state how to merge the individual aggregate features from each sparse binary value (in this case, each `INTEREST_ID` for a user).
+   Furthermore, we provide `SparseBinaryMultipleAggregateJoin`, which executes these policies to merge aggregates.
+
+A simple policy might average all the counts from the individual interests, take the max, or take
+a specific quantile. More advanced policies might use custom criteria to decide which interest is most relevant and choose
+features from that interest to represent the user, or use some weighted combination of counts.
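+To make the "average the counts" idea concrete, here is a minimal, self-contained sketch. It deliberately does not reproduce the actual `SparseBinaryMergePolicy` trait signature; it only shows the merge step over plain maps, where each inner map holds the aggregate features fetched for one sparse binary value (one `INTEREST_ID`):
+
+.. code-block:: scala
+
+   // Hypothetical sketch: average each aggregate feature across all of the
+   // user's interests, yielding one fixed-size feature set per record.
+   def averageAcrossInterests(
+     perInterestFeatures: Seq[Map[String, Double]]
+   ): Map[String, Double] = {
+     val featureNames = perInterestFeatures.flatMap(_.keys).distinct
+     featureNames.map { name =>
+       val values = perInterestFeatures.flatMap(_.get(name))
+       name -> values.sum / values.size // non-empty by construction
+     }.toMap
+   }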
+The framework provides two simple built-in policies (`PickTopCtrPolicy `_
+and `CombineCountsPolicy `_, which keeps the topK counts per
+record) that you can get started with, though you will likely want to implement your own policy based on domain knowledge to get
+the best results for your specific problem domain.
+
+.. admonition:: Offline Code Example
+
+   The scalding job `TrainingDataWithAggV2Generator `_ shows how multiple merge policies are defined and implemented to merge aggregates on sparse binary keys into the TQ's training data records.
+
+.. admonition:: Online Code Example
+
+   In our (non-FeatureStore enabled) online code path, we merge aggregates on sparse binary keys using the `CombineCountsPolicy `_.
diff --git a/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst b/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst
new file mode 100644
index 000000000..fc853ba69
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst
@@ -0,0 +1,327 @@
+.. _real_time:
+
+Real-Time aggregate features
+============================
+
+In addition to computing batch aggregate features, the aggregation framework supports real-time aggregates as well. The framework concepts used here are identical to the batch use case; however, the underlying implementation differs and is provided by summingbird-storm jobs.
+
+RTA Runbook
+-----------
+
+For operational details, please visit http://go/tqrealtimeaggregates.
+
+Prerequisites
+-------------
+
+In order to start computing real-time aggregate features, the framework requires the following to be provided:
+
+* A backing memcached store that will hold the computed aggregate features. This is conceptually equivalent to the output HDFS store in the batch compute case.
+* An implementation of `StormAggregateSource `_ that creates `DataRecords` with the necessary input features. This serves as the input to the aggregation operations.
+* A definition of the aggregate features, given by defining an `AggregateGroup` in an implementation of `OnlineAggregationConfigTrait`. This is identical to the batch case.
+* A job config file defining the backing memcached for feature storage and retrieval, and job-related parameters.
+
+We will now go through the details of setting up each required component.
+
+Memcached store
+---------------
+
+Real-time aggregates use Memcache as the backing cache to store and update aggregate features. Caches can be provisioned on `go/cacheboard `_.
+
+.. admonition:: Test and prod caches
+
+   For development, it is sufficient to set up a test cache that your new job can query and write to. At the same time, a production cache request should also be submitted early, as these generally have significant lead times for provisioning.
+
+StormAggregateSource
+--------------------
+
+To enable aggregation of your features, we need to start by defining a `StormAggregateSource` that builds a `Producer[Storm, DataRecord]`. This summingbird producer generates the `DataRecords` that contain the input features and labels that the real-time aggregate job will compute aggregate features on. Conceptually, this is equivalent to the input data set in the offline batch use case.
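+The trait itself (see `StormAggregateSource.scala` later in this change) only asks for a name, a timestamp feature and a `build` method, so a skeleton implementation looks roughly like the following. This is a sketch: `MY_TIMESTAMP`, `myEventProducer` and `mkDataRecords` are placeholders for your own timestamp feature, spout subscription and event adapter.
+
+.. code-block:: scala
+
+   object MyEngagementStormAggregateSource extends StormAggregateSource {
+     override val name: String = "my_engagements"
+
+     // Millisecond timestamp feature the framework decays aggregates on;
+     // it must match the timestamp your AggregateGroups expect.
+     override val timestampFeature: Feature[JLong] = MY_TIMESTAMP
+
+     override def build(
+       statsReceiver: StatsReceiver,
+       jobConfig: RealTimeAggregatesJobConfig
+     ): Producer[Storm, DataRecord] =
+       myEventProducer.flatMap(mkDataRecords) // subscribe to events, adapt to DataRecords
+   }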
+.. admonition:: Example
+
+   If you are planning to aggregate on client engagements, you would need to subscribe to the `ClientEvent` kafka stream and then convert each event to a `DataRecord` that contains the key and the engagement on which to aggregate.
+
+Typically, we would set up a julep filter for the relevant client events that we would like to aggregate on. This gives us a `Producer[Storm, LogEvent]` object, which we then convert to a `Producer[Storm, DataRecord]` with adapters that we wrote:
+
+.. code-block:: scala
+
+   lazy val clientEventProducer: Producer[Storm, LogEvent] =
+     ClientEventSourceScrooge(
+       appId = AppId(jobConfig.appId),
+       topic = "julep_client_event_suggests",
+       resumeAtLastReadOffset = false
+     ).source.name("timelines_events")
+
+   lazy val clientEventWithCachedFeaturesProducer: Producer[Storm, DataRecord] = clientEventProducer
+     .flatMap(mkDataRecords)
+
+Note that this way of composing the storm graph gives us flexibility in how we hydrate input features. If you would like to join more complex features to the `DataRecord`, you can do so here with additional storm components which can implement cache queries.
+
+.. admonition:: Timelines Quality use case
+
+   In Timelines Quality, we aggregate client engagements on `userId` or `tweetId` and implement
+   `TimelinesStormAggregateSource `_. We create a
+   `Producer[Storm, LogEvent]` of Timelines engagements, to which we apply `ClientLogEventAdapter `_, which converts each event to a `DataRecord` containing the `userId`, the `tweetId`, the `timestampFeature` of the engagement and the engagement label itself.
+
+.. admonition:: MagicRecs use case
+
+   MagicRecs has a very similar setup for real-time aggregate features. In addition, they also implement a more complex cache query to fetch the user's history in the `StormAggregateSource` for each observed client engagement to hydrate a richer set of input `DataRecords`:
+
+   .. code-block:: scala
+
+      val userHistoryStoreService: Storm#Service[Long, History] =
+        Storm.service(UserHistoryReadableStore)
+
+      val clientEventDataRecordProducer: Producer[Storm, DataRecord] =
+        magicRecsClientEventProducer
+          .flatMap { ...
+            (userId, logEvent)
+          }.leftJoin(userHistoryStoreService)
+          .flatMap {
+            case (_, (logEvent, history)) =>
+              mkDataRecords(LogEventHistoryPair(logEvent, history))
+          }
+
+.. admonition:: EmailRecs use case
+
+   EmailRecs shares the same cache as MagicRecs. They combine notification scribe data with email history data to identify the particular item a user engaged with in an email:
+
+   .. code-block:: scala
+
+      val emailHistoryStoreService: Storm#Service[Long, History] =
+        Storm.service(EmailHistoryReadableStore)
+
+      val emailEventDataRecordProducer: Producer[Storm, DataRecord] =
+        emailEventProducer
+          .flatMap { ...
+            (userId, logEvent)
+          }.leftJoin(emailHistoryStoreService)
+          .flatMap {
+            case (_, (scribe, history)) =>
+              mkDataRecords(ScribeHistoryPair(scribe, history))
+          }
+
+Aggregation config
+------------------
+
+The real-time aggregation config is extended from `OnlineAggregationConfigTrait `_ and defines the features to aggregate and the backing memcached store to which they will be written.
+
+Setting up real-time aggregates follows the same rules as in the offline batch use case. The major difference here is that `inputSource` should point to the `StormAggregateSource` implementation that provides the `DataRecord` containing the engagements and core features on which to aggregate. In the offline case, this would have been an `OfflineAggregateSource` pointing to an offline source of daily records.
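+For example (a sketch reusing the placeholder names from the sections above; `myRealTimeAggregateStore` stands in for the memcache-backed store described next, and all other constants are placeholders as well):
+
+.. code-block:: scala
+
+   val myEngagementRealTimeAggregates = AggregateGroup(
+     inputSource = MyEngagementStormAggregateSource, // Storm source instead of an offline source
+     aggregatePrefix = "user_engagement_rta",
+     keys = Set(USER_ID),
+     features = MY_FEATURES,
+     labels = MY_LABELS,
+     metrics = Set(CountMetric),
+     halfLives = Set(30.minutes),
+     outputStore = myRealTimeAggregateStore // memcache instead of HDFS
+   )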
+Finally, `RealTimeAggregateStore` defines the backing memcache to be used and should be provided here as the `outputStore`.
+
+.. NOTE::
+
+   Please make sure to provide an `AggregateGroup` for both staging and production. The main difference should be the `outputStore`, where features in either environment are read from and written to. You want to make sure that a staged real-time aggregates summingbird job is reading/writing only to the test memcache store and does not mutate the production store.
+
+Job config
+----------
+
+In addition to the aggregation config that defines the features to aggregate, the final piece we need to provide is a `RealTimeAggregatesJobConfig` that specifies job values such as `appId` and `teamName`, and counts for the various topology components that define the capacity of the job (`Timelines example `_).
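+As a sketch, a config built from the `RealTimeAggregatesJobConfig` case class (included later in this change) might look like the following; every value below is a placeholder to be tuned for your own job:
+
+.. code-block:: scala
+
+   val Prod = RealTimeAggregatesJobConfig(
+     appId = "my_team_real_time_aggregates", // placeholder
+     topologyWorkers = 50,
+     sourceCount = 20,
+     summerCount = 100,
+     cacheSize = 200, // tuples a Summer awaits before aggregation
+     flatMapCount = 40,
+     containerRamGigaBytes = 54,
+     name = "my_team_real_time_aggregates",
+     teamName = "my_team",
+     teamEmail = "my-team@twitter.com"
+   )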
+Once you have the job config, implementing the storm job itself is easy and almost as concise as in the batch use case:
+
+.. code-block:: scala
+
+   object TimelinesRealTimeAggregatesJob extends RealTimeAggregatesJobBase {
+     override lazy val statsReceiver = DefaultStatsReceiver.scope("timelines_real_time_aggregates")
+     override lazy val jobConfigs = TimelinesRealTimeAggregatesJobConfigs
+     override lazy val aggregatesToCompute = TimelinesOnlineAggregationConfig.AggregatesToCompute
+   }
+
+.. NOTE::
+   There are some topology settings that are currently hard-coded. In particular, we enable `Config.TOPOLOGY_DROPTUPLES_UPON_BACKPRESSURE` to be true for added robustness. This may be made user-definable in the future.
+
+Steps to hydrate RTAs
+---------------------
+1. Make the changes to RTAs and follow the steps for `Running the topology`.
+2. Register the new RTAs to the feature store. Sample phab: https://phabricator.twitter.biz/D718120
+3. Wire the features from the feature store to TLX. This is usually done with the feature switch set to False, so it is just a code change and will not yet start hydrating the features. Merge the phab. Sample phab: https://phabricator.twitter.biz/D718424
+4. Now we hydrate the features to TLX gradually, shard by shard. For this, first create a PCM and then enable the hydration. Sample PCM: https://jira.twitter.biz/browse/PCM-147814
+
+Running the topology
+--------------------
+0. For a phab that makes a change to the topology (such as adding new ML features), before landing the phab, please create a PCM (`example `_) and deploy the change to the devel topology first and then to prod (atla and pdxa). Once it is confirmed that the prod topology can handle the change, the phab can be landed.
+1. Go to https://ci.twitter.biz/job/tq-ci/build
+2. In `commands`, enter
+
+.. code-block:: bash
+
+   . src/scala/com/twitter/timelines/prediction/common/aggregates/real_time/deploy_local.sh [devel|atla|pdxa]
+
+One can only deploy one of `devel`, `atla` (prod atla) or `pdxa` (prod pdxa) at a time.
+For example, to deploy both the pdxa and atla prod topologies, one needs to run the above steps twice, once with `pdxa` and once with `atla`.
+
+The status and performance stats of the topology are found at `go/heron-ui `_. Here you can view whether the job is processing tuples and whether it is under any memory or backpressure, and it provides general observability.
+
+Finally, since we enable `Config.TOPOLOGY_DROPTUPLES_UPON_BACKPRESSURE` by default in the topology, we also need to monitor and alert on the number of dropped tuples. Since this is a feature-generating job, a small fraction of dropped tuples is tolerable if that enables us to avoid backpressure that would hold up computation in the entire graph.
+
+Hydrating Real-Time Aggregate Features
+--------------------------------------
+
+Once the job is up and running, the aggregate features will be accessible in the backing memcached store. To access these features and hydrate them into your online pipeline, we need to build a Memcache client with the right query key.
+
+.. admonition:: Example
+
+   Some care needs to be taken to define the key injection and codec correctly for the memcached store. These types do not change, and you can use the Timelines `memcache client builder `_ as an example.
+
+Aggregate features are written to the store with a `(AggregationKey, BatchID)` key.
+
+`AggregationKey `_ is an instance of the keys that you previously defined in `AggregateGroup`. If your aggregation key is `USER_ID`, you would need to instantiate `AggregationKey` with the `USER_ID` featureId and the userId value.
+
+.. admonition:: Returned features
+
+   The `DataRecord` that is returned by the cache now contains all real-time aggregate features for the query `AggregationKey` (similar to the batch use case). If your online hydration flow produces data records, the real-time aggregate features can be joined with your existing records in a straightforward way.
+
+Adding features from Feature Store to RTA
+--------------------------------------------
+To add features from the Feature Store to RTA and create real-time aggregated features based on them, one needs to follow these steps:
+
+**Step 1**
+
+Copy the Strato column for the features that one wants to explore, and add a cache if needed. See details at `Customize any Columns for your Team as Needed `_. As an `example `_, we copy the Strato column of recommendationsUserFeaturesProd.User.strato and add a cache for the timelines team's usage.
+
+**Step 2**
+
+Create a new ReadableStore which uses the Feature Store Client to request features from the Feature Store. Implement a FeaturesAdapter which extends TimelinesAdapterBase and derives new features based on raw features from the Feature Store. As an `example `_, we create UserFeaturesReadableStore, which reads the discrete feature user state and converts it to a list of boolean user state features.
+
+**Step 3**
+
+Join these derived features from the Feature Store to the timelines storm aggregate source. Depending on the characteristics of these derived features, the join key could be tweet id, user id or something else. As an `example `_, because user state is per user, the join key is user id.
+
+**Step 4**
+
+Define an `AggregateGroup` based on the derived features in RTA.
+
+Adding New Aggregate Features from an Existing Dataset
+------------------------------------------------------
+To add a new aggregate feature group from an existing dataset for use in home models, use the following steps:
+
+1. Identify the hypothesis being tested by the addition of the features, in accordance with `go/tpfeatureguide `_.
+2. Modify or add a new AggregateGroup to `TimelinesOnlineAggregationConfigBase.scala `_ to define the aggregation key and the set of features, labels and metrics. An example phab to add more halflives can be found at `D204415 `_.
+3. If the change is expected to be very large, it may be recommended to perform capacity estimation. See :ref:`Capacity Estimation` for more details.
+4. Create feature catalog items for the new RTAs. An example phab is `D706348 `_. For approval from a featurestore owner, ping #help-ml-features on slack.
+5. Add the new features to the featurestore. An example phab is `D706112 `_. This change can be rolled out with feature switches or by canarying TLX, depending on the risk. An example PCM for feature switches is `PCM-148654 `_. An example PCM for canarying is `PCM-145753 `_.
+6. Wait for the redeploy and confirm the new features are available. One way is querying in BigQuery from a table like `twitter-bq-timelines-prod.continuous_training_recap_fav`. Another way is to inspect individual records using pcat. The command to be used is like:
+
+.. code-block:: bash
+
+   java -cp pcat-deploy.jar:$(hadoop classpath) com.twitter.ml.tool.pcat.PredictionCatTool
+   -path /atla/proc2/user/timelines/processed/suggests/recap/continuous_training_data_records/fav/data/YYYY/MM/DD/01/part-00000.lzo
+   -fc /atla/proc2/user/timelines/processed/suggests/recap/continuous_training_data_records/fav/data_spec.json
+   -dates YYYY-MM-DDT01 -record_limit 100 | grep [feature_group]
+
+7. Create a phab with the new features and test the performance of a model with them compared to a control model without them. Test offline using `Deepbird for training `_ and `RCE Hypothesis Testing `_. Test online using a DDG. Some helpful instructions are available in `Serving Timelines Models `_ and the `Experiment Cookbook `_.
+
+Capacity Estimation
+-------------------
+This section describes how to approximate the capacity required for a new aggregate group. It is not expected to be exact, but should give a rough estimate.
+
+There are two main components that must be stored for each aggregate group:
+
+Key space: Each AggregationKey struct consists of two maps, one of which is populated with [Long, Long] tuples representing discrete features. This takes up 4 x 8 bytes, or 32 bytes. The cache team estimates an additional 40 bytes of overhead.
+
+Features: An aggregate feature is represented as a pair (16 bytes) and is produced for each feature x label x metric x halflife combination.
+
+1. Use BigQuery to estimate how many unique values exist for the selected key (key_count). Also collect the number of features, labels, metrics, and half-lives being used.
+2. Compute the number of entries to be created: num_entries = feature_count * label_count * metric_count * halflife_count
+3. Compute the number of bytes per entry: num_entry_bytes = 16 * num_entries + 32 bytes (key storage) + 40 bytes (overhead)
+4. Compute the total space required: total = num_entry_bytes * key_count
+
+For example, 10 features x 2 labels x 2 metrics x 3 half-lives gives num_entries = 120 and num_entry_bytes = 16 * 120 + 32 + 40 = 1992 bytes, so 10 million unique keys would require roughly 20 GB.
+
+Debugging New Aggregate Features
+--------------------------------
+
+To debug problems in the setup of your job, there are several steps you can take.
+
+First, ensure that data is being received from the input stream and passed through to create data records. This can be achieved by logging results at various places in your code, and especially at the point of data record creation.
+
+For example, suppose you want to ensure that a data record is being created with
+the features you expect. With push and email features, we find that data records
+are created in the adaptor, using logic like the following:
+
+.. code-block:: scala
+
+   val record = new SRichDataRecord(new DataRecord)
+   ...
+   record.setFeatureValue(feature, value)
+
+To see what these feature values look like, we can have our adaptor class extend
+Twitter's `Logging` trait and write each created record to a log file:
+
+.. code-block:: scala
+
+   class MyEventAdaptor extends TimelinesAdapterBase[MyObject] with Logging {
+     ...
+     ...
+     def mkDataRecord(myFeatures: MyFeatures): DataRecord = {
+       val record = new SRichDataRecord(new DataRecord)
+       ...
+       record.setFeatureValue(feature, value)
+       logger.info("data record xyz: " + record.getRecord.toString)
+     }
+
+This way, every time a data record is sent to the aggregator, it will also be
+logged. To inspect these logs, you can push these changes to a staging instance,
+ssh into that aurora instance, and grep the `log-files` directory for `xyz`. The
+data record objects you find should resemble a map from feature ids to their
+values.
+
+To check that steps in the aggregation are being performed, you can also inspect the job's topology on go/heronui.
+
+Lastly, to verify that values are being written to your cache, you can check the `set` chart in your cache's viz.
+
+To check particular feature values for a given key, you can spin up a Scala REPL like so:
+
+.. code-block:: bash
+
+   $ ssh -fN -L*:2181:sdzookeeper-read.atla.twitter.com:2181 -D *:50001 nest.atlc.twitter.com
+
+   $ ./pants repl --jvm-repl-scala-options='-DsocksProxyHost=localhost -DsocksProxyPort=50001 -Dcom.twitter.server.resolverZkHosts=localhost:2181' timelinemixer/common/src/main/scala/com/twitter/timelinemixer/clients/real_time_aggregates_cache
+
+You will then need to create a connection to the cache, and a key with which to query it.
+
+.. code-block:: scala
+
+   import com.twitter.conversions.DurationOps._
+   import com.twitter.finagle.stats.{DefaultStatsReceiver, StatsReceiver}
+   import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey
+   import com.twitter.summingbird.batch.Batcher
+   import com.twitter.timelinemixer.clients.real_time_aggregates_cache.RealTimeAggregatesMemcacheBuilder
+   import com.twitter.timelines.clients.memcache_common.StorehausMemcacheConfig
+
+   val userFeature = -1887718638306251279L // feature id corresponding to User feature
+   val userId = 12L // replace with a user id logged when creating your data record
+   val key = (AggregationKey(Map(userFeature -> userId), Map.empty), Batcher.unit.currentBatch)
+
+   val dataset = "twemcache_magicrecs_real_time_aggregates_cache_staging" // replace with the appropriate cache name
+   val dest = s"/srv#/test/local/cache/twemcache_/$dataset"
+
+   val statsReceiver: StatsReceiver = DefaultStatsReceiver
+   val cache = new RealTimeAggregatesMemcacheBuilder(
+     config = StorehausMemcacheConfig(
+       destName = dest,
+       keyPrefix = "",
+       requestTimeout = 10.seconds,
+       numTries = 1,
+       globalTimeout = 10.seconds,
+       tcpConnectTimeout = 10.seconds,
+       connectionAcquisitionTimeout = 10.seconds,
+       numPendingRequests = 250,
+       isReadOnly = true
+     ),
+     statsReceiver.scope(dataset)
+   ).build
+
+   val result = cache.get(key)
+
+Another option is to create a debugger which points to the staging cache and creates a cache connection and key similar to the logic above.
+
+Run CQL query to find metrics/counters
+--------------------------------------
+We can also visualize the counters from our job to verify new features. Run a CQL query in a terminal to find the right path of the metrics/counters.
+For example, in order to check the counter mergeNumFeatures, run:
+
+.. code-block:: bash
+
+   cql -z atla keys heron/summingbird_timelines_real_time_aggregates Tail-FlatMap | grep mergeNumFeatures
+
+Then use the right path to create the viz, for example: https://monitoring.twitter.biz/tiny/2552105
diff --git a/timelines/data_processing/ml_util/aggregation_framework/docs/troubleshooting.rst b/timelines/data_processing/ml_util/aggregation_framework/docs/troubleshooting.rst
new file mode 100644
index 000000000..d9799f433
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/docs/troubleshooting.rst
@@ -0,0 +1,117 @@
+.. _troubleshooting:
+
+Troubleshooting
+==================
+
+
+[Batch] Regenerating a corrupt version
+--------------------------------------
+
+Symptom
+~~~~~~~~~~
+The Summingbird batch job failed due to the following error:
+
+.. code:: bash
+
+   Caused by: com.twitter.bijection.InversionFailure: ...
+
+This typically indicates corrupt records in the aggregate store (not on the DataRecord source side).
+The following describes the method to re-generate the required (typically the latest) version:
+
+Solution
+~~~~~~~~~~
+1. Copy **the second to last version** of the problematic data to the canaries folder. For example, if 11/20's job keeps failing, then copy 11/19's data:
+
+.. code:: bash
+
+   $ hadoop --config /etc/hadoop/hadoop-conf-proc2-atla/ \
+       distcp -m 1000 \
+       /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates/1605744000000 \
+       /atla/proc2/user/timelines/canaries/processed/aggregates_v2/user_mention_aggregates/1605744000000
+
+2. Set up a canary run for the date of the problem, with the fallback path pointing to `1605744000000` in the prod/canaries folder.
+
+3. Deschedule the production job and kill the current run. For example:
+
+.. code:: bash
+
+   $ aurora cron deschedule atla/timelines/prod/user_mention_aggregates
+   $ aurora job killall atla/timelines/prod/user_mention_aggregates
+
+4. Create a backup folder and move the corrupt prod store output there:
+
+.. code:: bash
+
+   $ hdfs dfs -mkdir /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates_backup
+   $ hdfs dfs -mv /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates/1605830400000 /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates_backup/
+   $ hadoop fs -count /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates_backup/1605830400000
+
+   1 1001 10829136677614 /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates_backup/1605830400000
+
+5. Copy the canary output store to the prod folder:
+
+.. code:: bash
+
+   $ hadoop --config /etc/hadoop/hadoop-conf-proc2-atla/ distcp -m 1000 /atla/proc2/user/timelines/canaries/processed/aggregates_v2/user_mention_aggregates/1605830400000 /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates/1605830400000
+
+We can see a slight difference in size:
+
+.. code:: bash
+
+   $ hadoop fs -count /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates_backup/1605830400000
+   1 1001 10829136677614 /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates_backup/1605830400000
+   $ hadoop fs -count /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates/1605830400000
+   1 1001 10829136677844 /atla/proc2/user/timelines/processed/aggregates_v2/user_mention_aggregates/1605830400000
+
+6. Deploy the prod job again and observe whether it can successfully process the new output for the date of interest.
+
+7. Verify that the new run succeeded and the job is unblocked.
+
+Example
+~~~~~~~~
+
+There is an example in https://phabricator.twitter.biz/D591174
+
+
+[Batch] Skipping the offline job ahead
+---------------------------------------
+
+Symptom
+~~~~~~~~~~
+The Summingbird batch job keeps failing, the DataRecord source is no longer available (e.g. due to retention), and there is no way for the job to succeed, **OR**
+the job is stuck processing old data (more than one week old) and will not catch up to the new data on its own if left alone.
+
+Solution
+~~~~~~~~
+
+We will need to skip the job ahead. Unfortunately, this involves manual effort. We also need help from the ADP team (Slack #adp).
+
+1. Ask the ADP team to manually insert an entry into the store via the #adp Slack channel. You may refer to https://jira.twitter.biz/browse/AIPIPE-7520 and https://jira.twitter.biz/browse/AIPIPE-9300 as references. However, please don't create and assign tickets directly to an ADP team member unless they ask you to.
+
+2. Copy the latest version of the store to the same HDFS directory but with a different destination name. The name MUST be the same as the version inserted above.
+
+For example, if the ADP team manually inserted a version on 12/09/2020, then we can see the version by running
+
+.. code:: bash
+
+   $ dalv2 segment list --name user_original_author_aggregates --role timelines --location-name proc2-atla --location-type hadoop-cluster
+   ...
+   None 2020-12-09T00:00:00Z viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_original_author_aggregates/1607472000000 Unknown None
+
+where `1607472000000` is the timestamp of 12/09/2020.
+You will then need to duplicate the latest version of the store to a dir of `1607472000000`.
+For example:
+
+.. code:: bash
+
+   $ hadoop --config /etc/hadoop/hadoop-conf-proc2-atla/ distcp -m 1000 /atla/proc2/user/timelines/processed/aggregates_v2/user_original_author_aggregates/1605052800000 /atla/proc2/user/timelines/processed/aggregates_v2/user_original_author_aggregates/1607472000000
+
+3. Go to the EagleEye UI of the job and click on the "Skip Ahead" button to the desired datetime. In our example, it should be `2020-12-09 12am`.
+
+4. Wait for the job to start. The job should now be running the 2020-12-09 partition.
diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/BUILD b/timelines/data_processing/ml_util/aggregation_framework/heron/BUILD new file mode 100644 index 000000000..0cc576e4e --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/heron/BUILD @@ -0,0 +1,74 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + strict_deps = False, + tags = ["bazel-compatible"], + dependencies = [ + ":configs", + "3rdparty/jvm/storm:heron-oss-storm", + "3rdparty/src/jvm/com/twitter/scalding:args", + "3rdparty/src/jvm/com/twitter/summingbird:storm", + "src/java/com/twitter/heron/util", + "src/java/com/twitter/ml", + "src/scala/com/twitter/storehaus_internal/nighthawk_kv", + "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits", + "src/scala/com/twitter/summingbird_internal/runner/common", + "src/scala/com/twitter/summingbird_internal/runner/storm", + "src/scala/com/twitter/timelines/prediction/features/common", + "timelines/data_processing/ml_util/aggregation_framework:user_job", + ], +) + +scala_library( + name = "configs", + sources = [ + "NighthawkUnderlyingStoreConfig.scala", + "OnlineAggregationConfigTrait.scala", + "OnlineAggregationStoresTrait.scala", + "RealTimeAggregateStore.scala", + "RealTimeAggregatesJobConfig.scala", + "StormAggregateSource.scala", + ], + platform = "java8", + strict_deps = True, + tags = ["bazel-compatible"], + dependencies = [ + ":base-config", + "3rdparty/jvm/storm:heron-oss-storm", + "3rdparty/src/jvm/com/twitter/summingbird:core", + "3rdparty/src/jvm/com/twitter/summingbird:storm", + "finagle/finagle-core/src/main", + "src/java/com/twitter/ml/api:api-base", + "src/scala/com/twitter/storehaus_internal/memcache", + "src/scala/com/twitter/storehaus_internal/memcache/config", + "src/scala/com/twitter/storehaus_internal/nighthawk_kv", + "src/scala/com/twitter/storehaus_internal/nighthawk_kv/config", + "src/scala/com/twitter/storehaus_internal/online", + "src/scala/com/twitter/storehaus_internal/store", + "src/scala/com/twitter/storehaus_internal/util", + "src/scala/com/twitter/summingbird_internal/runner/store_config", + "src/thrift/com/twitter/clientapp/gen:clientapp-java", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/ml/api:data-scala", + "src/thrift/com/twitter/ml/api:feature_context-java", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/transforms", + "util/util-core:scala", + "util/util-core:util-core-util", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "base-config", + sources = [ + "OnlineAggregationConfigTrait.scala", + ], + platform = "java8", + strict_deps = True, + tags = ["bazel-compatible"], + dependencies = [ + "src/java/com/twitter/ml/api:api-base", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/NighthawkUnderlyingStoreConfig.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/NighthawkUnderlyingStoreConfig.scala new file mode 100644 index 000000000..cf7668a20 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/heron/NighthawkUnderlyingStoreConfig.scala @@ -0,0 +1,31 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron + +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.mtls.authentication.EmptyServiceIdentifier +import 
com.twitter.finagle.mtls.authentication.ServiceIdentifier
+import com.twitter.finagle.ssl.OpportunisticTls
+import com.twitter.storehaus_internal.nighthawk_kv.CacheClientNighthawkConfig
+import com.twitter.storehaus_internal.util.TTL
+import com.twitter.storehaus_internal.util.TableName
+import com.twitter.summingbird_internal.runner.store_config.OnlineStoreOnlyConfig
+import com.twitter.util.Duration
+
+case class NighthawkUnderlyingStoreConfig(
+  serversetPath: String = "",
+  tableName: String = "",
+  cacheTTL: Duration = 1.day)
+    extends OnlineStoreOnlyConfig[CacheClientNighthawkConfig] {
+
+  def online: CacheClientNighthawkConfig = online(EmptyServiceIdentifier)
+
+  def online(
+    serviceIdentifier: ServiceIdentifier = EmptyServiceIdentifier
+  ): CacheClientNighthawkConfig =
+    CacheClientNighthawkConfig(
+      serversetPath,
+      TableName(tableName),
+      TTL(cacheTTL),
+      serviceIdentifier = serviceIdentifier,
+      opportunisticTlsLevel = OpportunisticTls.Required
+    )
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/OnlineAggregationConfigTrait.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/OnlineAggregationConfigTrait.scala
new file mode 100644
index 000000000..aea649128
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/heron/OnlineAggregationConfigTrait.scala
@@ -0,0 +1,28 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron
+
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup
+import com.twitter.ml.api.Feature
+
+trait OnlineAggregationConfigTrait {
+  def ProdAggregates: Set[TypedAggregateGroup[_]]
+  def StagingAggregates: Set[TypedAggregateGroup[_]]
+  def ProdCommonAggregates: Set[TypedAggregateGroup[_]]
+
+  /**
+   * AggregatesToCompute: This defines the complete set of aggregates to be
+   * computed by the aggregation job and to be stored in memcache.
+   */
+  def AggregatesToCompute: Set[TypedAggregateGroup[_]]
+
+  /**
+   * ProdFeatures: This defines the subset of aggregates to be extracted
+   * and hydrated (or adapted) by callers to the aggregates features cache.
+   * This should only contain production aggregates and aggregates on
+   * product specific engagements.
+   * ProdCommonFeatures: Similar to ProdFeatures but containing user-level
+   * aggregate features. This is provided to PredictionService just
+   * once per user.
+ */ + lazy val ProdFeatures: Set[Feature[_]] = ProdAggregates.flatMap(_.allOutputFeatures) + lazy val ProdCommonFeatures: Set[Feature[_]] = ProdCommonAggregates.flatMap(_.allOutputFeatures) +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/OnlineAggregationStoresTrait.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/OnlineAggregationStoresTrait.scala new file mode 100644 index 000000000..4f693190e --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/heron/OnlineAggregationStoresTrait.scala @@ -0,0 +1,6 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron + +trait OnlineAggregationStoresTrait { + def ProductionStore: RealTimeAggregateStore + def StagingStore: RealTimeAggregateStore +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregateStore.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregateStore.scala new file mode 100644 index 000000000..2e75039d3 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregateStore.scala @@ -0,0 +1,50 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron + +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.mtls.authentication.EmptyServiceIdentifier +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.storehaus_internal.memcache.ConnectionConfig +import com.twitter.storehaus_internal.memcache.MemcacheConfig +import com.twitter.storehaus_internal.util.KeyPrefix +import com.twitter.storehaus_internal.util.TTL +import com.twitter.storehaus_internal.util.ZkEndPoint +import com.twitter.summingbird_internal.runner.store_config.OnlineStoreOnlyConfig +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateStore +import com.twitter.util.Duration + +object RealTimeAggregateStore { + val twCacheWilyPrefix = "/srv#" // s2s is only supported for wily path + + def makeEndpoint( + memcacheDataSet: String, + isProd: Boolean, + twCacheWilyPrefix: String = twCacheWilyPrefix + ): String = { + val env = if (isProd) "prod" else "test" + s"$twCacheWilyPrefix/$env/local/cache/$memcacheDataSet" + } +} + +case class RealTimeAggregateStore( + memcacheDataSet: String, + isProd: Boolean = false, + cacheTTL: Duration = 1.day) + extends OnlineStoreOnlyConfig[MemcacheConfig] + with AggregateStore { + import RealTimeAggregateStore._ + + override val name: String = "" + val storeKeyPrefix: KeyPrefix = KeyPrefix(name) + val memcacheZkEndPoint: String = makeEndpoint(memcacheDataSet, isProd) + + def online: MemcacheConfig = online(serviceIdentifier = EmptyServiceIdentifier) + + def online(serviceIdentifier: ServiceIdentifier = EmptyServiceIdentifier): MemcacheConfig = + new MemcacheConfig { + val endpoint = ZkEndPoint(memcacheZkEndPoint) + override val connectionConfig = + ConnectionConfig(endpoint, serviceIdentifier = serviceIdentifier) + override val keyPrefix = storeKeyPrefix + override val ttl = TTL(Duration.fromMilliseconds(cacheTTL.inMillis)) + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregatesJobBase.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregatesJobBase.scala new file mode 100644 index 000000000..906f7c1be --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregatesJobBase.scala @@ -0,0 +1,301 @@ +package 
com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron
+
+import com.twitter.algebird.Monoid
+import com.twitter.bijection.Injection
+import com.twitter.bijection.thrift.CompactThriftCodec
+import com.twitter.conversions.DurationOps._
+import com.twitter.finagle.mtls.authentication.EmptyServiceIdentifier
+import com.twitter.finagle.mtls.authentication.ServiceIdentifier
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.heron.util.CommonMetric
+import com.twitter.ml.api.DataRecord
+import com.twitter.scalding.Args
+import com.twitter.storehaus.algebra.MergeableStore
+import com.twitter.storehaus.algebra.StoreAlgebra._
+import com.twitter.storehaus_internal.memcache.Memcache
+import com.twitter.storehaus_internal.store.CombinedStore
+import com.twitter.storehaus_internal.store.ReplicatingWritableStore
+import com.twitter.summingbird.batch.BatchID
+import com.twitter.summingbird.batch.Batcher
+import com.twitter.summingbird.online.MergeableStoreFactory
+import com.twitter.summingbird.online.option._
+import com.twitter.summingbird.option.CacheSize
+import com.twitter.summingbird.option.JobId
+import com.twitter.summingbird.storm.option.FlatMapStormMetrics
+import com.twitter.summingbird.storm.option.SummerStormMetrics
+import com.twitter.summingbird.storm.Storm
+import com.twitter.summingbird.storm.StormMetric
+import com.twitter.summingbird.Options
+import com.twitter.summingbird._
+import com.twitter.summingbird_internal.runner.common.CapTicket
+import com.twitter.summingbird_internal.runner.common.JobName
+import com.twitter.summingbird_internal.runner.common.TeamEmail
+import com.twitter.summingbird_internal.runner.common.TeamName
+import com.twitter.summingbird_internal.runner.storm.ProductionStormConfig
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework._
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.job.AggregatesV2Job
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.job.DataRecordFeatureCounter
+import org.apache.heron.api.{Config => HeronConfig}
+import org.apache.heron.common.basics.ByteAmount
+import org.apache.storm.Config
+import scala.collection.JavaConverters._
+
+object RealTimeAggregatesJobBase {
+  lazy val commonMetric: StormMetric[CommonMetric] =
+    StormMetric(new CommonMetric(), CommonMetric.NAME, CommonMetric.POLL_INTERVAL)
+  lazy val flatMapMetrics: FlatMapStormMetrics = FlatMapStormMetrics(Iterable(commonMetric))
+  lazy val summerMetrics: SummerStormMetrics = SummerStormMetrics(Iterable(commonMetric))
+}
+
+trait RealTimeAggregatesJobBase extends Serializable {
+  import RealTimeAggregatesJobBase._
+  import com.twitter.summingbird_internal.bijection.BatchPairImplicits._
+
+  def statsReceiver: StatsReceiver
+
+  def aggregatesToCompute: Set[TypedAggregateGroup[_]]
+
+  def jobConfigs: RealTimeAggregatesJobConfigs
+
+  implicit lazy val dataRecordCodec: Injection[DataRecord, Array[Byte]] =
+    CompactThriftCodec[DataRecord]
+  implicit lazy val monoid: Monoid[DataRecord] = DataRecordAggregationMonoid(aggregatesToCompute)
+  implicit lazy val aggregationKeyInjection: Injection[AggregationKey, Array[Byte]] =
+    AggregationKeyInjection
+
+  val clusters: Set[String] = Set("atla", "pdxa")
+
+  def buildAggregateStoreToStorm(
+    isProd: Boolean,
+    serviceIdentifier: ServiceIdentifier,
+    jobConfig: RealTimeAggregatesJobConfig
+  ): (AggregateStore =>
Option[Storm#Store[AggregationKey, DataRecord]]) = { + (store: AggregateStore) => + store match { + case rtaStore: RealTimeAggregateStore if rtaStore.isProd == isProd => { + lazy val primaryStore: MergeableStore[(AggregationKey, BatchID), DataRecord] = + Memcache.getMemcacheStore[(AggregationKey, BatchID), DataRecord]( + rtaStore.online(serviceIdentifier)) + + lazy val mergeableStore: MergeableStore[(AggregationKey, BatchID), DataRecord] = + if (jobConfig.enableUserReindexingNighthawkBtreeStore + || jobConfig.enableUserReindexingNighthawkHashStore) { + val reindexingNighthawkBtreeWritableDataRecordStoreList = + if (jobConfig.enableUserReindexingNighthawkBtreeStore) { + lazy val cacheClientNighthawkConfig = + jobConfig.userReindexingNighthawkBtreeStoreConfig.online(serviceIdentifier) + List( + UserReindexingNighthawkWritableDataRecordStore.getBtreeStore( + nighthawkCacheConfig = cacheClientNighthawkConfig, + // Choose a reasonably large target size as this will be equivalent to the number of unique (user, timestamp) + // keys that are returned on read on the pKey, and we may have duplicate authors and associated records. + targetSize = 512, + statsReceiver = statsReceiver, + // Assuming trims are relatively expensive, choose a trimRate that's not as aggressive. In this case we trim on + // 10% of all writes. + trimRate = 0.1 + )) + } else { Nil } + val reindexingNighthawkHashWritableDataRecordStoreList = + if (jobConfig.enableUserReindexingNighthawkHashStore) { + lazy val cacheClientNighthawkConfig = + jobConfig.userReindexingNighthawkHashStoreConfig.online(serviceIdentifier) + List( + UserReindexingNighthawkWritableDataRecordStore.getHashStore( + nighthawkCacheConfig = cacheClientNighthawkConfig, + // Choose a reasonably large target size as this will be equivalent to the number of unique (user, timestamp) + // keys that are returned on read on the pKey, and we may have duplicate authors and associated records. + targetSize = 512, + statsReceiver = statsReceiver, + // Assuming trims are relatively expensive, choose a trimRate that's not as aggressive. In this case we trim on + // 10% of all writes. 
+ trimRate = 0.1 + )) + } else { Nil } + + lazy val replicatingWritableStore = new ReplicatingWritableStore( + stores = List(primaryStore) ++ reindexingNighthawkBtreeWritableDataRecordStoreList + ++ reindexingNighthawkHashWritableDataRecordStoreList + ) + + lazy val combinedStoreWithReindexing = new CombinedStore( + read = primaryStore, + write = replicatingWritableStore + ) + + combinedStoreWithReindexing.toMergeable + } else { + primaryStore + } + + lazy val storeFactory: MergeableStoreFactory[(AggregationKey, BatchID), DataRecord] = + Storm.store(mergeableStore)(Batcher.unit) + Some(storeFactory) + } + case _ => None + } + } + + def buildDataRecordSourceToStorm( + jobConfig: RealTimeAggregatesJobConfig + ): (AggregateSource => Option[Producer[Storm, DataRecord]]) = { (source: AggregateSource) => + { + source match { + case stormAggregateSource: StormAggregateSource => + Some(stormAggregateSource.build(statsReceiver, jobConfig)) + case _ => None + } + } + } + + def apply(args: Args): ProductionStormConfig = { + lazy val isProd = args.boolean("production") + lazy val cluster = args.getOrElse("cluster", "") + lazy val isDebug = args.boolean("debug") + lazy val role = args.getOrElse("role", "") + lazy val service = + args.getOrElse( + "service_name", + "" + ) // don't use the argument service, which is a reserved heron argument + lazy val environment = if (isProd) "prod" else "devel" + lazy val s2sEnabled = args.boolean("s2s") + lazy val keyedByUserEnabled = args.boolean("keyed_by_user") + lazy val keyedByAuthorEnabled = args.boolean("keyed_by_author") + + require(clusters.contains(cluster)) + if (s2sEnabled) { + require(role.length() > 0) + require(service.length() > 0) + } + + lazy val serviceIdentifier = if (s2sEnabled) { + ServiceIdentifier( + role = role, + service = service, + environment = environment, + zone = cluster + ) + } else EmptyServiceIdentifier + + lazy val jobConfig = { + val jobConfig = if (isProd) jobConfigs.Prod else jobConfigs.Devel + jobConfig.copy( + serviceIdentifier = serviceIdentifier, + keyedByUserEnabled = keyedByUserEnabled, + keyedByAuthorEnabled = keyedByAuthorEnabled) + } + + lazy val dataRecordSourceToStorm = buildDataRecordSourceToStorm(jobConfig) + lazy val aggregateStoreToStorm = + buildAggregateStoreToStorm(isProd, serviceIdentifier, jobConfig) + + lazy val JaasConfigFlag = "-Djava.security.auth.login.config=resources/jaas.conf" + lazy val JaasDebugFlag = "-Dsun.security.krb5.debug=true" + lazy val JaasConfigString = + if (isDebug) { "%s %s".format(JaasConfigFlag, JaasDebugFlag) } + else JaasConfigFlag + + new ProductionStormConfig { + implicit val jobId: JobId = JobId(jobConfig.name) + override val jobName = JobName(jobConfig.name) + override val teamName = TeamName(jobConfig.teamName) + override val teamEmail = TeamEmail(jobConfig.teamEmail) + override val capTicket = CapTicket("n/a") + + val configureHeronJvmSettings = { + val heronJvmOptions = new java.util.HashMap[String, AnyRef]() + jobConfig.componentToRamGigaBytesMap.foreach { + case (component, gigabytes) => + HeronConfig.setComponentRam( + heronJvmOptions, + component, + ByteAmount.fromGigabytes(gigabytes)) + } + + HeronConfig.setContainerRamRequested( + heronJvmOptions, + ByteAmount.fromGigabytes(jobConfig.containerRamGigaBytes) + ) + + jobConfig.componentsToKerberize.foreach { component => + HeronConfig.setComponentJvmOptions( + heronJvmOptions, + component, + JaasConfigString + ) + } + + jobConfig.componentToMetaSpaceSizeMap.foreach { + case (component, metaspaceSize) => + 
HeronConfig.setComponentJvmOptions(
+              heronJvmOptions,
+              component,
+              metaspaceSize
+            )
+        }
+
+        heronJvmOptions.asScala.toMap ++ AggregatesV2Job
+          .aggregateNames(aggregatesToCompute).map {
+            case (prefix, aggNames) => (s"extras.aggregateNames.${prefix}", aggNames)
+          }
+      }
+
+      override def transformConfig(m: Map[String, AnyRef]): Map[String, AnyRef] = {
+        super.transformConfig(m) ++ List(
+          /**
+           * Disable acking by setting acker executors to 0. Tuples that come off the
+           * spout will be immediately acked which effectively disables retries on tuple
+           * failures. This should help topology throughput/availability by relaxing consistency.
+           */
+          Config.TOPOLOGY_ACKER_EXECUTORS -> int2Integer(0),
+          Config.TOPOLOGY_WORKERS -> int2Integer(jobConfig.topologyWorkers),
+          HeronConfig.TOPOLOGY_CONTAINER_CPU_REQUESTED -> int2Integer(8),
+          HeronConfig.TOPOLOGY_DROPTUPLES_UPON_BACKPRESSURE -> java.lang.Boolean.valueOf(true),
+          HeronConfig.TOPOLOGY_WORKER_CHILDOPTS -> List(
+            JaasConfigString,
+            s"-Dcom.twitter.eventbus.client.zoneName=${cluster}",
+            "-Dcom.twitter.eventbus.client.EnableKafkaSaslTls=true"
+          ).mkString(" "),
+          "storm.job.uniqueId" -> jobId.get
+        ) ++ configureHeronJvmSettings
+
+      }
+
+      override lazy val getNamedOptions: Map[String, Options] = jobConfig.topologyNamedOptions ++
+        Map(
+          "DEFAULT" -> Options()
+            .set(flatMapMetrics)
+            .set(summerMetrics)
+            .set(MaxWaitingFutures(1000))
+            .set(FlushFrequency(30.seconds))
+            .set(UseAsyncCache(true))
+            .set(AsyncPoolSize(4))
+            .set(SourceParallelism(jobConfig.sourceCount))
+            .set(SummerBatchMultiplier(1000)),
+          "FLATMAP" -> Options()
+            .set(FlatMapParallelism(jobConfig.flatMapCount))
+            .set(CacheSize(0)),
+          "SUMMER" -> Options()
+            .set(SummerParallelism(jobConfig.summerCount))
+            /**
+             * Sets number of tuples a Summer awaits before aggregation. Set higher
+             * if you need to lower qps to memcache at the expense of introducing
+             * some (stable) latency.
+             */
+            .set(CacheSize(jobConfig.cacheSize))
+        )
+
+      val featureCounters: Seq[DataRecordFeatureCounter] =
+        Seq(DataRecordFeatureCounter.any(Counter(Group("feature_counter"), Name("num_records"))))
+
+      override def graph: TailProducer[Storm, Any] = AggregatesV2Job.generateJobGraph[Storm](
+        aggregateSet = aggregatesToCompute,
+        aggregateSourceToSummingbird = dataRecordSourceToStorm,
+        aggregateStoreToSummingbird = aggregateStoreToStorm,
+        featureCounters = featureCounters
+      )
+    }
+  }
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregatesJobConfig.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregatesJobConfig.scala
new file mode 100644
index 000000000..8bed26264
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/heron/RealTimeAggregatesJobConfig.scala
@@ -0,0 +1,79 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron
+
+import com.twitter.finagle.mtls.authentication.EmptyServiceIdentifier
+import com.twitter.finagle.mtls.authentication.ServiceIdentifier
+import com.twitter.ml.api.DataRecord
+import com.twitter.summingbird.Options
+import com.twitter.timelines.data_processing.ml_util.transforms.OneToSomeTransform
+
+/**
+ *
+ * @param appId application id for the topology job
+ * @param topologyWorkers number of workers/containers of the topology
+ * @param sourceCount number of parallel spouts of the topology
+ * @param summerCount number of Summers of the topology
+ * @param cacheSize number of tuples a Summer awaits before aggregation.
+ * @param flatMapCount number of parallel FlatMaps in the topology
+ * @param containerRamGigaBytes total RAM each worker/container has
+ * @param name name of the topology job
+ * @param teamName name of the team that owns the topology job
+ * @param teamEmail email of the team that owns the topology job
+ * @param componentsToKerberize components of the topology job (e.g. Tail-FlatMap-Source) for which kerberization is enabled
+ * @param componentToMetaSpaceSizeMap MetaSpaceSize settings for components of the topology job
+ * @param componentToRamGigaBytesMap RAM settings in gigabytes for components of the topology job
+ * @param topologyNamedOptions Sets spout allocations for named topology components
+ * @param serviceIdentifier represents the identifier used for Service to Service Authentication
+ * @param onlinePreTransforms sequential data record transforms applied to Producer of DataRecord before creating AggregateGroup.
+ *                            While preTransforms defined at AggregateGroup are applied to each aggregate group, onlinePreTransforms are applied to the whole producer source.
+ * @param keyedByUserEnabled boolean value to enable/disable merging user-level features from Feature Store
+ * @param keyedByAuthorEnabled boolean value to enable/disable merging author-level features from Feature Store
+ * @param keyedByTweetEnabled boolean value to enable/disable merging tweet-level features from Feature Store
+ * @param enableUserReindexingNighthawkBtreeStore boolean value to enable reindexing RTAs on user id with btree backed nighthawk
+ * @param enableUserReindexingNighthawkHashStore boolean value to enable reindexing RTAs on user id with hash backed nighthawk
+ * @param userReindexingNighthawkBtreeStoreConfig NH btree store config used in reindexing user RTAs
+ * @param userReindexingNighthawkHashStoreConfig NH hash store config used in reindexing user RTAs
+ */
+case class RealTimeAggregatesJobConfig(
+  appId: String,
+  topologyWorkers: Int,
+  sourceCount: Int,
+  summerCount: Int,
+  cacheSize: Int,
+  flatMapCount: Int,
+  containerRamGigaBytes: Int,
+  name: String,
+  teamName: String,
+  teamEmail: String,
+  componentsToKerberize: Seq[String] = Seq.empty,
+  componentToMetaSpaceSizeMap: Map[String, String] = Map.empty,
+  componentToRamGigaBytesMap: Map[String, Int] = Map("Tail" -> 4),
+  topologyNamedOptions: Map[String, Options] = Map.empty,
+  serviceIdentifier: ServiceIdentifier = EmptyServiceIdentifier,
+  onlinePreTransforms: Seq[OneToSomeTransform] = Seq.empty,
+  keyedByUserEnabled: Boolean = false,
+  keyedByAuthorEnabled: Boolean = false,
+  keyedByTweetEnabled: Boolean = false,
+  enableUserReindexingNighthawkBtreeStore: Boolean = false,
+  enableUserReindexingNighthawkHashStore: Boolean = false,
+  userReindexingNighthawkBtreeStoreConfig: NighthawkUnderlyingStoreConfig =
+    NighthawkUnderlyingStoreConfig(),
+  userReindexingNighthawkHashStoreConfig: NighthawkUnderlyingStoreConfig =
+    NighthawkUnderlyingStoreConfig()) {
+
+  /**
+   * Apply transforms sequentially. If any transform results in a dropped (None)
+   * DataRecord, then the entire transform sequence will result in a dropped DataRecord.
+   * Note that transforms are order-dependent.
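+   *
+   * A minimal sketch of the intended behavior (t1 and t2 are hypothetical
+   * transforms in onlinePreTransforms):
+   * {{{
+   *   // The record is passed through t1, then t2; if either returns None,
+   *   // the overall result is None:
+   *   jobConfig.sequentiallyTransform(record)  // Some(transformedRecord) or None
+   * }}}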
+ */ + def sequentiallyTransform(dataRecord: DataRecord): Option[DataRecord] = { + val recordOpt = Option(new DataRecord(dataRecord)) + onlinePreTransforms.foldLeft(recordOpt) { + case (Some(previousRecord), preTransform) => + preTransform(previousRecord) + case _ => Option.empty[DataRecord] + } + } +} + +trait RealTimeAggregatesJobConfigs { + def Prod: RealTimeAggregatesJobConfig + def Devel: RealTimeAggregatesJobConfig +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/StormAggregateSource.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/StormAggregateSource.scala new file mode 100644 index 000000000..a252cf197 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/heron/StormAggregateSource.scala @@ -0,0 +1,27 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.Feature +import com.twitter.summingbird._ +import com.twitter.summingbird.storm.Storm +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateSource +import java.lang.{Long => JLong} + +/** + * Use this trait to implement online summingbird producer that subscribes to + * spouts and generates a data record. + */ +trait StormAggregateSource extends AggregateSource { + def name: String + + def timestampFeature: Feature[JLong] + + /** + * Constructs the storm Producer with the implemented topology at runtime. + */ + def build( + statsReceiver: StatsReceiver, + jobConfig: RealTimeAggregatesJobConfig + ): Producer[Storm, DataRecord] +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/UserReindexingNighthawkStore.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/UserReindexingNighthawkStore.scala new file mode 100644 index 000000000..a4d2adeac --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/heron/UserReindexingNighthawkStore.scala @@ -0,0 +1,309 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron + +import com.twitter.bijection.Injection +import com.twitter.bijection.thrift.CompactThriftCodec +import com.twitter.cache.client._ +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.constant.SharedFeatures +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.storehaus.WritableStore +import com.twitter.storehaus_internal.nighthawk_kv.CacheClientNighthawkConfig +import com.twitter.storehaus_internal.nighthawk_kv.NighthawkStore +import com.twitter.summingbird.batch.BatchID +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.heron.UserReindexingNighthawkWritableDataRecordStore._ +import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures +import com.twitter.util.Future +import com.twitter.util.Time +import com.twitter.util.Try +import com.twitter.util.logging.Logger +import java.nio.ByteBuffer +import java.util +import scala.util.Random + +object UserReindexingNighthawkWritableDataRecordStore { + implicit val longInjection = Injection.long2BigEndian + implicit val dataRecordInjection: Injection[DataRecord, Array[Byte]] = + CompactThriftCodec[DataRecord] + val arrayToByteBuffer = 
Injection.connect[Array[Byte], ByteBuffer]
+  val longToByteBuffer = longInjection.andThen(arrayToByteBuffer)
+  val dataRecordToByteBuffer = dataRecordInjection.andThen(arrayToByteBuffer)
+
+  def getBtreeStore(
+    nighthawkCacheConfig: CacheClientNighthawkConfig,
+    targetSize: Int,
+    statsReceiver: StatsReceiver,
+    trimRate: Double
+  ): UserReindexingNighthawkBtreeWritableDataRecordStore =
+    new UserReindexingNighthawkBtreeWritableDataRecordStore(
+      nighthawkStore = NighthawkStore[UserId, TimestampMs, DataRecord](nighthawkCacheConfig)
+        .asInstanceOf[NighthawkStore[UserId, TimestampMs, DataRecord]],
+      tableName = nighthawkCacheConfig.table.toString,
+      targetSize = targetSize,
+      statsReceiver = statsReceiver,
+      trimRate = trimRate
+    )
+
+  def getHashStore(
+    nighthawkCacheConfig: CacheClientNighthawkConfig,
+    targetSize: Int,
+    statsReceiver: StatsReceiver,
+    trimRate: Double
+  ): UserReindexingNighthawkHashWritableDataRecordStore =
+    new UserReindexingNighthawkHashWritableDataRecordStore(
+      nighthawkStore = NighthawkStore[UserId, AuthorId, DataRecord](nighthawkCacheConfig)
+        .asInstanceOf[NighthawkStore[UserId, AuthorId, DataRecord]],
+      tableName = nighthawkCacheConfig.table.toString,
+      targetSize = targetSize,
+      statsReceiver = statsReceiver,
+      trimRate = trimRate
+    )
+
+  def buildTimestampedByteBuffer(timestamp: Long, bb: ByteBuffer): ByteBuffer = {
+    // Allocate room for an 8-byte timestamp prefix followed by the value bytes.
+    val timestampedBb = ByteBuffer.allocate(getLength(bb) + java.lang.Long.BYTES)
+    timestampedBb.putLong(timestamp)
+    timestampedBb.put(bb)
+    timestampedBb
+  }
+
+  def extractTimestampFromTimestampedByteBuffer(bb: ByteBuffer): Long = {
+    bb.getLong(0)
+  }
+
+  def extractValueFromTimestampedByteBuffer(bb: ByteBuffer): ByteBuffer = {
+    // Copy everything after the 8-byte timestamp prefix. The buffers built by
+    // buildTimestampedByteBuffer are array-backed, so array() is safe here.
+    val bytes = util.Arrays.copyOfRange(bb.array(), java.lang.Long.BYTES, getLength(bb))
+    ByteBuffer.wrap(bytes)
+  }
+
+  def transformAndBuildKeyValueMapping(
+    table: String,
+    userId: UserId,
+    authorIdsAndDataRecords: Seq[(AuthorId, DataRecord)]
+  ): KeyValue = {
+    val timestamp = Time.now.inMillis
+    val pkey = longToByteBuffer(userId)
+    val lkeysAndTimestampedValues = authorIdsAndDataRecords.map {
+      case (authorId, dataRecord) =>
+        val lkey = longToByteBuffer(authorId)
+        // Create a byte buffer with a prepended timestamp to reduce deserialization cost
+        // when parsing values. We only have to extract and deserialize the timestamp in the
+        // ByteBuffer in order to sort the value, as opposed to deserializing the DataRecord
+        // and having to get a timestamp feature value from the DataRecord.
+        val dataRecordBb = dataRecordToByteBuffer(dataRecord)
+        val timestampedValue = buildTimestampedByteBuffer(timestamp, dataRecordBb)
+        (lkey, timestampedValue)
+    }
+    buildKeyValueMapping(table, pkey, lkeysAndTimestampedValues)
+  }
+
+  def buildKeyValueMapping(
+    table: String,
+    pkey: ByteBuffer,
+    lkeysAndTimestampedValues: Seq[(ByteBuffer, ByteBuffer)]
+  ): KeyValue = {
+    val lkeys = lkeysAndTimestampedValues.map { case (lkey, _) => lkey }
+    val timestampedValues = lkeysAndTimestampedValues.map { case (_, value) => value }
+    val kv = KeyValue(
+      key = Key(table = table, pkey = pkey, lkeys = lkeys),
+      value = Value(timestampedValues)
+    )
+    kv
+  }
+
+  private def getLength(bb: ByteBuffer): Int = {
+    // capacity can be an over-estimate of the actual length (remaining - start position)
+    // but it's the safest to avoid overflows.
+    bb.capacity()
+  }
+}
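+
+// Illustrative round trip of the helpers above (a documentation sketch, not
+// exercised by production code). The stored value layout is an 8-byte
+// big-endian timestamp followed by the compact-thrift-encoded DataRecord bytes:
+//
+//   val bb = dataRecordToByteBuffer(new DataRecord)
+//   val timestamped = buildTimestampedByteBuffer(Time.now.inMillis, bb)
+//   val ts = extractTimestampFromTimestampedByteBuffer(timestamped) // the timestamp
+//   val value = extractValueFromTimestampedByteBuffer(timestamped)  // the record bytes
+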
+/**
+ * Implements a NH store that stores aggregate feature DataRecords using userId as the primary key.
+ *
+ * This store re-indexes user-author keyed real-time aggregate (RTA) features on userId by
+ * writing to a userId primary key (pkey) and timestamp secondary key (lkey). To fetch user-author
+ * RTAs for a given user from cache, the caller just needs to make a single RPC for the userId pkey.
+ * The downside of a re-indexing store is that we cannot store arbitrarily many secondary keys
+ * under the primary key. This specific implementation using the NH btree backend also mandates
+ * an ordering of secondary keys - we therefore use timestamp as the secondary key,
+ * as opposed to, say, authorId.
+ *
+ * Note that a caller of the btree backed NH re-indexing store receives back a response where the
+ * secondary key is a timestamp. The associated value is a DataRecord containing user-author related
+ * aggregate features which was last updated at the timestamp. The caller therefore needs to handle
+ * the response and dedupe on unique, most recent user-author pairs.
+ *
+ * For a discussion on this and other implementations, please see:
+ * https://docs.google.com/document/d/1yVzAbQ_ikLqwSf230URxCJmSKj5yZr5dYv6TwBlQw18/edit
+ */
+class UserReindexingNighthawkBtreeWritableDataRecordStore(
+  nighthawkStore: NighthawkStore[UserId, TimestampMs, DataRecord],
+  tableName: String,
+  targetSize: Int,
+  statsReceiver: StatsReceiver,
+  trimRate: Double = 0.1 // by default, trim on 10% of puts
+) extends WritableStore[(AggregationKey, BatchID), Option[DataRecord]] {
+
+  private val scope = getClass.getSimpleName
+  private val failures = statsReceiver.counter(scope, "failures")
+  private val log = Logger.getLogger(getClass)
+  private val random: Random = new Random(1729L)
+
+  override def put(kv: ((AggregationKey, BatchID), Option[DataRecord])): Future[Unit] = {
+    val ((aggregationKey, _), dataRecordOpt) = kv
+    // Fire-and-forget below because the store itself should just be a side effect
+    // as it's just making re-indexed writes based on the writes to the primary store.
+    for {
+      userId <- aggregationKey.discreteFeaturesById.get(SharedFeatures.USER_ID.getFeatureId)
+      dataRecord <- dataRecordOpt
+    } yield {
+      SRichDataRecord(dataRecord)
+        .getFeatureValueOpt(TypedAggregateGroup.timestampFeature)
+        .map(_.toLong) // convert to Scala Long
+        .map { timestamp =>
+          val trim: Future[Unit] = if (random.nextDouble <= trimRate) {
+            val trimKey = TrimKey(
+              table = tableName,
+              pkey = longToByteBuffer(userId),
+              targetSize = targetSize,
+              ascending = true
+            )
+            nighthawkStore.client.trim(Seq(trimKey)).unit
+          } else {
+            Future.Unit
+          }
+          // We should wait for trim to complete above
+          val fireAndForget = trim.before {
+            val kvTuple = ((userId, timestamp), Some(dataRecord))
+            nighthawkStore.put(kvTuple)
+          }
+
+          fireAndForget.onFailure {
+            case e =>
+              failures.incr()
+              log.error("Failure in UserReindexingNighthawkBtreeWritableDataRecordStore", e)
+          }
+        }
+    }
+    // Ignore fire-and-forget result above and simply return
+    Future.Unit
+  }
+}
+
+/**
+ * Implements a NH store that stores aggregate feature DataRecords using userId as the primary key.
+ *
+ * This store re-indexes user-author keyed real-time aggregate (RTA) features on userId by
+ * writing to a userId primary key (pkey) and authorId secondary key (lkey). To fetch user-author
+ * RTAs for a given user from cache, the caller just needs to make a single RPC for the userId pkey.
+ * The downside of a re-indexing store is that we cannot store arbitrarily many secondary keys
+ * under the primary key. We have to limit them in some way; here, we do so by randomly
+ * (based on trimRate) issuing an HGETALL command (via scan) to retrieve the whole hash,
+ * sort by oldest timestamp, and then remove the oldest authors to keep only targetSize
+ * authors (aka trim), where targetSize is configurable.
+ *
+ * @note The full hash returned from scan could be as large as (or even larger than) targetSize,
+ * which could mean many DataRecords to deserialize, especially at high write qps.
+ * To reduce deserialization cost post-scan, we use timestamped values with a prepended timestamp
+ * in the value ByteBuffer; this allows us to only deserialize the timestamp and not the full
+ * DataRecord when sorting. This is necessary in order to identify the oldest values to trim.
+ * When we do a put for a new (user, author) pair, we also write out timestamped values.
+ *
+ * For a discussion on this and other implementations, please see:
+ * https://docs.google.com/document/d/1yVzAbQ_ikLqwSf230URxCJmSKj5yZr5dYv6TwBlQw18/edit
+ */
+class UserReindexingNighthawkHashWritableDataRecordStore(
+  nighthawkStore: NighthawkStore[UserId, AuthorId, DataRecord],
+  tableName: String,
+  targetSize: Int,
+  statsReceiver: StatsReceiver,
+  trimRate: Double = 0.1 // by default, trim on 10% of puts
+) extends WritableStore[(AggregationKey, BatchID), Option[DataRecord]] {
+
+  private val scope = getClass.getSimpleName
+  private val scanMismatchErrors = statsReceiver.counter(scope, "scanMismatchErrors")
+  private val failures = statsReceiver.counter(scope, "failures")
+  private val log = Logger.getLogger(getClass)
+  private val random: Random = new Random(1729L)
+  private val arrayToByteBuffer = Injection.connect[Array[Byte], ByteBuffer]
+  private val longToByteBuffer = Injection.long2BigEndian.andThen(arrayToByteBuffer)
+
+  override def put(kv: ((AggregationKey, BatchID), Option[DataRecord])): Future[Unit] = {
+    val ((aggregationKey, _), dataRecordOpt) = kv
+    // Fire-and-forget below because the store itself should just be a side effect
+    // as it's just making re-indexed writes based on the writes to the primary store.
+    for {
+      userId <- aggregationKey.discreteFeaturesById.get(SharedFeatures.USER_ID.getFeatureId)
+      authorId <- aggregationKey.discreteFeaturesById.get(
+        TimelinesSharedFeatures.SOURCE_AUTHOR_ID.getFeatureId)
+      dataRecord <- dataRecordOpt
+    } yield {
+      val scanAndTrim: Future[Unit] = if (random.nextDouble <= trimRate) {
+        val scanKey = ScanKey(
+          table = tableName,
+          pkey = longToByteBuffer(userId)
+        )
+        nighthawkStore.client.scan(Seq(scanKey)).flatMap { scanResults: Seq[Try[KeyValue]] =>
+          scanResults.headOption
+            .flatMap(_.toOption).map { keyValue: KeyValue =>
+              val lkeys: Seq[ByteBuffer] = keyValue.key.lkeys
+              // these are timestamped bytebuffers
+              val timestampedValues: Seq[ByteBuffer] = keyValue.value.values
+              // This should fail loudly if the sizes differ: that would indicate
+              // a mistake in the scan.
+              if (lkeys.size != timestampedValues.size) scanMismatchErrors.incr()
+              assert(lkeys.size == timestampedValues.size)
+              if (lkeys.size > targetSize) {
+                // Remove the surplus entries beyond targetSize, oldest first.
+                val numToRemove = lkeys.size - targetSize
+                // sort by oldest and take top k oldest and remove - this is equivalent to a trim
+                val oldestKeys: Seq[ByteBuffer] = lkeys
+                  .zip(timestampedValues)
+                  .map {
+                    case (lkey, timestampedValue) =>
+                      val timestamp = extractTimestampFromTimestampedByteBuffer(timestampedValue)
+                      (timestamp, lkey)
+                  }
+                  .sortBy { case (timestamp, _) => timestamp }
+                  .take(numToRemove)
+                  .map { case (_, k) => k }
+                val pkey = longToByteBuffer(userId)
+                val key = Key(table = tableName, pkey = pkey, lkeys = oldestKeys)
+                // NOTE: `remove` is a batch API, and we group all lkeys into a single batch (batch
+                // size = single group of lkeys = 1). Instead, we could separate lkeys into smaller
+                // groups and have batch size = number of groups, but this is more complex.
+                // Performance implications of batching vs non-batching need to be assessed.
+                nighthawkStore.client
+                  .remove(Seq(key))
+                  .map { responses =>
+                    responses.map(resp => nighthawkStore.processValue(resp))
+                  }.unit
+              } else {
+                Future.Unit
+              }
+            }.getOrElse(Future.Unit)
+        }
+      } else {
+        Future.Unit
+      }
+      // We should wait for scan and trim to complete above
+      val fireAndForget = scanAndTrim.before {
+        val kv = transformAndBuildKeyValueMapping(tableName, userId, Seq((authorId, dataRecord)))
+        nighthawkStore.client
+          .put(Seq(kv))
+          .map { responses =>
+            responses.map(resp => nighthawkStore.processValue(resp))
+          }.unit
+      }
+      fireAndForget.onFailure {
+        case e =>
+          failures.incr()
+          log.error("Failure in UserReindexingNighthawkHashWritableDataRecordStore", e)
+      }
+    }
+    // Ignore fire-and-forget result above and simply return
+    Future.Unit
+  }
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/heron/package.scala b/timelines/data_processing/ml_util/aggregation_framework/heron/package.scala
new file mode 100644
index 000000000..e995cf202
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/heron/package.scala
@@ -0,0 +1,8 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework
+
+package object heron {
+  // NOTE: please sort alphabetically
+  type AuthorId = Long
+  type TimestampMs = Long
+  type UserId = Long
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/job/AggregatesV2Job.scala b/timelines/data_processing/ml_util/aggregation_framework/job/AggregatesV2Job.scala
new file mode 100644
index 000000000..7d9e1946e
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/job/AggregatesV2Job.scala
@@ -0,0 +1,163 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.job
+
+import com.twitter.algebird.Semigroup
+import com.twitter.ml.api.DataRecord
+import com.twitter.ml.api.DataRecordMerger
+import com.twitter.summingbird.Platform
+import com.twitter.summingbird.Producer
+import com.twitter.summingbird.TailProducer
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateSource
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateStore
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup
+
+object AggregatesV2Job {
+  private lazy val merger = new DataRecordMerger
+
+  /**
+   * Merges all "incremental" records with the same aggregation key
+   * into a single record.
+   *
+   * @param recordsPerKey A set of (AggregationKey, DataRecord) tuples
+   *                      known to share the same AggregationKey
+   * @return A single merged datarecord
+   */
+  def mergeRecords(recordsPerKey: Set[(AggregationKey, DataRecord)]): DataRecord =
+    recordsPerKey.foldLeft(new DataRecord) {
+      case (merged: DataRecord, (key: AggregationKey, elem: DataRecord)) => {
+        merger.merge(merged, elem)
+        merged
+      }
+    }
+
+  /**
+   * Given a set of aggregates to compute and a datarecord, extract key-value
+   * pairs to output to the summingbird store.
+   *
+   * @param dataRecord input data record
+   * @param aggregates set of aggregates to compute
+   * @param featureCounters counters to apply to each input data record
+   * @return computed aggregates
+   */
+  def computeAggregates(
+    dataRecord: DataRecord,
+    aggregates: Set[TypedAggregateGroup[_]],
+    featureCounters: Seq[DataRecordFeatureCounter]
+  ): Map[AggregationKey, DataRecord] = {
+    val computedAggregates = aggregates
+      .flatMap(_.computeAggregateKVPairs(dataRecord))
+      .groupBy { case (aggregationKey: AggregationKey, _) => aggregationKey }
+      .mapValues(mergeRecords)
+
+    featureCounters.foreach(counter =>
+      computedAggregates.foreach { case (_, record) => DataRecordFeatureCounter(counter, record) })
+
+    computedAggregates
+  }
+
+  /**
+   * Util method to apply a filter on containment in an optional set.
+   *
+   * @param setOptional Optional set of items to check containment in.
+   * @param toCheck Item to check if contained in set.
+   * @return True if the set is None or contains toCheck, else false.
+   */
+  def setFilter[T](setOptional: Option[Set[T]], toCheck: T): Boolean =
+    setOptional.map(_.contains(toCheck)).getOrElse(true)
+
+  /**
+   * Util for filtering a collection of `TypedAggregateGroup`
+   *
+   * @param aggregates a set of aggregates
+   * @param sourceNames Optional filter on which AggregateGroups to process
+   *                    based on the name of the input source.
+   * @param storeNames Optional filter on which AggregateGroups to process
+   *                   based on the name of the output store.
+   * @return filtered aggregates
+   */
+  def filterAggregates(
+    aggregates: Set[TypedAggregateGroup[_]],
+    sourceNames: Option[Set[String]],
+    storeNames: Option[Set[String]]
+  ): Set[TypedAggregateGroup[_]] =
+    aggregates
+      .filter { aggregateGroup =>
+        val sourceName = aggregateGroup.inputSource.name
+        val storeName = aggregateGroup.outputStore.name
+        val containsSource = setFilter(sourceNames, sourceName)
+        val containsStore = setFilter(storeNames, storeName)
+        containsSource && containsStore
+      }
+
+  /**
+   * The core summingbird job code.
+   *
+   * For each aggregate in the set passed in, the job
+   * processes all datarecords in the input producer
+   * stream to generate "incremental" contributions to
+   * these aggregates, and emits them grouped by
+   * aggregation key so that summingbird can aggregate them.
+   *
+   * It is important that after applying the sourceNameFilter and storeNameFilter,
+   * all the resulting AggregateGroups share the same startDate, otherwise the job
+   * will fail or give invalid results.
+   *
+   * @param aggregateSet A set of aggregates to compute. All aggregates
+   *                     in this set that pass the sourceNameFilter and storeNameFilter
+   *                     defined below, if any, will be computed.
+   * @param aggregateSourceToSummingbird Function that maps from our logical
+   *                                     AggregateSource abstraction to the underlying physical summingbird
+   *                                     producer of data records to aggregate (e.g.
+   *                                     scalding/eventbus source)
+   * @param aggregateStoreToSummingbird Function that maps from our logical
+   *                                    AggregateStore abstraction to the underlying physical summingbird
+   *                                    store to write output aggregate records to (e.g. manhattan for scalding,
+   *                                    or memcache for heron)
+   * @param featureCounters counters to use with each input DataRecord
+   * @return summingbird tail producer
+   */
+  def generateJobGraph[P <: Platform[P]](
+    aggregateSet: Set[TypedAggregateGroup[_]],
+    aggregateSourceToSummingbird: AggregateSource => Option[Producer[P, DataRecord]],
+    aggregateStoreToSummingbird: AggregateStore => Option[P#Store[AggregationKey, DataRecord]],
+    featureCounters: Seq[DataRecordFeatureCounter] = Seq.empty
+  )(
+    implicit semigroup: Semigroup[DataRecord]
+  ): TailProducer[P, Any] = {
+    val tailProducerList: List[TailProducer[P, Any]] = aggregateSet
+      .groupBy { aggregate => (aggregate.inputSource, aggregate.outputStore) }
+      .flatMap {
+        case (
+              (inputSource: AggregateSource, outputStore: AggregateStore),
+              aggregatesInThisStore
+            ) => {
+          val producerOpt = aggregateSourceToSummingbird(inputSource)
+          val storeOpt = aggregateStoreToSummingbird(outputStore)
+
+          (producerOpt, storeOpt) match {
+            case (Some(producer), Some(store)) =>
+              Some(
+                producer
+                  .flatMap(computeAggregates(_, aggregatesInThisStore, featureCounters))
+                  .name("FLATMAP")
+                  .sumByKey(store)
+                  .name("SUMMER")
+              )
+            case _ => None
+          }
+        }
+      }
+      .toList
+
+    tailProducerList.reduceLeft { (left, right) => left.also(right) }
+  }
+
+  def aggregateNames(aggregateSet: Set[TypedAggregateGroup[_]]) = {
+    aggregateSet
+      .map(typedGroup =>
+        (
+          typedGroup.aggregatePrefix,
+          typedGroup.individualAggregateDescriptors
+            .flatMap(_.outputFeatures.map(_.getFeatureName)).mkString(",")))
+  }.toMap
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/job/BUILD b/timelines/data_processing/ml_util/aggregation_framework/job/BUILD
new file mode 100644
index 000000000..57593fa34
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/job/BUILD
@@ -0,0 +1,19 @@
+scala_library(
+    sources = ["*.scala"],
+    platform = "java8",
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "3rdparty/jvm/com/twitter/algebird:core",
+        "3rdparty/jvm/com/twitter/algebird:util",
+        "3rdparty/jvm/com/twitter/storehaus:algebra",
+        "3rdparty/jvm/com/twitter/storehaus:core",
+        "3rdparty/src/jvm/com/twitter/scalding:commons",
+        "3rdparty/src/jvm/com/twitter/scalding:core",
+        "3rdparty/src/jvm/com/twitter/summingbird:batch",
+        "3rdparty/src/jvm/com/twitter/summingbird:core",
+        "src/java/com/twitter/ml/api:api-base",
+        "src/thrift/com/twitter/ml/api:data-java",
+        "src/thrift/com/twitter/ml/api:interpretable-model-java",
+        "timelines/data_processing/ml_util/aggregation_framework:common_types",
+    ],
+)
diff --git a/timelines/data_processing/ml_util/aggregation_framework/job/DataRecordFeatureCounter.scala b/timelines/data_processing/ml_util/aggregation_framework/job/DataRecordFeatureCounter.scala
new file mode 100644
index 000000000..eb1580a11
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/job/DataRecordFeatureCounter.scala
@@ -0,0 +1,39 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.job
+
+import com.twitter.ml.api.DataRecord
+import com.twitter.summingbird.Counter
+
+/**
+ * A summingbird Counter which is associated with a predicate which operates on
+ * [[com.twitter.ml.api.DataRecord]] instances.
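+ *
+ * A minimal construction sketch ({{hasImage}} is a hypothetical binary feature;
+ * the example below describes the same idea in prose):
+ * {{{
+ *   val imageCounter = DataRecordFeatureCounter(
+ *     { record: DataRecord => SRichDataRecord(record).hasFeature(hasImage) },
+ *     counter)
+ *   DataRecordFeatureCounter(imageCounter, record) // increments iff the predicate holds
+ * }}}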
+ *
+ * For example, for a data record which represents a Tweet, one could define a predicate
+ * which checks whether the Tweet contains a binary feature representing the presence of
+ * an image. The counter can then be used to represent the count of Tweets with
+ * images processed.
+ *
+ * @param predicate a predicate which gates the counter
+ * @param counter a summingbird Counter instance
+ */
+case class DataRecordFeatureCounter(predicate: DataRecord => Boolean, counter: Counter)
+
+object DataRecordFeatureCounter {
+
+  /**
+   * Increments the counter if the record satisfies the predicate
+   *
+   * @param recordCounter a data record counter
+   * @param record a data record
+   */
+  def apply(recordCounter: DataRecordFeatureCounter, record: DataRecord): Unit =
+    if (recordCounter.predicate(record)) recordCounter.counter.incr()
+
+  /**
+   * Defines a feature counter with a predicate that is always true
+   *
+   * @param counter a summingbird Counter instance
+   * @return a data record counter
+   */
+  def any(counter: Counter): DataRecordFeatureCounter =
+    DataRecordFeatureCounter({ _: DataRecord => true }, counter)
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregateFeature.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregateFeature.scala
new file mode 100644
index 000000000..4f80490bc
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregateFeature.scala
@@ -0,0 +1,51 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics
+
+import com.twitter.util.Duration
+import com.twitter.ml.api._
+import java.lang.{Boolean => JBoolean}
+
+/**
+ * Case class used as shared argument for
+ * getAggregateValue() and setAggregateValue() in AggregationMetric.
+ *
+ * @param aggregatePrefix Prefix for aggregate feature name
+ * @param feature Simple (non-aggregate) feature being aggregated. This
+ *                is optional; if None, then the label is aggregated on its own without
+ *                being crossed with any feature.
+ * @param label Label being paired with. This is optional; if None, then
+ *              the feature is aggregated on its own without being crossed with any label.
+ * @param halfLife Half life being used for aggregation
+ */
+case class AggregateFeature[T](
+  aggregatePrefix: String,
+  feature: Option[Feature[T]],
+  label: Option[Feature[JBoolean]],
+  halfLife: Duration) {
+  val aggregateType = "pair"
+  val labelName: String = label.map(_.getDenseFeatureName()).getOrElse("any_label")
+  val featureName: String = feature.map(_.getDenseFeatureName()).getOrElse("any_feature")
+
+  /*
+   * This val precomputes a portion of the feature name
+   * for faster processing. String building turns
+   * out to be a significant bottleneck.
+   */
+  val featurePrefix: String = List(
+    aggregatePrefix,
+    aggregateType,
+    labelName,
+    featureName,
+    halfLife.toString
+  ).mkString(".")
+}
+
+/* Companion object with util methods. */
+object AggregateFeature {
+  def parseHalfLife(aggregateFeature: Feature[_]): Duration = {
+    val aggregateComponents = aggregateFeature.getDenseFeatureName().split("\\.")
+    val numComponents = aggregateComponents.length
+    val halfLifeStr = aggregateComponents(numComponents - 3) + "." 
+ + aggregateComponents(numComponents - 2) + Duration.parse(halfLifeStr) + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregationMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregationMetric.scala new file mode 100644 index 000000000..4278c8812 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregationMetric.scala @@ -0,0 +1,184 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +import com.twitter.ml.api._ +import com.twitter.ml.api.constant.SharedFeatures +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.util.Duration +import java.lang.{Long => JLong} + +/** + * Represents an aggregation operator (e.g. count or mean). + * Override all functions in this trait to implement your own metric. + * The operator is parameterized on an input type T, which is the type + * of feature it aggregates, and a TimedValue[A] which is + * the result type of aggregation for this metric. + */ +trait AggregationMetric[T, A] extends FeatureCache[T] { + /* + * Combines two timed aggregate values ''left'' and ''right'' + * with the specified half life ''halfLife'' to produce a result + * TimedValue + * + * @param left Left timed value + * @param right Right timed value + * @param halfLife Half life to use for adding timed values + * @return Result timed value + */ + def plus(left: TimedValue[A], right: TimedValue[A], halfLife: Duration): TimedValue[A] + + /* + * Gets increment value given a datarecord and a feature. + * + * @param dataRecord to get increment value from. + * @param feature Feature to get increment value for. If None, + then the semantics is to just aggregate the label. + * @param timestampFeature Feature to use as millisecond timestamp + for decayed value aggregation. + * @return The incremental contribution to the aggregate of ''feature'' from ''dataRecord''. + * + * For example, if the aggregation metric is count, the incremental + * contribution is always a TimedValue (1.0, time). If the aggregation metric + * is mean, and the feature is a continuous feature (double), the incremental + * contribution looks like a tuple (value, 1.0, time) + */ + def getIncrementValue( + dataRecord: DataRecord, + feature: Option[Feature[T]], + timestampFeature: Feature[JLong] + ): TimedValue[A] + + /* + * The "zero" value for aggregation. + * For example, the zero is 0 for the count operator. + */ + def zero(timeOpt: Option[Long] = None): TimedValue[A] + + /* + * Gets the value of aggregate feature(s) stored in a datarecord, if any. + * Different aggregate operators might store this info in the datarecord + * differently. E.g. count just stores a count, while mean needs to + * store both a sum and a count, and compile them into a TimedValue. We call + * these features stored in the record "output" features. + * + * @param record Record to get value from + * @param query AggregateFeature (see above) specifying details of aggregate + * @param aggregateOutputs An optional precomputed set of aggregation "output" + * feature hashes for this (query, metric) pair. This can be derived from ''query'', + * but we precompute and pass this in for significantly (approximately 4x = 400%) + * faster performance. If not passed in, the operator should reconstruct these features + * from scratch. + * + * @return The aggregate value if found in ''record'', else the appropriate "zero" + for this type of aggregation. 
+   */
+  def getAggregateValue(
+    record: DataRecord,
+    query: AggregateFeature[T],
+    aggregateOutputs: Option[List[JLong]] = None
+  ): TimedValue[A]
+
+  /*
+   * Sets the value of aggregate feature(s) in a datarecord. Different operators
+   * will have different representations (see example above).
+   *
+   * @param record Record to set value in
+   * @param query AggregateFeature (see above) specifying details of aggregate
+   * @param aggregateOutputs An optional precomputed set of aggregation "output"
+   * features for this (query, metric) pair. This can be derived from ''query'',
+   * but we precompute and pass this in for significantly (approximately 4x = 400%)
+   * faster performance. If not passed in, the operator should reconstruct these features
+   * from scratch.
+   *
+   * @param value Value to set for aggregate feature in the record being passed in via ''query''
+   */
+  def setAggregateValue(
+    record: DataRecord,
+    query: AggregateFeature[T],
+    aggregateOutputs: Option[List[JLong]] = None,
+    value: TimedValue[A]
+  ): Unit
+
+  /**
+   * Get features used to store aggregate output representation
+   * in partially aggregated data records.
+   *
+   * @param query AggregateFeature (see above) specifying details of aggregate
+   * @return A list of "output" features used by this metric to store
+   * output representation. For example, for the "count" operator, we
+   * have only one element in this list, which is the result "count" feature.
+   * For the "mean" operator, we have three elements in this list: the "count"
+   * feature, the "sum" feature and the "mean" feature.
+   */
+  def getOutputFeatures(query: AggregateFeature[T]): List[Feature[_]]
+
+  /**
+   * Get feature hashes used to store aggregate output representation
+   * in partially aggregated data records.
+   *
+   * @param query AggregateFeature (see above) specifying details of aggregate
+   * @return A list of "output" feature hashes used by this metric to store
+   * output representation. For example, for the "count" operator, we
+   * have only one element in this list, which is the result "count" feature.
+   * For the "mean" operator, we have three elements in this list: the "count"
+   * feature, the "sum" feature and the "mean" feature.
+   */
+  def getOutputFeatureIds(query: AggregateFeature[T]): List[JLong] =
+    getOutputFeatures(query)
+      .map(_.getDenseFeatureId().asInstanceOf[JLong])
+
+  /*
+   * Sums the given feature in two datarecords into a result record.
+   * WARNING: this method has side-effects; it modifies combined.
+   *
+   * @param combined Result datarecord to mutate and store addition result in
+   * @param left Left datarecord to add
+   * @param right Right datarecord to add
+   * @param query Details of aggregate to add
+   * @param aggregateOutputs An optional precomputed set of aggregation "output"
+   * feature hashes for this (query, metric) pair. This can be derived from ''query'',
+   * but we precompute and pass this in for significantly (approximately 4x = 400%)
+   * faster performance. If not passed in, the operator should reconstruct these features
+   * from scratch.
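+   *
+   * For example (a sketch of the intended semantics): for a count aggregate,
+   * mutatePlus(combined, left, right, query) stores in ''combined'' the decayed
+   * sum of the counts read from ''left'' and ''right''.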
+ */ + def mutatePlus( + combined: DataRecord, + left: DataRecord, + right: DataRecord, + query: AggregateFeature[T], + aggregateOutputs: Option[List[JLong]] = None + ): Unit = { + val leftValue = getAggregateValue(left, query, aggregateOutputs) + val rightValue = getAggregateValue(right, query, aggregateOutputs) + val combinedValue = plus(leftValue, rightValue, query.halfLife) + setAggregateValue(combined, query, aggregateOutputs, combinedValue) + } + + /** + * Helper function to get increment value from an input DataRecord + * and copy it to an output DataRecord, given an AggregateFeature query spec. + * + * @param output Datarecord to output increment to (will be mutated by this method) + * @param input Datarecord to get increment from + * @param query Details of aggregation + * @param aggregateOutputs An optional precomputed set of aggregation "output" + * feature hashes for this (query, metric) pair. This can be derived from ''query'', + * but we precompute and pass this in for significantly (approximately 4x = 400%) + * faster performance. If not passed in, the operator should reconstruct these features + * from scratch. + * @return True if an increment was set in the output record, else false + */ + def setIncrement( + output: DataRecord, + input: DataRecord, + query: AggregateFeature[T], + timestampFeature: Feature[JLong] = SharedFeatures.TIMESTAMP, + aggregateOutputs: Option[List[JLong]] = None + ): Boolean = { + if (query.label == None || + (query.label.isDefined && SRichDataRecord(input).hasFeature(query.label.get))) { + val incrementValue: TimedValue[A] = getIncrementValue(input, query.feature, timestampFeature) + setAggregateValue(output, query, aggregateOutputs, incrementValue) + true + } else false + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregationMetricCommon.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregationMetricCommon.scala new file mode 100644 index 000000000..e7b97e07b --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/AggregationMetricCommon.scala @@ -0,0 +1,55 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +import com.twitter.algebird.DecayedValue +import com.twitter.algebird.DecayedValueMonoid +import com.twitter.algebird.Monoid +import com.twitter.dal.personal_data.thriftjava.PersonalDataType +import com.twitter.ml.api._ +import com.twitter.ml.api.constant.SharedFeatures +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.util.Duration +import java.lang.{Long => JLong} +import java.util.{HashSet => JHashSet} +import java.util.{Set => JSet} + +object AggregationMetricCommon { + /* Shared definitions and utils that can be reused by child classes */ + val Epsilon: Double = 1e-6 + val decayedValueMonoid: Monoid[DecayedValue] = DecayedValueMonoid(Epsilon) + val TimestampHash: JLong = SharedFeatures.TIMESTAMP.getDenseFeatureId() + + def toDecayedValue(tv: TimedValue[Double], halfLife: Duration): DecayedValue = { + DecayedValue.build( + tv.value, + tv.timestamp.inMilliseconds, + halfLife.inMilliseconds + ) + } + + def getTimestamp( + record: DataRecord, + timestampFeature: Feature[JLong] = SharedFeatures.TIMESTAMP + ): Long = { + Option( + SRichDataRecord(record) + .getFeatureValue(timestampFeature) + ).map(_.toLong) + .getOrElse(0L) + } + + /* + * Union the PDTs of the input featureOpts. 
+ * Return null if empty, else the JSet[PersonalDataType] + */ + def derivePersonalDataTypes(features: Option[Feature[_]]*): JSet[PersonalDataType] = { + val unionPersonalDataTypes = new JHashSet[PersonalDataType]() + for { + featureOpt <- features + feature <- featureOpt + pdtSetOptional = feature.getPersonalDataTypes + if pdtSetOptional.isPresent + pdtSet = pdtSetOptional.get + } unionPersonalDataTypes.addAll(pdtSet) + if (unionPersonalDataTypes.isEmpty) null else unionPersonalDataTypes + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/BUILD b/timelines/data_processing/ml_util/aggregation_framework/metrics/BUILD new file mode 100644 index 000000000..676b31d81 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/BUILD @@ -0,0 +1,15 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/twitter/algebird:core", + "src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/scala/com/twitter/ml/api/util:datarecord", + "src/thrift/com/twitter/dal/personal_data:personal_data-java", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/ml/api:interpretable-model-java", + "util/util-core:scala", + ], +) diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/ConversionUtils.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/ConversionUtils.scala new file mode 100644 index 000000000..b04263ea0 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/ConversionUtils.scala @@ -0,0 +1,5 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +object ConversionUtils { + def booleanToDouble(value: Boolean): Double = if (value) 1.0 else 0.0 +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/CountMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/CountMetric.scala new file mode 100644 index 000000000..720fa68e5 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/CountMetric.scala @@ -0,0 +1,41 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +import com.twitter.ml.api._ +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.util.Time +import java.lang.{Long => JLong} + +case class TypedCountMetric[T]( +) extends TypedSumLikeMetric[T] { + import AggregationMetricCommon._ + import ConversionUtils._ + override val operatorName = "count" + + override def getIncrementValue( + record: DataRecord, + feature: Option[Feature[T]], + timestampFeature: Feature[JLong] + ): TimedValue[Double] = { + val featureExists: Boolean = feature match { + case Some(f) => SRichDataRecord(record).hasFeature(f) + case None => true + } + + TimedValue[Double]( + value = booleanToDouble(featureExists), + timestamp = Time.fromMilliseconds(getTimestamp(record, timestampFeature)) + ) + } +} + +/** + * Syntactic sugar for the count metric that works with + * any feature type as opposed to being tied to a specific type. + * See EasyMetric.scala for more details on why this is useful. 
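+ *
+ * For example (a sketch of the intended use): CountMetric.forFeatureType[JLong](
+ * FeatureType.DISCRETE) yields a TypedCountMetric[JLong], whose increment is 1.0
+ * for every record containing the queried feature.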
+ */
+object CountMetric extends EasyMetric {
+  override def forFeatureType[T](
+    featureType: FeatureType
+  ): Option[AggregationMetric[T, _]] =
+    Some(TypedCountMetric[T]())
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/EasyMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/EasyMetric.scala
new file mode 100644
index 000000000..67edce7ce
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/EasyMetric.scala
@@ -0,0 +1,34 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics
+
+import com.twitter.ml.api._
+
+/**
+ * A "human-readable" metric that can be applied to features of multiple
+ * different types. Wrapper around AggregationMetric used as syntactic sugar
+ * for easier config.
+ */
+trait EasyMetric extends Serializable {
+  /*
+   * Given a feature type, fetches the correct underlying AggregationMetric
+   * to perform this operation over the given feature type, if any. If no such
+   * metric is available, returns None. For example, MEAN cannot be applied
+   * to FeatureType.String and would return None.
+   *
+   * @param featureType Type of feature to fetch metric for
+   * @return Strongly typed aggregation metric to use for this feature type
+   *
+   * For example, if the EasyMetric is MEAN and the featureType is
+   * FeatureType.Continuous, the underlying AggregationMetric should be a
+   * scalar mean. If the EasyMetric is MEAN and the featureType is
+   * FeatureType.SparseContinuous, the AggregationMetric returned could be a
+   * "vector" mean that averages sparse maps. Using the single logical name
+   * MEAN for both is nice syntactic sugar making for an easier to read top
+   * level config, though different underlying operators are used underneath
+   * for the actual implementation.
+   */
+  def forFeatureType[T](
+    featureType: FeatureType
+  ): Option[AggregationMetric[T, _]]
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/FeatureCache.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/FeatureCache.scala
new file mode 100644
index 000000000..e5f384100
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/FeatureCache.scala
@@ -0,0 +1,72 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics
+
+import com.twitter.ml.api._
+import scala.collection.mutable
+
+trait FeatureCache[T] {
+  /*
+   * Constructs feature names from scratch given an aggregate query and an output
+   * feature name. E.g. given mean operator and "sum". This function is slow and should
+   * only be called at pre-computation time.
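+   *
+   * For instance, with aggregate prefix "user_aggregate_v2" (a hypothetical
+   * prefix), no feature, no label, a 30-day half life, and output name "count",
+   * the constructed name is
+   * "user_aggregate_v2.pair.any_label.any_feature.30.days.count".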
+   *
+   * @param query Details of aggregate feature
+   * @param name Name of "output" feature for which we want to construct feature name
+   * @return Full name of output feature
+   */
+  private def uncachedFullFeatureName(query: AggregateFeature[T], name: String): String =
+    List(query.featurePrefix, name).mkString(".")
+
+  /*
+   * A cache from (aggregate query, output feature name) -> fully qualified feature name
+   * lazy since it doesn't need to be serialized to the mappers
+   */
+  private lazy val featureNameCache = mutable.Map[(AggregateFeature[T], String), String]()
+
+  /*
+   * A cache from (aggregate query, output feature name) -> precomputed output feature
+   * lazy since it doesn't need to be serialized to the mappers
+   */
+  private lazy val featureCache = mutable.Map[(AggregateFeature[T], String), Feature[_]]()
+
+  /**
+   * Given an (aggregate query, output feature name, output feature type),
+   * look it up using featureNameCache and featureCache, falling back to uncachedFullFeatureName()
+   * as a last resort to construct a precomputed output feature. Should only be
+   * called at pre-computation time.
+   *
+   * @param query Details of aggregate feature
+   * @param name Name of "output" feature we want to precompute
+   * @param aggregateFeatureType type of "output" feature we want to precompute
+   */
+  def cachedFullFeature(
+    query: AggregateFeature[T],
+    name: String,
+    aggregateFeatureType: FeatureType
+  ): Feature[_] = {
+    lazy val cachedFeatureName = featureNameCache.getOrElseUpdate(
+      (query, name),
+      uncachedFullFeatureName(query, name)
+    )
+
+    def uncachedFullFeature(): Feature[_] = {
+      val personalDataTypes =
+        AggregationMetricCommon.derivePersonalDataTypes(query.feature, query.label)
+
+      aggregateFeatureType match {
+        case FeatureType.BINARY => new Feature.Binary(cachedFeatureName, personalDataTypes)
+        case FeatureType.DISCRETE => new Feature.Discrete(cachedFeatureName, personalDataTypes)
+        case FeatureType.STRING => new Feature.Text(cachedFeatureName, personalDataTypes)
+        case FeatureType.CONTINUOUS => new Feature.Continuous(cachedFeatureName, personalDataTypes)
+        case FeatureType.SPARSE_BINARY =>
+          new Feature.SparseBinary(cachedFeatureName, personalDataTypes)
+        case FeatureType.SPARSE_CONTINUOUS =>
+          new Feature.SparseContinuous(cachedFeatureName, personalDataTypes)
+      }
+    }
+
+    featureCache.getOrElseUpdate(
+      (query, name),
+      uncachedFullFeature()
+    )
+  }
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/LastResetMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/LastResetMetric.scala
new file mode 100644
index 000000000..67fe444aa
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/LastResetMetric.scala
@@ -0,0 +1,107 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics
+
+import java.lang.{Long => JLong}
+import com.twitter.ml.api._
+import com.twitter.ml.api.util.SRichDataRecord
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.ConversionUtils._
+import com.twitter.util.Duration
+import com.twitter.util.Time
+import scala.math.max
+
+/**
+ * This metric measures how recently an action has taken place. A value of 1.0
+ * indicates the action happened just now. This value decays with time if the
+ * action has not taken place and is reset to 1 when the action happens. So a lower
+ * value indicates a staler, older action.
+ *
+ * For example, consider an action of "user liking a video".
+ * The last reset metric value changes as follows for a half life of 1 day.
+ *
+ * ----------------------------------------------------------------------------
+ * day |          action           | feature value | Description
+ * ----------------------------------------------------------------------------
+ *  1  | user likes the video      |      1.0      | Set the value to 1
+ *  2  | user does not like video  |      0.5      | Decay the value
+ *  3  | user does not like video  |      0.25     | Decay the value
+ *  4  | user likes the video      |      1.0      | Reset the value to 1
+ * -----------------------------------------------------------------------------
+ *
+ * @tparam T
+ */
+case class TypedLastResetMetric[T]() extends TimedValueAggregationMetric[T] {
+  import AggregationMetricCommon._
+
+  override val operatorName = "last_reset"
+
+  override def getIncrementValue(
+    record: DataRecord,
+    feature: Option[Feature[T]],
+    timestampFeature: Feature[JLong]
+  ): TimedValue[Double] = {
+    val featureExists: Boolean = feature match {
+      case Some(f) => SRichDataRecord(record).hasFeature(f)
+      case None => true
+    }
+
+    TimedValue[Double](
+      value = booleanToDouble(featureExists),
+      timestamp = Time.fromMilliseconds(getTimestamp(record, timestampFeature))
+    )
+  }
+
+  private def getDecayedValue(
+    olderTimedValue: TimedValue[Double],
+    newerTimestamp: Time,
+    halfLife: Duration
+  ): Double = {
+    if (halfLife.inMilliseconds == 0L) {
+      0.0
+    } else {
+      val timeDelta = newerTimestamp.inMilliseconds - olderTimedValue.timestamp.inMilliseconds
+      // Use floating-point division so that partial half lives also decay the value.
+      val resultValue =
+        olderTimedValue.value / math.pow(2.0, timeDelta.toDouble / halfLife.inMillis)
+      if (resultValue > AggregationMetricCommon.Epsilon) resultValue else 0.0
+    }
+  }
+
+  override def plus(
+    left: TimedValue[Double],
+    right: TimedValue[Double],
+    halfLife: Duration
+  ): TimedValue[Double] = {
+
+    val (newerTimedValue, olderTimedValue) = if (left.timestamp > right.timestamp) {
+      (left, right)
+    } else {
+      (right, left)
+    }
+
+    val optionallyDecayedOlderValue = if (halfLife == Duration.Top) {
+      // Since we don't want to decay, older value is not changed
+      olderTimedValue.value
+    } else {
+      // Decay older value
+      getDecayedValue(olderTimedValue, newerTimedValue.timestamp, halfLife)
+    }
+
+    TimedValue[Double](
+      value = max(newerTimedValue.value, optionallyDecayedOlderValue),
+      timestamp = newerTimedValue.timestamp
+    )
+  }
+
+  override def zero(timeOpt: Option[Long]): TimedValue[Double] = TimedValue[Double](
+    value = 0.0,
+    timestamp = Time.fromMilliseconds(0)
+  )
+}
+
+/**
+ * Syntactic sugar for the last reset metric that works with
+ * any feature type as opposed to being tied to a specific type.
+ * See EasyMetric.scala for more details on why this is useful.
+ */ +object LastResetMetric extends EasyMetric { + override def forFeatureType[T]( + featureType: FeatureType + ): Option[AggregationMetric[T, _]] = + Some(TypedLastResetMetric[T]()) +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/LatestMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/LatestMetric.scala new file mode 100644 index 000000000..08bd6483a --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/LatestMetric.scala @@ -0,0 +1,69 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +import com.twitter.ml.api._ +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.Feature +import com.twitter.ml.api.FeatureType +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon.getTimestamp +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetric +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.EasyMetric +import com.twitter.util.Duration +import com.twitter.util.Time +import java.lang.{Double => JDouble} +import java.lang.{Long => JLong} +import java.lang.{Number => JNumber} + +case class TypedLatestMetric[T <: JNumber](defaultValue: Double = 0.0) + extends TimedValueAggregationMetric[T] { + override val operatorName = "latest" + + override def plus( + left: TimedValue[Double], + right: TimedValue[Double], + halfLife: Duration + ): TimedValue[Double] = { + assert( + halfLife.toString == "Duration.Top", + s"halfLife must be Duration.Top when using latest metric, but ${halfLife.toString} is used" + ) + + if (left.timestamp > right.timestamp) { + left + } else { + right + } + } + + override def getIncrementValue( + dataRecord: DataRecord, + feature: Option[Feature[T]], + timestampFeature: Feature[JLong] + ): TimedValue[Double] = { + val value = feature + .flatMap(SRichDataRecord(dataRecord).getFeatureValueOpt(_)) + .map(_.doubleValue()).getOrElse(defaultValue) + val timestamp = Time.fromMilliseconds(getTimestamp(dataRecord, timestampFeature)) + TimedValue[Double](value = value, timestamp = timestamp) + } + + override def zero(timeOpt: Option[Long]): TimedValue[Double] = + TimedValue[Double]( + value = 0.0, + timestamp = Time.fromMilliseconds(0) + ) +} + +object LatestMetric extends EasyMetric { + override def forFeatureType[T]( + featureType: FeatureType + ): Option[AggregationMetric[T, _]] = { + featureType match { + case FeatureType.CONTINUOUS => + Some(TypedLatestMetric[JDouble]().asInstanceOf[AggregationMetric[T, Double]]) + case FeatureType.DISCRETE => + Some(TypedLatestMetric[JLong]().asInstanceOf[AggregationMetric[T, Double]]) + case _ => None + } + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/MaxMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/MaxMetric.scala new file mode 100644 index 000000000..b9e9176bb --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/MaxMetric.scala @@ -0,0 +1,64 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +import com.twitter.ml.api._ +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon.getTimestamp +import com.twitter.util.Duration +import com.twitter.util.Time +import java.lang.{Long => JLong} +import 
java.lang.{Number => JNumber}
+import java.lang.{Double => JDouble}
+import scala.math.max
+
+case class TypedMaxMetric[T <: JNumber](defaultValue: Double = 0.0)
+    extends TimedValueAggregationMetric[T] {
+  override val operatorName = "max"
+
+  override def getIncrementValue(
+    dataRecord: DataRecord,
+    feature: Option[Feature[T]],
+    timestampFeature: Feature[JLong]
+  ): TimedValue[Double] = {
+    val value = feature
+      .flatMap(SRichDataRecord(dataRecord).getFeatureValueOpt(_))
+      .map(_.doubleValue()).getOrElse(defaultValue)
+    val timestamp = Time.fromMilliseconds(getTimestamp(dataRecord, timestampFeature))
+    TimedValue[Double](value = value, timestamp = timestamp)
+  }
+
+  override def plus(
+    left: TimedValue[Double],
+    right: TimedValue[Double],
+    halfLife: Duration
+  ): TimedValue[Double] = {
+
+    assert(
+      halfLife.toString == "Duration.Top",
+      s"halfLife must be Duration.Top when using max metric, but ${halfLife.toString} is used"
+    )
+
+    TimedValue[Double](
+      value = max(left.value, right.value),
+      timestamp = left.timestamp.max(right.timestamp)
+    )
+  }
+
+  override def zero(timeOpt: Option[Long]): TimedValue[Double] =
+    TimedValue[Double](
+      value = 0.0,
+      timestamp = Time.fromMilliseconds(0)
+    )
+}
+
+object MaxMetric extends EasyMetric {
+  def forFeatureType[T](
+    featureType: FeatureType
+  ): Option[AggregationMetric[T, _]] =
+    featureType match {
+      case FeatureType.CONTINUOUS =>
+        Some(TypedMaxMetric[JDouble]().asInstanceOf[AggregationMetric[T, Double]])
+      case FeatureType.DISCRETE =>
+        Some(TypedMaxMetric[JLong]().asInstanceOf[AggregationMetric[T, Double]])
+      case _ => None
+    }
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/SumLikeMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/SumLikeMetric.scala
new file mode 100644
index 000000000..1f7aeb58a
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/SumLikeMetric.scala
@@ -0,0 +1,66 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics
+
+import com.twitter.ml.api._
+import com.twitter.util.Duration
+import com.twitter.util.Time
+import java.lang.{Double => JDouble}
+import java.lang.{Long => JLong}
+import java.util.{Map => JMap}
+
+/*
+ * TypedSumLikeMetric aggregates a sum over any feature transform.
+ * TypedCountMetric, TypedSumMetric, TypedSumSqMetric are examples
+ * of metrics that are inherited from this trait. To implement a new
+ * "sum like" metric, override the getIncrementValue() and operatorName
+ * members of this trait.
+ *
+ * getIncrementValue() is inherited from the
+ * parent trait AggregationMetric, but not overridden in this trait, so
+ * it needs to be overridden by any metric that extends TypedSumLikeMetric.
+ *
+ * operatorName is a string used for naming the resultant aggregate feature
+ * (e.g. "count" if it's a count feature, or "sum" if a sum feature).
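+ *
+ * As a concrete sketch of the decay semantics in plus() below: with a half
+ * life of 1 day, combining TimedValue(1.0, t) with TimedValue(1.0, t + 1.day)
+ * yields approximately TimedValue(1.5, t + 1.day), because the older
+ * contribution is halved after one half life before being added.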
+ */ +trait TypedSumLikeMetric[T] extends TimedValueAggregationMetric[T] { + import AggregationMetricCommon._ + + def useFixedDecay = true + + override def plus( + left: TimedValue[Double], + right: TimedValue[Double], + halfLife: Duration + ): TimedValue[Double] = { + val resultValue = if (halfLife == Duration.Top) { + /* We could use decayedValueMonoid here, but + * a simple addition is slightly more accurate */ + left.value + right.value + } else { + val decayedLeft = toDecayedValue(left, halfLife) + val decayedRight = toDecayedValue(right, halfLife) + decayedValueMonoid.plus(decayedLeft, decayedRight).value + } + + TimedValue[Double]( + resultValue, + left.timestamp.max(right.timestamp) + ) + } + + override def zero(timeOpt: Option[Long]): TimedValue[Double] = { + val timestamp = + /* + * Please see TQ-11279 for documentation for this fix to the decay logic. + */ + if (useFixedDecay) { + Time.fromMilliseconds(timeOpt.getOrElse(0L)) + } else { + Time.fromMilliseconds(0L) + } + + TimedValue[Double]( + value = 0.0, + timestamp = timestamp + ) + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/SumMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/SumMetric.scala new file mode 100644 index 000000000..bd93d5bae --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/SumMetric.scala @@ -0,0 +1,52 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +import com.twitter.ml.api._ +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.util.Time +import java.lang.{Double => JDouble} +import java.lang.{Long => JLong} + +case class TypedSumMetric( +) extends TypedSumLikeMetric[JDouble] { + import AggregationMetricCommon._ + + override val operatorName = "sum" + + /* + * Transform feature -> its value in the given record, + * or 0 when feature = None (sum has no meaning in this case) + */ + override def getIncrementValue( + record: DataRecord, + feature: Option[Feature[JDouble]], + timestampFeature: Feature[JLong] + ): TimedValue[Double] = feature match { + case Some(f) => { + TimedValue[Double]( + value = Option(SRichDataRecord(record).getFeatureValue(f)).map(_.toDouble).getOrElse(0.0), + timestamp = Time.fromMilliseconds(getTimestamp(record, timestampFeature)) + ) + } + + case None => + TimedValue[Double]( + value = 0.0, + timestamp = Time.fromMilliseconds(getTimestamp(record, timestampFeature)) + ) + } +} + +/** + * Syntactic sugar for the sum metric that works with continuous features. + * See EasyMetric.scala for more details on why this is useful. 
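+ *
+ * For example, `SumMetric.forFeatureType(FeatureType.CONTINUOUS)` yields a
+ * `TypedSumMetric`, while any other feature type yields `None` (see the
+ * match below).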
+ */ +object SumMetric extends EasyMetric { + override def forFeatureType[T]( + featureType: FeatureType + ): Option[AggregationMetric[T, _]] = + featureType match { + case FeatureType.CONTINUOUS => + Some(TypedSumMetric().asInstanceOf[AggregationMetric[T, Double]]) + case _ => None + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/SumSqMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/SumSqMetric.scala new file mode 100644 index 000000000..b24b16377 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/SumSqMetric.scala @@ -0,0 +1,53 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +import com.twitter.ml.api._ +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.util.Time +import java.lang.{Double => JDouble} +import java.lang.{Long => JLong} + +case class TypedSumSqMetric() extends TypedSumLikeMetric[JDouble] { + import AggregationMetricCommon._ + + override val operatorName = "sumsq" + + /* + * Transform feature -> its squared value in the given record + * or 0 when feature = None (sumsq has no meaning in this case) + */ + override def getIncrementValue( + record: DataRecord, + feature: Option[Feature[JDouble]], + timestampFeature: Feature[JLong] + ): TimedValue[Double] = feature match { + case Some(f) => { + val featureVal = + Option(SRichDataRecord(record).getFeatureValue(f)).map(_.toDouble).getOrElse(0.0) + TimedValue[Double]( + value = featureVal * featureVal, + timestamp = Time.fromMilliseconds(getTimestamp(record, timestampFeature)) + ) + } + + case None => + TimedValue[Double]( + value = 0.0, + timestamp = Time.fromMilliseconds(getTimestamp(record, timestampFeature)) + ) + } +} + +/** + * Syntactic sugar for the sum of squares metric that works with continuous features. + * See EasyMetric.scala for more details on why this is useful. + */ +object SumSqMetric extends EasyMetric { + override def forFeatureType[T]( + featureType: FeatureType + ): Option[AggregationMetric[T, _]] = + featureType match { + case FeatureType.CONTINUOUS => + Some(TypedSumSqMetric().asInstanceOf[AggregationMetric[T, Double]]) + case _ => None + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/TimedValue.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/TimedValue.scala new file mode 100644 index 000000000..7f9fb5090 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/TimedValue.scala @@ -0,0 +1,14 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics + +import com.twitter.util.Time + +/** + * Case class wrapping a (value, timestamp) tuple. + * All aggregate metrics must operate over this class + * to ensure we can implement decay and half lives for them. + * This is translated to an algebird DecayedValue under the hood. 
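+ * For example (illustrative numbers): under a 24-hour halfLife, a
+ * TimedValue(2.0, t) that is combined into a decayed sum 24 hours after t
+ * contributes only 1.0.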
+ *
+ * @param value Value being wrapped
+ * @param timestamp Time after epoch at which value is being measured
+ */
+case class TimedValue[T](value: T, timestamp: Time)
diff --git a/timelines/data_processing/ml_util/aggregation_framework/metrics/TimedValueAggregationMetric.scala b/timelines/data_processing/ml_util/aggregation_framework/metrics/TimedValueAggregationMetric.scala
new file mode 100644
index 000000000..f31152a23
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/metrics/TimedValueAggregationMetric.scala
@@ -0,0 +1,90 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics
+
+import com.twitter.ml.api._
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregateFeature
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.TimedValue
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetric
+import com.twitter.util.Duration
+import com.twitter.util.Time
+import java.lang.{Double => JDouble}
+import java.lang.{Long => JLong}
+import java.util.{Map => JMap}
+
+/*
+ * TimedValueAggregationMetric overrides the AggregationMetric methods that
+ * deal with reading and writing continuous values from a data record.
+ *
+ * operatorName is a string used for naming the resultant aggregate feature
+ * (e.g. "count" if it's a count feature, or "sum" if a sum feature).
+ */
+trait TimedValueAggregationMetric[T] extends AggregationMetric[T, Double] {
+  import AggregationMetricCommon._
+
+  val operatorName: String
+
+  override def getAggregateValue(
+    record: DataRecord,
+    query: AggregateFeature[T],
+    aggregateOutputs: Option[List[JLong]] = None
+  ): TimedValue[Double] = {
+    /*
+     * We know aggregateOutputs(0) will have the continuous feature,
+     * since we put it there in getOutputFeatureIds() - see code below.
+     * This helps us get a 4x speedup. Using any structure more complex
+     * than a list was also a performance bottleneck.
+     */
+    val featureHash: JLong = aggregateOutputs
+      .getOrElse(getOutputFeatureIds(query))
+      .head
+
+    val continuousValueOption: Option[Double] = Option(record.continuousFeatures)
+      .flatMap { case jmap: JMap[JLong, JDouble] => Option(jmap.get(featureHash)) }
+      .map(_.toDouble)
+
+    val timeOption = Option(record.discreteFeatures)
+      .flatMap { case jmap: JMap[JLong, JLong] => Option(jmap.get(TimestampHash)) }
+      .map(_.toLong)
+
+    val resultOption: Option[TimedValue[Double]] = (continuousValueOption, timeOption) match {
+      case (Some(featureValue), Some(timestamp)) =>
+        Some(TimedValue[Double](featureValue, Time.fromMilliseconds(timestamp)))
+      case _ => None
+    }
+
+    resultOption.getOrElse(zero(timeOption))
+  }
+
+  override def setAggregateValue(
+    record: DataRecord,
+    query: AggregateFeature[T],
+    aggregateOutputs: Option[List[JLong]] = None,
+    value: TimedValue[Double]
+  ): Unit = {
+    /*
+     * We know aggregateOutputs(0) will have the continuous feature,
+     * since we put it there in getOutputFeatureIds() - see code below.
+     * This helps us get a 4x speedup. Using any structure more complex
+     * than a list was also a performance bottleneck.
+ */ + val featureHash: JLong = aggregateOutputs + .getOrElse(getOutputFeatureIds(query)) + .head + + /* Only set value if non-zero to save space */ + if (value.value != 0.0) { + record.putToContinuousFeatures(featureHash, value.value) + } + + /* + * We do not set timestamp since that might affect correctness of + * future aggregations due to the decay semantics. + */ + } + + /* Only one feature stored in the aggregated datarecord: the result continuous value */ + override def getOutputFeatures(query: AggregateFeature[T]): List[Feature[_]] = { + val feature = cachedFullFeature(query, operatorName, FeatureType.CONTINUOUS) + List(feature) + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/package.scala b/timelines/data_processing/ml_util/aggregation_framework/package.scala new file mode 100644 index 000000000..824398a7f --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/package.scala @@ -0,0 +1,19 @@ +package com.twitter.timelines.data_processing.ml_util + +import com.twitter.ml.api.DataRecord + +package object aggregation_framework { + object AggregateType extends Enumeration { + type AggregateType = Value + val User, UserAuthor, UserEngager, UserMention, UserRequestHour, UserRequestDow, + UserOriginalAuthor, UserList, UserTopic, UserInferredTopic, UserMediaUnderstandingAnnotation = + Value + } + + type AggregateUserEntityKey = (Long, AggregateType.Value, Option[Long]) + + case class MergedRecordsDescriptor( + userId: Long, + keyedRecords: Map[AggregateType.Value, Option[KeyedRecord]], + keyedRecordMaps: Map[AggregateType.Value, Option[KeyedRecordMap]]) +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/query/BUILD b/timelines/data_processing/ml_util/aggregation_framework/query/BUILD new file mode 100644 index 000000000..97e6d1ea7 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/query/BUILD @@ -0,0 +1,12 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "finagle/finagle-stats", + "src/java/com/twitter/ml/api:api-base", + "src/thrift/com/twitter/ml/api:data-scala", + "src/thrift/com/twitter/ml/api:interpretable-model-java", + "timelines/data_processing/ml_util/aggregation_framework/metrics", + ], +) diff --git a/timelines/data_processing/ml_util/aggregation_framework/query/ScopedAggregateBuilder.scala b/timelines/data_processing/ml_util/aggregation_framework/query/ScopedAggregateBuilder.scala new file mode 100644 index 000000000..2fcce3312 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/query/ScopedAggregateBuilder.scala @@ -0,0 +1,159 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.query + +import com.twitter.dal.personal_data.thriftjava.PersonalDataType +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.Feature +import com.twitter.ml.api.FeatureBuilder +import com.twitter.ml.api.FeatureContext +import com.twitter.ml.api.thriftscala.{DataRecord => ScalaDataRecord} +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics.AggregationMetricCommon +import java.lang.{Double => JDouble} +import java.lang.{Long => JLong} +import scala.collection.JavaConverters._ + +/** + * Provides methods to build "scoped" aggregates, where base features generated by aggregates + * V2 are scoped with a specific key. 
+ * + * The class provides methods that take a Map of T -> DataRecord, where T is a key type, and + * the DataRecord contains features produced by the aggregation_framework. The methods then + * generate a _new_ DataRecord, containing "scoped" aggregate features, where each scoped + * feature has the value of the scope key in the feature name, and the value of the feature + * is the value of the original aggregate feature in the corresponding value from the original + * Map. + * + * For efficiency reasons, the builder is initialized with the set of features that should be + * scoped and the set of keys for which scoping should be supported. + * + * To understand how scope feature names are constructed, consider the following: + * + * {{{ + * val features = Set( + * new Feature.Continuous("user_injection_aggregate.pair.any_label.any_feature.5.days.count"), + * new Feature.Continuous("user_injection_aggregate.pair.any_label.any_feature.10.days.count") + * ) + * val scopes = Set(SuggestType.Recap, SuggestType.WhoToFollow) + * val scopeName = "InjectionType" + * val scopedAggregateBuilder = ScopedAggregateBuilder(features, scopes, scopeName) + * + * }}} + * + * Then, generated scoped features would be among the following: + * - user_injection_aggregate.scoped.pair.any_label.any_feature.5.days.count/scope_name=InjectionType/scope=Recap + * - user_injection_aggregate.scoped.pair.any_label.any_feature.5.days.count/scope_name=InjectionType/scope=WhoToFollow + * - user_injection_aggregate.scoped.pair.any_label.any_feature.10.days.count/scope_name=InjectionType/scope=Recap + * - user_injection_aggregate.scoped.pair.any_label.any_feature.10.days.count/scope_name=InjectionType/scope=WhoToFollow + * + * @param featuresToScope the set of features for which one should generate scoped versions + * @param scopeKeys the set of scope keys to generate scopes with + * @param scopeName a string indicating what the scopes represent. This is also added to the scoped feature + * @tparam K the type of scope key + */ +class ScopedAggregateBuilder[K]( + featuresToScope: Set[Feature[JDouble]], + scopeKeys: Set[K], + scopeName: String) { + + private[this] def buildScopedAggregateFeature( + baseName: String, + scopeValue: String, + personalDataTypes: java.util.Set[PersonalDataType] + ): Feature[JDouble] = { + val components = baseName.split("\\.").toList + + val newName = (components.head :: "scoped" :: components.tail).mkString(".") + + new FeatureBuilder.Continuous() + .addExtensionDimensions("scope_name", "scope") + .setBaseName(newName) + .setPersonalDataTypes(personalDataTypes) + .extensionBuilder() + .addExtension("scope_name", scopeName) + .addExtension("scope", scopeValue) + .build() + } + + /** + * Index of (base aggregate feature name, key) -> key scoped count feature. + */ + private[this] val keyScopedAggregateMap: Map[(String, K), Feature[JDouble]] = { + featuresToScope.flatMap { feat => + scopeKeys.map { key => + (feat.getFeatureName, key) -> + buildScopedAggregateFeature( + feat.getFeatureName, + key.toString, + AggregationMetricCommon.derivePersonalDataTypes(Some(feat)) + ) + } + }.toMap + } + + type ContinuousFeaturesMap = Map[JLong, JDouble] + + /** + * Create key-scoped features for raw aggregate feature ID to value maps, partitioned by key. 
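+   *
+   * For example (illustrative values, assuming `SuggestType.Recap` is in
+   * scopeKeys and a feature with ID 123L is in featuresToScope): an input of
+   * Map(SuggestType.Recap -> Map(123L -> 1.5)) yields a DataRecord whose only
+   * continuous feature is the Recap-scoped version of that feature, with
+   * value 1.5.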
+   */
+  private[this] def buildAggregates(featureMapsByKey: Map[K, ContinuousFeaturesMap]): DataRecord = {
+    val continuousFeatures = featureMapsByKey
+      .flatMap {
+        case (key, featureMap) =>
+          featuresToScope.flatMap { feature =>
+            val newFeatureOpt = keyScopedAggregateMap.get((feature.getFeatureName, key))
+            newFeatureOpt.flatMap { newFeature =>
+              featureMap.get(feature.getFeatureId).map(new JLong(newFeature.getFeatureId) -> _)
+            }
+          }.toMap
+      }
+
+    new DataRecord().setContinuousFeatures(continuousFeatures.asJava)
+  }
+
+  /**
+   * Create key-scoped features for Java [[DataRecord]] aggregate records partitioned by key.
+   *
+   * As an example, if the provided Map includes the key `SuggestType.Recap`, and [[scopeKeys]]
+   * includes this key, then for a feature "xyz.pair.any_label.any_feature.5.days.count", the method
+   * will generate the scoped feature "xyz.scoped.pair.any_label.any_feature.5.days.count/scope_name=InjectionType/scope=Recap",
+   * with the value being the value of the original feature from the Map.
+   *
+   * @param aggregatesByKey a map from key to a continuous feature map (i.e. feature ID -> Double)
+   * @return a Java [[DataRecord]] containing key-scoped features
+   */
+  def buildAggregatesJava(aggregatesByKey: Map[K, DataRecord]): DataRecord = {
+    val featureMapsByKey = aggregatesByKey.mapValues(_.continuousFeatures.asScala.toMap)
+    buildAggregates(featureMapsByKey)
+  }
+
+  /**
+   * Create key-scoped features for Scala [[DataRecord]] aggregate records partitioned by key.
+   *
+   * As an example, if the provided Map includes the key `SuggestType.Recap`, and [[scopeKeys]]
+   * includes this key, then for a feature "xyz.pair.any_label.any_feature.5.days.count", the method
+   * will generate the scoped feature "xyz.scoped.pair.any_label.any_feature.5.days.count/scope_name=InjectionType/scope=Recap",
+   * with the value being the value of the original feature from the Map.
+   *
+   * This is a convenience method for some use cases where aggregates are read from Scala
+   * thrift objects. Note that this still returns a Java [[DataRecord]], since most ML APIs
+   * use the Java version.
+   *
+   * @param aggregatesByKey a map from key to a continuous feature map (i.e. feature ID -> Double)
+   * @return a Java [[DataRecord]] containing key-scoped features
+   */
+  def buildAggregatesScala(aggregatesByKey: Map[K, ScalaDataRecord]): DataRecord = {
+    val featureMapsByKey =
+      aggregatesByKey
+        .mapValues { record =>
+          val featureMap = record.continuousFeatures.getOrElse(Map[Long, Double]()).toMap
+          featureMap.map { case (k, v) => new JLong(k) -> new JDouble(v) }
+        }
+    buildAggregates(featureMapsByKey)
+  }
+
+  /**
+   * Returns a [[FeatureContext]] including all possible scoped features generated using this builder.
+   *
+   * @return a [[FeatureContext]] containing all scoped features.
+   */
+  def scopedFeatureContext: FeatureContext = new FeatureContext(keyScopedAggregateMap.values.asJava)
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregateFeaturesMerger.scala b/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregateFeaturesMerger.scala
new file mode 100644
index 000000000..156168a9d
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregateFeaturesMerger.scala
@@ -0,0 +1,213 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.scalding
+
+import com.twitter.ml.api._
+import com.twitter.ml.api.constant.SharedFeatures._
+import com.twitter.ml.api.util.SRichDataRecord
+import com.twitter.scalding.Stat
+import com.twitter.scalding.typed.TypedPipe
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework._
+import com.twitter.timelines.data_processing.ml_util.sampling.SamplingUtils
+
+trait AggregateFeaturesMergerBase {
+  import Utils._
+
+  def samplingRateOpt: Option[Double]
+  def numReducers: Int = 2000
+  def numReducersMerge: Int = 20000
+
+  def aggregationConfig: AggregationConfig
+  def storeRegister: StoreRegister
+  def storeMerger: StoreMerger
+
+  def getAggregatePipe(storeName: String): DataSetPipe
+  def applyMaxSizeByTypeOpt(aggregateType: AggregateType.Value): Option[Int] = Option.empty[Int]
+
+  def usersActiveSourcePipe: TypedPipe[Long]
+  def numRecords: Stat
+  def numFilteredRecords: Stat
+
+  /*
+   * This method should only be called with a storeName that corresponds
+   * to a user aggregate store.
+   */
+  def extractUserFeaturesMap(storeName: String): TypedPipe[(Long, KeyedRecord)] = {
+    val aggregateKey = storeRegister.storeNameToTypeMap(storeName)
+    samplingRateOpt
+      .map(rate => SamplingUtils.userBasedSample(getAggregatePipe(storeName), rate))
+      .getOrElse(getAggregatePipe(storeName)) // must return a store with only user aggregates
+      .records
+      .map { r: DataRecord =>
+        val record = SRichDataRecord(r)
+        val userId = record.getFeatureValue(USER_ID).longValue
+        record.clearFeature(USER_ID)
+        (userId, KeyedRecord(aggregateKey, r))
+      }
+  }
+
+  /*
+   * When the secondaryKey being used is a String, the shouldHash parameter
+   * should be set to true. TODO: refactor so that the shouldHash parameter
+   * is removed and the behavior defaults to true.
+   *
+   * This method should only be called with a storeName that contains records with the
+   * desired secondaryKey. We provide secondaryKeyFilterPipeOpt against which secondary
+   * keys can be filtered to help prune the final merged MH dataset.
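+   *
+   * For example (illustrative): for a UserAuthor store, the returned pipe is
+   * keyed by userId, and each KeyedRecordMap maps an authorId to the
+   * aggregate DataRecord for that (userId, authorId) pair.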
+ */ + def extractSecondaryTuples[T]( + storeName: String, + secondaryKey: Feature[T], + shouldHash: Boolean = false, + maxSizeOpt: Option[Int] = None, + secondaryKeyFilterPipeOpt: Option[TypedPipe[Long]] = None + ): TypedPipe[(Long, KeyedRecordMap)] = { + val aggregateKey = storeRegister.storeNameToTypeMap(storeName) + + val extractedRecordsBySecondaryKey = + samplingRateOpt + .map(rate => SamplingUtils.userBasedSample(getAggregatePipe(storeName), rate)) + .getOrElse(getAggregatePipe(storeName)) + .records + .map { r: DataRecord => + val record = SRichDataRecord(r) + val userId = keyFromLong(r, USER_ID) + val secondaryId = extractSecondary(r, secondaryKey, shouldHash) + record.clearFeature(USER_ID) + record.clearFeature(secondaryKey) + + numRecords.inc() + (userId, secondaryId -> r) + } + + val grouped = + (secondaryKeyFilterPipeOpt match { + case Some(secondaryKeyFilterPipe: TypedPipe[Long]) => + extractedRecordsBySecondaryKey + .map { + // In this step, we swap `userId` with `secondaryId` to join on the `secondaryId` + // It is important to swap them back after the join, otherwise the job will fail. + case (userId, (secondaryId, r)) => + (secondaryId, (userId, r)) + } + .join(secondaryKeyFilterPipe.groupBy(identity)) + .map { + case (secondaryId, ((userId, r), _)) => + numFilteredRecords.inc() + (userId, secondaryId -> r) + } + case _ => extractedRecordsBySecondaryKey + }).group + .withReducers(numReducers) + + maxSizeOpt match { + case Some(maxSize) => + grouped + .take(maxSize) + .mapValueStream(recordsIter => Iterator(KeyedRecordMap(aggregateKey, recordsIter.toMap))) + .toTypedPipe + case None => + grouped + .mapValueStream(recordsIter => Iterator(KeyedRecordMap(aggregateKey, recordsIter.toMap))) + .toTypedPipe + } + } + + def userPipes: Seq[TypedPipe[(Long, KeyedRecord)]] = + storeRegister.allStores.flatMap { storeConfig => + val StoreConfig(storeNames, aggregateType, _) = storeConfig + require(storeMerger.isValidToMerge(storeNames)) + + if (aggregateType == AggregateType.User) { + storeNames.map(extractUserFeaturesMap) + } else None + }.toSeq + + private def getSecondaryKeyFilterPipeOpt( + aggregateType: AggregateType.Value + ): Option[TypedPipe[Long]] = { + if (aggregateType == AggregateType.UserAuthor) { + Some(usersActiveSourcePipe) + } else None + } + + def userSecondaryKeyPipes: Seq[TypedPipe[(Long, KeyedRecordMap)]] = { + storeRegister.allStores.flatMap { storeConfig => + val StoreConfig(storeNames, aggregateType, shouldHash) = storeConfig + require(storeMerger.isValidToMerge(storeNames)) + + if (aggregateType != AggregateType.User) { + storeNames.flatMap { storeName => + storeConfig.secondaryKeyFeatureOpt + .map { secondaryFeature => + extractSecondaryTuples( + storeName, + secondaryFeature, + shouldHash, + applyMaxSizeByTypeOpt(aggregateType), + getSecondaryKeyFilterPipeOpt(aggregateType) + ) + } + } + } else None + }.toSeq + } + + def joinedAggregates: TypedPipe[(Long, MergedRecordsDescriptor)] = { + (userPipes ++ userSecondaryKeyPipes) + .reduce(_ ++ _) + .group + .withReducers(numReducersMerge) + .mapGroup { + case (uid, keyedRecordsAndMaps) => + /* + * For every user, partition their records by aggregate type. + * AggregateType.User should only contain KeyedRecord whereas + * other aggregate types (with secondary keys) contain KeyedRecordMap. 
+           */
+          val (userRecords, userSecondaryKeyRecords) = keyedRecordsAndMaps.toList
+            .map { record =>
+              record match {
+                case record: KeyedRecord => (record.aggregateType, record)
+                case record: KeyedRecordMap => (record.aggregateType, record)
+              }
+            }
+            .groupBy(_._1)
+            .mapValues(_.map(_._2))
+            .partition(_._1 == AggregateType.User)
+
+          val userAggregateRecordMap: Map[AggregateType.Value, Option[KeyedRecord]] =
+            userRecords
+              .asInstanceOf[Map[AggregateType.Value, List[KeyedRecord]]]
+              .map {
+                case (aggregateType, keyedRecords) =>
+                  val mergedKeyedRecordOpt = mergeKeyedRecordOpts(keyedRecords.map(Some(_)): _*)
+                  (aggregateType, mergedKeyedRecordOpt)
+              }
+
+          val userSecondaryKeyAggregateRecordOpt: Map[AggregateType.Value, Option[KeyedRecordMap]] =
+            userSecondaryKeyRecords
+              .asInstanceOf[Map[AggregateType.Value, List[KeyedRecordMap]]]
+              .map {
+                case (aggregateType, keyedRecordMaps) =>
+                  val keyedRecordMapOpt =
+                    keyedRecordMaps.foldLeft(Option.empty[KeyedRecordMap]) {
+                      (mergedRecOpt, nextRec) =>
+                        applyMaxSizeByTypeOpt(aggregateType)
+                          .map { maxSize =>
+                            mergeKeyedRecordMapOpts(mergedRecOpt, Some(nextRec), maxSize)
+                          }.getOrElse {
+                            mergeKeyedRecordMapOpts(mergedRecOpt, Some(nextRec))
+                          }
+                    }
+                  (aggregateType, keyedRecordMapOpt)
+              }
+
+          Iterator(
+            MergedRecordsDescriptor(
+              userId = uid,
+              keyedRecords = userAggregateRecordMap,
+              keyedRecordMaps = userSecondaryKeyAggregateRecordOpt
+            )
+          )
+      }.toTypedPipe
+  }
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregatesStoreComparisonJob.scala b/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregatesStoreComparisonJob.scala
new file mode 100644
index 000000000..054d5d428
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregatesStoreComparisonJob.scala
@@ -0,0 +1,200 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.scalding
+
+import com.twitter.algebird.ScMapMonoid
+import com.twitter.bijection.Injection
+import com.twitter.bijection.thrift.CompactThriftCodec
+import com.twitter.ml.api.util.CompactDataRecordConverter
+import com.twitter.ml.api.CompactDataRecord
+import com.twitter.ml.api.DataRecord
+import com.twitter.scalding.commons.source.VersionedKeyValSource
+import com.twitter.scalding.Args
+import com.twitter.scalding.Days
+import com.twitter.scalding.Duration
+import com.twitter.scalding.RichDate
+import com.twitter.scalding.TypedPipe
+import com.twitter.scalding.TypedTsv
+import com.twitter.scalding_internal.job.HasDateRange
+import com.twitter.scalding_internal.job.analytics_batch.AnalyticsBatchJob
+import com.twitter.summingbird.batch.BatchID
+import com.twitter.summingbird_internal.bijection.BatchPairImplicits
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey
+import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKeyInjection
+import java.lang.{Double => JDouble}
+import java.lang.{Long => JLong}
+import scala.collection.JavaConverters._
+
+/**
+ * The job takes four inputs:
+ * - The path to an AggregateStore using the DataRecord format.
+ * - The path to an AggregateStore using the CompactDataRecord format.
+ * - A version that must be present in both sources.
+ * - A sink to write the comparison statistics.
+ *
+ * The job reads in the two stores, converts the second one to DataRecords,
+ * and then compares each key to see if the two stores have identical DataRecords,
+ * modulo the loss in precision from converting the Double to Float.
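+ *
+ * A hypothetical invocation, for illustration only (the argument names come
+ * from the args reads below; the launcher, paths, and values are made up):
+ *
+ * {{{
+ * ... AggregatesStoreComparisonJob \
+ *   --firstTime 2023-01-01 \
+ *   --dataRecordSource /user/timelines/aggregates_v2 \
+ *   --compactDataRecordSource /user/timelines/aggregates_v2_compact \
+ *   --version 1234567890000 \
+ *   --sink /user/timelines/aggregates_v2_comparison_stats
+ * }}}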
+ */ +class AggregatesStoreComparisonJob(args: Args) + extends AnalyticsBatchJob(args) + with BatchPairImplicits + with HasDateRange { + + import AggregatesStoreComparisonJob._ + override def batchIncrement: Duration = Days(1) + override def firstTime: RichDate = RichDate(args("firstTime")) + + private val dataRecordSourcePath = args("dataRecordSource") + private val compactDataRecordSourcePath = args("compactDataRecordSource") + + private val version = args.long("version") + + private val statsSink = args("sink") + + require(dataRecordSourcePath != compactDataRecordSourcePath) + + private val dataRecordSource = + VersionedKeyValSource[AggregationKey, (BatchID, DataRecord)]( + path = dataRecordSourcePath, + sourceVersion = Some(version) + ) + private val compactDataRecordSource = + VersionedKeyValSource[AggregationKey, (BatchID, CompactDataRecord)]( + path = compactDataRecordSourcePath, + sourceVersion = Some(version) + ) + + private val dataRecordPipe: TypedPipe[((AggregationKey, BatchID), DataRecord)] = TypedPipe + .from(dataRecordSource) + .map { case (key, (batchId, record)) => ((key, batchId), record) } + + private val compactDataRecordPipe: TypedPipe[((AggregationKey, BatchID), DataRecord)] = TypedPipe + .from(compactDataRecordSource) + .map { + case (key, (batchId, compactRecord)) => + val record = compactConverter.compactDataRecordToDataRecord(compactRecord) + ((key, batchId), record) + } + + dataRecordPipe + .outerJoin(compactDataRecordPipe) + .mapValues { case (leftOpt, rightOpt) => compareDataRecords(leftOpt, rightOpt) } + .values + .sum(mapMonoid) + .flatMap(_.toList) + .write(TypedTsv(statsSink)) +} + +object AggregatesStoreComparisonJob { + + val mapMonoid: ScMapMonoid[String, Long] = new ScMapMonoid[String, Long]() + + implicit private val aggregationKeyInjection: Injection[AggregationKey, Array[Byte]] = + AggregationKeyInjection + implicit private val aggregationKeyOrdering: Ordering[AggregationKey] = AggregationKeyOrdering + implicit private val dataRecordCodec: Injection[DataRecord, Array[Byte]] = + CompactThriftCodec[DataRecord] + implicit private val compactDataRecordCodec: Injection[CompactDataRecord, Array[Byte]] = + CompactThriftCodec[CompactDataRecord] + + private val compactConverter = new CompactDataRecordConverter + + val missingRecordFromLeft = "missingRecordFromLeft" + val missingRecordFromRight = "missingRecordFromRight" + val nonContinuousFeaturesDidNotMatch = "nonContinuousFeaturesDidNotMatch" + val missingFeaturesFromLeft = "missingFeaturesFromLeft" + val missingFeaturesFromRight = "missingFeaturesFromRight" + val recordsWithUnmatchedKeys = "recordsWithUnmatchedKeys" + val featureValuesMatched = "featureValuesMatched" + val featureValuesThatDidNotMatch = "featureValuesThatDidNotMatch" + val equalRecords = "equalRecords" + val keyCount = "keyCount" + + def compareDataRecords( + leftOpt: Option[DataRecord], + rightOpt: Option[DataRecord] + ): collection.Map[String, Long] = { + val stats = collection.Map((keyCount, 1L)) + (leftOpt, rightOpt) match { + case (Some(left), Some(right)) => + if (isIdenticalNonContinuousFeatureSet(left, right)) { + getContinuousFeaturesStats(left, right).foldLeft(stats)(mapMonoid.add) + } else { + mapMonoid.add(stats, (nonContinuousFeaturesDidNotMatch, 1L)) + } + case (Some(_), None) => mapMonoid.add(stats, (missingRecordFromRight, 1L)) + case (None, Some(_)) => mapMonoid.add(stats, (missingRecordFromLeft, 1L)) + case (None, None) => throw new IllegalArgumentException("Should never be possible") + } + } + + /** + * For Continuous 
features.
+   */
+  private def getContinuousFeaturesStats(
+    left: DataRecord,
+    right: DataRecord
+  ): Seq[(String, Long)] = {
+    val leftFeatures = Option(left.getContinuousFeatures)
+      .map(_.asScala.toMap)
+      .getOrElse(Map.empty[JLong, JDouble])
+
+    val rightFeatures = Option(right.getContinuousFeatures)
+      .map(_.asScala.toMap)
+      .getOrElse(Map.empty[JLong, JDouble])
+
+    val numMissingFeaturesLeft = (rightFeatures.keySet diff leftFeatures.keySet).size
+    val numMissingFeaturesRight = (leftFeatures.keySet diff rightFeatures.keySet).size
+
+    if (numMissingFeaturesLeft == 0 && numMissingFeaturesRight == 0) {
+      val Epsilon = 1e-5
+      val numUnmatchedValues = leftFeatures.map {
+        case (id, lValue) =>
+          val rValue = rightFeatures(id)
+          // The approximate match is to account for the precision loss due to
+          // the Double -> Float -> Double conversion.
+          if (math.abs(lValue - rValue) <= Epsilon) 0L else 1L
+      }.sum
+
+      if (numUnmatchedValues == 0) {
+        Seq(
+          (equalRecords, 1L),
+          (featureValuesMatched, leftFeatures.size.toLong)
+        )
+      } else {
+        Seq(
+          (featureValuesThatDidNotMatch, numUnmatchedValues),
+          (
+            featureValuesMatched,
+            math.max(leftFeatures.size, rightFeatures.size) - numUnmatchedValues)
+        )
+      }
+    } else {
+      Seq(
+        (recordsWithUnmatchedKeys, 1L),
+        (missingFeaturesFromLeft, numMissingFeaturesLeft.toLong),
+        (missingFeaturesFromRight, numMissingFeaturesRight.toLong)
+      )
+    }
+  }
+
+  /**
+   * For feature types that are not Feature.Continuous; we expect these to match exactly in the two stores.
+   */
+  private def isIdenticalNonContinuousFeatureSet(left: DataRecord, right: DataRecord): Boolean = {
+    val booleanMatched = safeEquals(left.binaryFeatures, right.binaryFeatures)
+    val discreteMatched = safeEquals(left.discreteFeatures, right.discreteFeatures)
+    val stringMatched = safeEquals(left.stringFeatures, right.stringFeatures)
+    val sparseBinaryMatched = safeEquals(left.sparseBinaryFeatures, right.sparseBinaryFeatures)
+    val sparseContinuousMatched =
+      safeEquals(left.sparseContinuousFeatures, right.sparseContinuousFeatures)
+    val blobMatched = safeEquals(left.blobFeatures, right.blobFeatures)
+    val tensorsMatched = safeEquals(left.tensors, right.tensors)
+    val sparseTensorsMatched = safeEquals(left.sparseTensors, right.sparseTensors)
+
+    booleanMatched && discreteMatched && stringMatched && sparseBinaryMatched &&
+    sparseContinuousMatched && blobMatched && tensorsMatched && sparseTensorsMatched
+  }
+
+  def safeEquals[T](l: T, r: T): Boolean = Option(l).equals(Option(r))
+}
diff --git a/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregatesV2ScaldingJob.scala b/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregatesV2ScaldingJob.scala
new file mode 100644
index 000000000..aa8ae3612
--- /dev/null
+++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregatesV2ScaldingJob.scala
@@ -0,0 +1,216 @@
+package com.twitter.timelines.data_processing.ml_util.aggregation_framework.scalding
+
+import com.twitter.bijection.thrift.CompactThriftCodec
+import com.twitter.bijection.Codec
+import com.twitter.bijection.Injection
+import com.twitter.ml.api._
+import com.twitter.ml.api.constant.SharedFeatures.TIMESTAMP
+import com.twitter.ml.api.util.CompactDataRecordConverter
+import com.twitter.ml.api.util.SRichDataRecord
+import com.twitter.scalding.Args
+import com.twitter.scalding_internal.dalv2.DALWrite.D
+import com.twitter.storehaus_internal.manhattan.ManhattanROConfig
+import com.twitter.summingbird.batch.option.Reducers
+import 
com.twitter.summingbird.batch.BatchID +import com.twitter.summingbird.batch.Batcher +import com.twitter.summingbird.batch.Timestamp +import com.twitter.summingbird.option._ +import com.twitter.summingbird.scalding.Scalding +import com.twitter.summingbird.scalding.batch.{BatchedStore => ScaldingBatchedStore} +import com.twitter.summingbird.Options +import com.twitter.summingbird.Producer +import com.twitter.summingbird_internal.bijection.BatchPairImplicits._ +import com.twitter.summingbird_internal.runner.common.JobName +import com.twitter.summingbird_internal.runner.scalding.GenericRunner +import com.twitter.summingbird_internal.runner.scalding.ScaldingConfig +import com.twitter.summingbird_internal.runner.scalding.StatebirdState +import com.twitter.summingbird_internal.dalv2.DAL +import com.twitter.summingbird_internal.runner.store_config._ +import com.twitter.timelines.data_processing.ml_util.aggregation_framework._ +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.scalding.sources._ +import job.AggregatesV2Job +import org.apache.hadoop.conf.Configuration +/* + * Offline scalding version of summingbird job to compute aggregates v2. + * This is loosely based on the template created by sb-gen. + * Extend this trait in your own scalding job, and override the val + * "aggregatesToCompute" with your own desired set of aggregates. + */ +trait AggregatesV2ScaldingJob { + val aggregatesToCompute: Set[TypedAggregateGroup[_]] + + implicit val aggregationKeyInjection: Injection[AggregationKey, Array[Byte]] = + AggregationKeyInjection + + implicit val aggregationKeyOrdering: AggregationKeyOrdering.type = AggregationKeyOrdering + + implicit val dataRecordCodec: Injection[DataRecord, Array[Byte]] = CompactThriftCodec[DataRecord] + + private implicit val compactDataRecordCodec: Injection[CompactDataRecord, Array[Byte]] = + CompactThriftCodec[CompactDataRecord] + + private val compactDataRecordConverter = new CompactDataRecordConverter() + + def numReducers: Int = -1 + + /** + * Function that maps from a logical ''AggregateSource'' + * to an underlying physical source. The physical source + * for the scalding platform is a ScaldingAggregateSource. + */ + def dataRecordSourceToScalding( + source: AggregateSource + ): Option[Producer[Scalding, DataRecord]] = { + source match { + case offlineSource: OfflineAggregateSource => + Some(ScaldingAggregateSource(offlineSource).source) + case _ => None + } + } + + /** + * Creates and returns a versioned store using the config parameters + * with a specific number of versions to keep, and which can read from + * the most recent available version on HDFS rather than a specific + * version number. The store applies a timestamp correction based on the + * number of days of aggregate data skipped over at read time to ensure + * that skipping data plays nicely with halfLife decay. 
+ * + * @param config specifying the Manhattan store parameters + * @param versionsToKeep number of old versions to keep + */ + def getMostRecentLagCorrectingVersionedStoreWithRetention[ + Key: Codec: Ordering, + ValInStore: Codec, + ValInMemory + ]( + config: OfflineStoreOnlyConfig[ManhattanROConfig], + versionsToKeep: Int, + lagCorrector: (ValInMemory, Long) => ValInMemory, + packer: ValInMemory => ValInStore, + unpacker: ValInStore => ValInMemory + ): ScaldingBatchedStore[Key, ValInMemory] = { + MostRecentLagCorrectingVersionedStore[Key, ValInStore, ValInMemory]( + config.offline.hdfsPath.toString, + packer = packer, + unpacker = unpacker, + versionsToKeep = versionsToKeep)( + Injection.connect[(Key, (BatchID, ValInStore)), (Array[Byte], Array[Byte])], + config.batcher, + implicitly[Ordering[Key]], + lagCorrector + ).withInitialBatch(config.batcher.batchOf(config.startTime.value)) + } + + def mutablyCorrectDataRecordTimestamp( + record: DataRecord, + lagToCorrectMillis: Long + ): DataRecord = { + val richRecord = SRichDataRecord(record) + if (richRecord.hasFeature(TIMESTAMP)) { + val timestamp = richRecord.getFeatureValue(TIMESTAMP).toLong + richRecord.setFeatureValue(TIMESTAMP, timestamp + lagToCorrectMillis) + } + record + } + + /** + * Function that maps from a logical ''AggregateStore'' + * to an underlying physical store. The physical store for + * scalding is a HDFS VersionedKeyValSource dataset. + */ + def aggregateStoreToScalding( + store: AggregateStore + ): Option[Scalding#Store[AggregationKey, DataRecord]] = { + store match { + case offlineStore: OfflineAggregateDataRecordStore => + Some( + getMostRecentLagCorrectingVersionedStoreWithRetention[ + AggregationKey, + DataRecord, + DataRecord]( + offlineStore, + versionsToKeep = offlineStore.batchesToKeep, + lagCorrector = mutablyCorrectDataRecordTimestamp, + packer = Injection.identity[DataRecord], + unpacker = Injection.identity[DataRecord] + ) + ) + case offlineStore: OfflineAggregateDataRecordStoreWithDAL => + Some( + DAL.versionedKeyValStore[AggregationKey, DataRecord]( + dataset = offlineStore.dalDataset, + pathLayout = D.Suffix(offlineStore.offline.hdfsPath.toString), + batcher = offlineStore.batcher, + maybeStartTime = Some(offlineStore.startTime), + maxErrors = offlineStore.maxKvSourceFailures + )) + case _ => None + } + } + + def generate(args: Args): ScaldingConfig = new ScaldingConfig { + val jobName = JobName(args("job_name")) + + /* + * Add registrars for chill serialization for user-defined types. + * We use the default: an empty List(). + */ + override def registrars = List() + + /* Use transformConfig to set Hadoop options. */ + override def transformConfig(config: Map[String, AnyRef]): Map[String, AnyRef] = + super.transformConfig(config) ++ Map( + "mapreduce.output.fileoutputformat.compress" -> "true", + "mapreduce.output.fileoutputformat.compress.codec" -> "com.hadoop.compression.lzo.LzoCodec", + "mapreduce.output.fileoutputformat.compress.type" -> "BLOCK" + ) + + /* + * Use getNamedOptions to set Summingbird runtime options + * The options we set are: + * 1) Set monoid to non-commutative to disable map-side + * aggregation and force all aggregation to reducers (provides a 20% speedup) + */ + override def getNamedOptions: Map[String, Options] = Map( + "DEFAULT" -> Options() + .set(MonoidIsCommutative(false)) + .set(Reducers(numReducers)) + ) + + implicit val batcher: Batcher = Batcher.ofHours(24) + + /* State implementation that uses Statebird (go/statebird) to track the batches processed. 
*/ + def getWaitingState(hadoopConfig: Configuration, startDate: Option[Timestamp], batches: Int) = + StatebirdState( + jobName, + startDate, + batches, + args.optional("statebird_service_destination"), + args.optional("statebird_client_id_name") + )(batcher) + + val sourceNameFilter: Option[Set[String]] = + args.optional("input_sources").map(_.split(",").toSet) + val storeNameFilter: Option[Set[String]] = + args.optional("output_stores").map(_.split(",").toSet) + + val filteredAggregates = + AggregatesV2Job.filterAggregates( + aggregates = aggregatesToCompute, + sourceNames = sourceNameFilter, + storeNames = storeNameFilter + ) + + override val graph = + AggregatesV2Job.generateJobGraph[Scalding]( + filteredAggregates, + dataRecordSourceToScalding, + aggregateStoreToScalding + )(DataRecordAggregationMonoid(filteredAggregates)) + } + def main(args: Array[String]): Unit = { + GenericRunner(args, generate(_)) + + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregationKeyOrdering.scala b/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregationKeyOrdering.scala new file mode 100644 index 000000000..af6f14ff2 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/AggregationKeyOrdering.scala @@ -0,0 +1,17 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.scalding + +import com.twitter.scalding_internal.job.RequiredBinaryComparators.ordSer +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey +import com.twitter.scalding.serialization.macros.impl.ordered_serialization.runtime_helpers.MacroEqualityOrderedSerialization + +object AggregationKeyOrdering extends Ordering[AggregationKey] { + implicit val featureMapsOrdering: MacroEqualityOrderedSerialization[ + (Map[Long, Long], Map[Long, String]) + ] = ordSer[(Map[Long, Long], Map[Long, String])] + + override def compare(left: AggregationKey, right: AggregationKey): Int = + featureMapsOrdering.compare( + AggregationKey.unapply(left).get, + AggregationKey.unapply(right).get + ) +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/scalding/BUILD b/timelines/data_processing/ml_util/aggregation_framework/scalding/BUILD new file mode 100644 index 000000000..d03766619 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/BUILD @@ -0,0 +1,72 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/twitter/bijection:core", + "3rdparty/jvm/com/twitter/bijection:json", + "3rdparty/jvm/com/twitter/bijection:netty", + "3rdparty/jvm/com/twitter/bijection:scrooge", + "3rdparty/jvm/com/twitter/bijection:thrift", + "3rdparty/jvm/com/twitter/bijection:util", + "3rdparty/jvm/com/twitter/chill:bijection", + "3rdparty/jvm/com/twitter/storehaus:algebra", + "3rdparty/jvm/com/twitter/storehaus:core", + "3rdparty/jvm/org/apache/hadoop:hadoop-client-default", + "3rdparty/src/jvm/com/twitter/scalding:args", + "3rdparty/src/jvm/com/twitter/scalding:commons", + "3rdparty/src/jvm/com/twitter/scalding:core", + "3rdparty/src/jvm/com/twitter/summingbird:batch", + "3rdparty/src/jvm/com/twitter/summingbird:batch-hadoop", + "3rdparty/src/jvm/com/twitter/summingbird:chill", + "3rdparty/src/jvm/com/twitter/summingbird:core", + "3rdparty/src/jvm/com/twitter/summingbird:scalding", + "finagle/finagle-core/src/main", + "gizmoduck/snapshot/src/main/scala/com/twitter/gizmoduck/snapshot:deleted_user-scala", + 
"src/java/com/twitter/ml/api:api-base", + "src/java/com/twitter/ml/api/constant", + "src/scala/com/twitter/ml/api/util", + "src/scala/com/twitter/scalding_internal/dalv2", + "src/scala/com/twitter/scalding_internal/job/analytics_batch", + "src/scala/com/twitter/scalding_internal/util", + "src/scala/com/twitter/storehaus_internal/manhattan/config", + "src/scala/com/twitter/storehaus_internal/offline", + "src/scala/com/twitter/storehaus_internal/util", + "src/scala/com/twitter/summingbird_internal/bijection", + "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits", + "src/scala/com/twitter/summingbird_internal/dalv2", + "src/scala/com/twitter/summingbird_internal/runner/common", + "src/scala/com/twitter/summingbird_internal/runner/scalding", + "src/scala/com/twitter/summingbird_internal/runner/store_config", + "src/scala/com/twitter/summingbird_internal/runner/store_config/versioned_store", + "src/scala/com/twitter/summingbird_internal/sources/common", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/ml/api:interpretable-model-java", + "src/thrift/com/twitter/statebird:compiled-v2-java", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + "timelines/data_processing/ml_util/aggregation_framework:user_job", + "timelines/data_processing/ml_util/aggregation_framework/scalding/sources", + "timelines/data_processing/ml_util/sampling:sampling_utils", + ], + exports = [ + "3rdparty/src/jvm/com/twitter/summingbird:scalding", + "src/scala/com/twitter/storehaus_internal/manhattan/config", + "src/scala/com/twitter/summingbird_internal/runner/store_config", + ], +) + +hadoop_binary( + name = "bin", + basename = "aggregation_framework_scalding-deploy", + main = "com.twitter.scalding.Tool", + platform = "java8", + runtime_platform = "java8", + tags = [ + "bazel-compatible", + "bazel-compatible:migrated", + "bazel-only", + ], + dependencies = [ + ":scalding", + ], +) diff --git a/timelines/data_processing/ml_util/aggregation_framework/scalding/DeletedUserPruner.scala b/timelines/data_processing/ml_util/aggregation_framework/scalding/DeletedUserPruner.scala new file mode 100644 index 000000000..7e2f7a95c --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/DeletedUserPruner.scala @@ -0,0 +1,97 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.scalding + +import com.twitter.gizmoduck.snapshot.DeletedUserScalaDataset +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.Feature +import com.twitter.scalding.typed.TypedPipe +import com.twitter.scalding.DateOps +import com.twitter.scalding.DateRange +import com.twitter.scalding.Days +import com.twitter.scalding.RichDate +import com.twitter.scalding_internal.dalv2.DAL +import com.twitter.scalding_internal.dalv2.remote_access.AllowCrossClusterSameDC +import com.twitter.scalding_internal.job.RequiredBinaryComparators.ordSer +import com.twitter.scalding_internal.pruner.Pruner +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup +import com.twitter.scalding.serialization.macros.impl.ordered_serialization.runtime_helpers.MacroEqualityOrderedSerialization +import java.{util => ju} + +object DeletedUserSeqPruner extends Pruner[Seq[Long]] { + implicit val tz: ju.TimeZone = DateOps.UTC + implicit val userIdSequenceOrdering: MacroEqualityOrderedSerialization[Seq[Long]] = + ordSer[Seq[Long]] + + 
private[scalding] def pruneDeletedUsers[T](
+    input: TypedPipe[T],
+    extractor: T => Seq[Long],
+    deletedUsers: TypedPipe[Long]
+  ): TypedPipe[T] = {
+    val userIdsAndValues = input.map { t: T =>
+      val userIds: Seq[Long] = extractor(t)
+      (userIds, t)
+    }
+
+    // Find all sequences of user ids in the input pipe
+    // that contain at least one deleted user. This is efficient
+    // as long as the number of deleted users is small.
+    val userSequencesWithDeletedUsers = userIdsAndValues
+      .flatMap { case (userIds, _) => userIds.map((_, userIds)) }
+      .leftJoin(deletedUsers.asKeys)
+      .collect { case (_, (userIds, Some(_))) => userIds }
+      .distinct
+
+    userIdsAndValues
+      .leftJoin(userSequencesWithDeletedUsers.asKeys)
+      .collect { case (_, (t, None)) => t }
+  }
+
+  override def prune[T](
+    input: TypedPipe[T],
+    put: (T, Seq[Long]) => Option[T],
+    get: T => Seq[Long],
+    writeTime: RichDate
+  ): TypedPipe[T] = {
+    lazy val deletedUsers = DAL
+      .readMostRecentSnapshot(DeletedUserScalaDataset, DateRange(writeTime - Days(7), writeTime))
+      .withRemoteReadPolicy(AllowCrossClusterSameDC)
+      .toTypedPipe
+      .map(_.userId)
+
+    pruneDeletedUsers(input, get, deletedUsers)
+  }
+}
+
+object AggregationKeyPruner {
+
+  /**
+   * Makes a pruner that prunes aggregate records where any of the
+   * "userIdFeatures" set in the aggregation key corresponds to a
+   * user who has deleted their account. Here, "userIdFeatures" is
+   * intended as a catch-all term for all features corresponding to
+   * a Twitter user in the input data record -- the feature itself
+   * could represent an authorId, retweeterId, engagerId, etc.
+   */
+  def mkDeletedUsersPruner(
+    userIdFeatures: Seq[Feature[_]]
+  ): Pruner[(AggregationKey, DataRecord)] = {
+    val userIdFeatureIds = userIdFeatures.map(TypedAggregateGroup.getDenseFeatureId)
+
+    def getter(tupled: (AggregationKey, DataRecord)): Seq[Long] = {
+      tupled match {
+        case (aggregationKey, _) =>
+          userIdFeatureIds.flatMap { id =>
+            aggregationKey.discreteFeaturesById
+              .get(id)
+              .orElse(aggregationKey.textFeaturesById.get(id).map(_.toLong))
+          }
+      }
+    }
+
+    // putter always returns None: the put function is not used within
+    // pruneDeletedUsers; it is only needed to satisfy the xmap API.
+ def putter: ((AggregationKey, DataRecord), Seq[Long]) => Option[(AggregationKey, DataRecord)] = + (t, seq) => None + + DeletedUserSeqPruner.xmap(putter, getter) + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/scalding/MostRecentVersionedStore.scala b/timelines/data_processing/ml_util/aggregation_framework/scalding/MostRecentVersionedStore.scala new file mode 100644 index 000000000..d60e67716 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/MostRecentVersionedStore.scala @@ -0,0 +1,100 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.scalding + +import com.twitter.bijection.Injection +import com.twitter.scalding.commons.source.VersionedKeyValSource +import com.twitter.scalding.TypedPipe +import com.twitter.scalding.{Hdfs => HdfsMode} +import com.twitter.summingbird.batch.store.HDFSMetadata +import com.twitter.summingbird.batch.BatchID +import com.twitter.summingbird.batch.Batcher +import com.twitter.summingbird.batch.OrderedFromOrderingExt +import com.twitter.summingbird.batch.PrunedSpace +import com.twitter.summingbird.scalding._ +import com.twitter.summingbird.scalding.store.VersionedBatchStore +import org.slf4j.LoggerFactory + +object MostRecentLagCorrectingVersionedStore { + def apply[Key, ValInStore, ValInMemory]( + rootPath: String, + packer: ValInMemory => ValInStore, + unpacker: ValInStore => ValInMemory, + versionsToKeep: Int = VersionedKeyValSource.defaultVersionsToKeep, + prunedSpace: PrunedSpace[(Key, ValInMemory)] = PrunedSpace.neverPruned + )( + implicit injection: Injection[(Key, (BatchID, ValInStore)), (Array[Byte], Array[Byte])], + batcher: Batcher, + ord: Ordering[Key], + lagCorrector: (ValInMemory, Long) => ValInMemory + ): MostRecentLagCorrectingVersionedBatchStore[Key, ValInMemory, Key, (BatchID, ValInStore)] = { + new MostRecentLagCorrectingVersionedBatchStore[Key, ValInMemory, Key, (BatchID, ValInStore)]( + rootPath, + versionsToKeep, + batcher + )(lagCorrector)({ case (batchID, (k, v)) => (k, (batchID.next, packer(v))) })({ + case (k, (_, v)) => (k, unpacker(v)) + }) { + override def select(b: List[BatchID]) = List(b.last) + override def pruning: PrunedSpace[(Key, ValInMemory)] = prunedSpace + } + } +} + +/** + * @param lagCorrector lagCorrector allows one to take data from one batch and pretend as if it + * came from a different batch. + * @param pack Converts the in-memory tuples to the type used by the underlying key-val store. + * @param unpack Converts the key-val tuples from the store in the form used by the calling object. 
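+ *
+ * Worked example (illustrative numbers): with a 24-hour batcher, if the most
+ * recent version available on HDFS is two batches older than the batch being
+ * pretended, then lagToCorrectMillis = 2 * 24 * 3600 * 1000 = 172800000. For
+ * DataRecord values, the lagCorrector adds this amount to the TIMESTAMP
+ * feature (see mutablyCorrectDataRecordTimestamp in AggregatesV2ScaldingJob),
+ * so halfLife decay treats the data as if it came from the pretended batch.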
+ */ +class MostRecentLagCorrectingVersionedBatchStore[KeyInMemory, ValInMemory, KeyInStore, ValInStore]( + rootPath: String, + versionsToKeep: Int, + override val batcher: Batcher +)( + lagCorrector: (ValInMemory, Long) => ValInMemory +)( + pack: (BatchID, (KeyInMemory, ValInMemory)) => (KeyInStore, ValInStore) +)( + unpack: ((KeyInStore, ValInStore)) => (KeyInMemory, ValInMemory) +)( + implicit @transient injection: Injection[(KeyInStore, ValInStore), (Array[Byte], Array[Byte])], + override val ordering: Ordering[KeyInMemory]) + extends VersionedBatchStore[KeyInMemory, ValInMemory, KeyInStore, ValInStore]( + rootPath, + versionsToKeep, + batcher)(pack)(unpack)(injection, ordering) { + + import OrderedFromOrderingExt._ + + @transient private val logger = + LoggerFactory.getLogger(classOf[MostRecentLagCorrectingVersionedBatchStore[_, _, _, _]]) + + override protected def lastBatch( + exclusiveUB: BatchID, + mode: HdfsMode + ): Option[(BatchID, FlowProducer[TypedPipe[(KeyInMemory, ValInMemory)]])] = { + val batchToPretendAs = exclusiveUB.prev + val versionToPretendAs = batchIDToVersion(batchToPretendAs) + logger.info( + s"Most recent lag correcting versioned batched store at $rootPath entering lastBatch method versionToPretendAs = $versionToPretendAs") + val meta = new HDFSMetadata(mode.conf, rootPath) + meta.versions + .map { ver => (versionToBatchID(ver), readVersion(ver)) } + .filter { _._1 < exclusiveUB } + .reduceOption { (a, b) => if (a._1 > b._1) a else b } + .map { + case ( + lastBatchID: BatchID, + flowProducer: FlowProducer[TypedPipe[(KeyInMemory, ValInMemory)]]) => + val lastVersion = batchIDToVersion(lastBatchID) + val lagToCorrectMillis: Long = + batchIDToVersion(batchToPretendAs) - batchIDToVersion(lastBatchID) + logger.info( + s"Most recent available version is $lastVersion, so lagToCorrectMillis is $lagToCorrectMillis") + val lagCorrectedFlowProducer = flowProducer.map { + pipe: TypedPipe[(KeyInMemory, ValInMemory)] => + pipe.map { case (k, v) => (k, lagCorrector(v, lagToCorrectMillis)) } + } + (batchToPretendAs, lagCorrectedFlowProducer) + } + } +} diff --git a/timelines/data_processing/ml_util/aggregation_framework/scalding/sources/BUILD b/timelines/data_processing/ml_util/aggregation_framework/scalding/sources/BUILD new file mode 100644 index 000000000..ba065ecd7 --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/sources/BUILD @@ -0,0 +1,26 @@ +scala_library( + sources = ["*.scala"], + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/twitter/storehaus:algebra", + "3rdparty/src/jvm/com/twitter/scalding:commons", + "3rdparty/src/jvm/com/twitter/scalding:core", + "3rdparty/src/jvm/com/twitter/scalding:date", + "3rdparty/src/jvm/com/twitter/summingbird:batch", + "3rdparty/src/jvm/com/twitter/summingbird:batch-hadoop", + "3rdparty/src/jvm/com/twitter/summingbird:chill", + "3rdparty/src/jvm/com/twitter/summingbird:core", + "3rdparty/src/jvm/com/twitter/summingbird:scalding", + "src/java/com/twitter/ml/api:api-base", + "src/scala/com/twitter/ml/api:api-base", + "src/scala/com/twitter/ml/api/internal", + "src/scala/com/twitter/ml/api/util", + "src/scala/com/twitter/scalding_internal/dalv2", + "src/scala/com/twitter/scalding_internal/dalv2/remote_access", + "src/scala/com/twitter/summingbird_internal/sources/common", + "src/thrift/com/twitter/ml/api:data-java", + "src/thrift/com/twitter/ml/api:interpretable-model-java", + "timelines/data_processing/ml_util/aggregation_framework:common_types", + ], +) diff 
--git a/timelines/data_processing/ml_util/aggregation_framework/scalding/sources/ScaldingAggregateSource.scala b/timelines/data_processing/ml_util/aggregation_framework/scalding/sources/ScaldingAggregateSource.scala new file mode 100644 index 000000000..d1820b4fc --- /dev/null +++ b/timelines/data_processing/ml_util/aggregation_framework/scalding/sources/ScaldingAggregateSource.scala @@ -0,0 +1,77 @@ +package com.twitter.timelines.data_processing.ml_util.aggregation_framework.scalding.sources + +import com.twitter.ml.api.DailySuffixFeatureSource +import com.twitter.ml.api.DataRecord +import com.twitter.ml.api.FixedPathFeatureSource +import com.twitter.ml.api.HourlySuffixFeatureSource +import com.twitter.ml.api.util.SRichDataRecord +import com.twitter.scalding._ +import com.twitter.scalding_internal.dalv2.DAL +import com.twitter.scalding_internal.dalv2.remote_access.AllowCrossClusterSameDC +import com.twitter.statebird.v2.thriftscala.Environment +import com.twitter.summingbird._ +import com.twitter.summingbird.scalding.Scalding.pipeFactoryExact +import com.twitter.summingbird.scalding._ +import com.twitter.summingbird_internal.sources.SourceFactory +import com.twitter.timelines.data_processing.ml_util.aggregation_framework.OfflineAggregateSource +import java.lang.{Long => JLong} + +/* + * Summingbird offline HDFS source that reads from data records on HDFS. + * + * @param offlineSource Underlying offline source that contains + * all the config info to build this platform-specific (scalding) source. + */ +case class ScaldingAggregateSource(offlineSource: OfflineAggregateSource) + extends SourceFactory[Scalding, DataRecord] { + + val hdfsPath: String = offlineSource.scaldingHdfsPath.getOrElse("") + val suffixType: String = offlineSource.scaldingSuffixType.getOrElse("daily") + val withValidation: Boolean = offlineSource.withValidation + def name: String = offlineSource.name + def description: String = + "Summingbird offline source that reads from data records at: " + hdfsPath + + implicit val timeExtractor: TimeExtractor[DataRecord] = TimeExtractor((record: DataRecord) => + SRichDataRecord(record).getFeatureValue[JLong, JLong](offlineSource.timestampFeature)) + + def getSourceForDateRange(dateRange: DateRange) = { + suffixType match { + case "daily" => DailySuffixFeatureSource(hdfsPath)(dateRange).source + case "hourly" => HourlySuffixFeatureSource(hdfsPath)(dateRange).source + case "fixed_path" => FixedPathFeatureSource(hdfsPath).source + case "dal" => + offlineSource.dalDataSet match { + case Some(dataset) => + DAL + .read(dataset, dateRange) + .withRemoteReadPolicy(AllowCrossClusterSameDC) + .withEnvironment(Environment.Prod) + .toTypedSource + case _ => + throw new IllegalArgumentException( + "cannot provide an empty dataset when defining DAL as the suffix type" + ) + } + } + } + + /** + * This method is similar to [[Scalding.sourceFromMappable]] except that this uses [[pipeFactoryExact]] + * instead of [[pipeFactory]]. [[pipeFactoryExact]] also invokes [[FileSource.validateTaps]] on the source. + * The validation ensures the presence of _SUCCESS file before processing. 
For more details, please refer to + * https://jira.twitter.biz/browse/TQ-10618 + */ + def sourceFromMappableWithValidation[T: TimeExtractor: Manifest]( + factory: (DateRange) => Mappable[T] + ): Producer[Scalding, T] = { + Producer.source[Scalding, T](pipeFactoryExact(factory)) + } + + def source: Producer[Scalding, DataRecord] = { + if (withValidation) + sourceFromMappableWithValidation(getSourceForDateRange) + else + Scalding.sourceFromMappable(getSourceForDateRange) + } +} diff --git a/topic-social-proof/README.md b/topic-social-proof/README.md new file mode 100644 index 000000000..d98b7ba3b --- /dev/null +++ b/topic-social-proof/README.md @@ -0,0 +1,8 @@ +# Topic Social Proof Service (TSPS) + +**Topic Social Proof Service** (TSPS) serves as a centralized source for verifying topics related to Timelines and Notifications. By analyzing a user's topic preferences, such as follows and unfollows, and employing semantic annotations and tweet embeddings from SimClusters or other machine learning models, TSPS delivers highly relevant topics tailored to each user's interests. + +For instance, when a tweet discusses Stephen Curry, the service determines whether the content falls under topics like "NBA" and/or "Golden State Warriors", while also providing relevance scores based on SimClusters embeddings. Additionally, TSPS evaluates user-specific topic preferences and can return the comprehensive list of available topics, only the topics the user currently follows, or new topics the user has not yet followed but may find interesting if recommended on specific product surfaces. + + diff --git a/topic-social-proof/server/BUILD b/topic-social-proof/server/BUILD new file mode 100644 index 000000000..9fb977d17 --- /dev/null +++ b/topic-social-proof/server/BUILD @@ -0,0 +1,24 @@ +jvm_binary( + name = "bin", + basename = "topic-social-proof", + main = "com.twitter.tsp.TopicSocialProofStratoFedServerMain", + runtime_platform = "java11", + tags = [ + "bazel-compatible", + ], + dependencies = [ + "strato/src/main/scala/com/twitter/strato/logging/logback", + "topic-social-proof/server/src/main/resources", + "topic-social-proof/server/src/main/scala/com/twitter/tsp", + ], +) + +# Aurora Workflows build phase convention requires a jvm_app named with ${project-name}-app +jvm_app( + name = "topic-social-proof-app", + archive = "zip", + binary = ":bin", + tags = [ + "bazel-compatible", + ], +) diff --git a/topic-social-proof/server/src/main/resources/BUILD b/topic-social-proof/server/src/main/resources/BUILD new file mode 100644 index 000000000..8f96f402c --- /dev/null +++ b/topic-social-proof/server/src/main/resources/BUILD @@ -0,0 +1,8 @@ +resources( + sources = [ + "*.xml", + "*.yml", + "config/*.yml", + ], + tags = ["bazel-compatible"], +) diff --git a/topic-social-proof/server/src/main/resources/config/decider.yml b/topic-social-proof/server/src/main/resources/config/decider.yml new file mode 100644 index 000000000..c40dd7080 --- /dev/null +++ b/topic-social-proof/server/src/main/resources/config/decider.yml @@ -0,0 +1,61 @@ +# Keys are sorted in alphabetical order + +enable_topic_social_proof_score: + comment: "Enable the calculation of cosine similarity score in TopicSocialProofStore. 0 means do not calculate the score and use a random rank to generate topic social proof" + default_availability: 0 + +enable_tweet_health_score: + comment: "Enable the calculation of health scores in tweetInfo. 
By enabling this decider, we will compute TweetHealthModelScore" + default_availability: 0 + +enable_user_agatha_score: + comment: "Enable the calculation of health scores in tweetInfo. By enabling this decider, we will compute UserHealthModelScore" + default_availability: 0 + +enable_loadshedding_HomeTimeline: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_HomeTimelineTopicTweets: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_HomeTimelineRecommendTopicTweets: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_MagicRecsRecommendTopicTweets: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_TopicLandingPage: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_HomeTimelineFeatures: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_HomeTimelineTopicTweetsMetrics: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_HomeTimelineUTEGTopicTweets: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_HomeTimelineSimClusters: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_ExploreTopicTweets: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_MagicRecsTopicTweets: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0 + +enable_loadshedding_Search: + comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response" + default_availability: 0
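+
+# Note: the enable_loadshedding_* keys above are read by LoadShedder
+# (topic-social-proof/server/src/main/scala/com/twitter/tsp/common/LoadShedder.scala).
+# Deciders are fractional: for example, a value of 50.00 sheds roughly 50% of requests for
+# that display location, while a display location without a defined key is always served.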
diff --git a/topic-social-proof/server/src/main/resources/logback.xml b/topic-social-proof/server/src/main/resources/logback.xml new file mode 100644 index 000000000..d08b0a965 --- /dev/null +++ b/topic-social-proof/server/src/main/resources/logback.xml @@ -0,0 +1,155 @@ +<!-- logback configuration: rolling file appenders for ${log.service.output} and ${log.strato_only.output} (fixed-window rotation with indexes 1 to 10, 50MB size-based triggering, pattern "%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n"); a loglens appender with category loglens, index ${log.lens.index}, tag ${log.lens.tag}/service, and pattern "%msg%n"; a filter dropping manhattan-client log lines matching .*InvalidRequest.*; and async appender wrappers configured with ${async_queue_size} and ${async_max_flush_time} --> diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/BUILD b/topic-social-proof/server/src/main/scala/com/twitter/tsp/BUILD new file mode 100644 index 000000000..2052c5047 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/BUILD @@ -0,0 +1,12 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + ], + dependencies = [ + "finatra/inject/inject-thrift-client", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/columns", + ], +) diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/TopicSocialProofStratoFedServer.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/TopicSocialProofStratoFedServer.scala new file mode 100644 index 000000000..22d3c19f0 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/TopicSocialProofStratoFedServer.scala @@ -0,0 +1,56 @@ +package com.twitter.tsp + +import com.google.inject.Module +import com.twitter.strato.fed._ +import com.twitter.strato.fed.server._ +import com.twitter.strato.warmup.Warmer +import com.twitter.tsp.columns.TopicSocialProofColumn +import com.twitter.tsp.columns.TopicSocialProofBatchColumn +import com.twitter.tsp.handlers.UttChildrenWarmupHandler +import com.twitter.tsp.modules.RepresentationScorerStoreModule +import com.twitter.tsp.modules.GizmoduckUserModule +import com.twitter.tsp.modules.TSPClientIdModule +import com.twitter.tsp.modules.TopicListingModule +import com.twitter.tsp.modules.TopicSocialProofStoreModule +import com.twitter.tsp.modules.TopicTweetCosineSimilarityAggregateStoreModule +import com.twitter.tsp.modules.TweetInfoStoreModule +import com.twitter.tsp.modules.TweetyPieClientModule +import com.twitter.tsp.modules.UttClientModule +import com.twitter.tsp.modules.UttLocalizationModule +import com.twitter.util.Future + +object TopicSocialProofStratoFedServerMain extends TopicSocialProofStratoFedServer + +trait TopicSocialProofStratoFedServer extends StratoFedServer { + override def dest: String = "/s/topic-social-proof/topic-social-proof" + + override val modules: Seq[Module] = + Seq( + GizmoduckUserModule, + RepresentationScorerStoreModule, + TopicSocialProofStoreModule, + TopicListingModule, + TopicTweetCosineSimilarityAggregateStoreModule, + TSPClientIdModule, + TweetInfoStoreModule, 
TweetyPieClientModule, + UttClientModule, + UttLocalizationModule + ) + + override def columns: Seq[Class[_ <: StratoFed.Column]] = + Seq( + classOf[TopicSocialProofColumn], + classOf[TopicSocialProofBatchColumn] + ) + + override def configureWarmer(warmer: Warmer): Unit = { + warmer.add( + "uttChildrenWarmupHandler", + () => { + handle[UttChildrenWarmupHandler]() + Future.Unit + } + ) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/BUILD b/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/BUILD new file mode 100644 index 000000000..c29b7ea35 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/BUILD @@ -0,0 +1,12 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + ], + dependencies = [ + "stitch/stitch-storehaus", + "strato/src/main/scala/com/twitter/strato/fed", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/service", + "topic-social-proof/server/src/main/thrift:thrift-scala", + ], +) diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/TopicSocialProofBatchColumn.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/TopicSocialProofBatchColumn.scala new file mode 100644 index 000000000..f451e662a --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/TopicSocialProofBatchColumn.scala @@ -0,0 +1,84 @@ +package com.twitter.tsp.columns + +import com.twitter.stitch.SeqGroup +import com.twitter.stitch.Stitch +import com.twitter.strato.catalog.Fetch +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config._ +import com.twitter.strato.config.AllowAll +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.Policy +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle.Production +import com.twitter.strato.fed.StratoFed +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.tsp.thriftscala.TopicSocialProofRequest +import com.twitter.tsp.thriftscala.TopicSocialProofOptions +import com.twitter.tsp.service.TopicSocialProofService +import com.twitter.tsp.thriftscala.TopicWithScore +import com.twitter.util.Future +import com.twitter.util.Try +import javax.inject.Inject + +class TopicSocialProofBatchColumn @Inject() ( + topicSocialProofService: TopicSocialProofService) + extends StratoFed.Column(TopicSocialProofBatchColumn.Path) + with StratoFed.Fetch.Stitch { + + override val policy: Policy = + ReadWritePolicy( + readPolicy = AllowAll, + writePolicy = AllowKeyAuthenticatedTwitterUserId + ) + + override type Key = Long + override type View = TopicSocialProofOptions + override type Value = Seq[TopicWithScore] + + override val keyConv: Conv[Key] = Conv.ofType + override val viewConv: Conv[View] = ScroogeConv.fromStruct[TopicSocialProofOptions] + override val valueConv: Conv[Value] = Conv.seq(ScroogeConv.fromStruct[TopicWithScore]) + override val metadata: OpMetadata = + OpMetadata( + lifecycle = Some(Production), + Some(PlainText("Topic Social Proof Batched Federated Column"))) + + case class TspsGroup(view: View) extends SeqGroup[Long, Fetch.Result[Value]] { + override protected def run(keys: Seq[Long]): Future[Seq[Try[Result[Seq[TopicWithScore]]]]] = { + val request = TopicSocialProofRequest( + userId = view.userId, + tweetIds = keys.toSet, + displayLocation = view.displayLocation, + topicListingSetting = view.topicListingSetting, + context = 
view.context, + bypassModes = view.bypassModes, + tags = view.tags + ) + + val response = topicSocialProofService + .topicSocialProofHandlerStoreStitch(request) + .map(_.socialProofs) + Stitch + .run(response).map(r => + keys.map(key => { + Try { + val v = r.get(key) + if (v.nonEmpty && v.get.nonEmpty) { + found(v.get) + } else { + missing + } + } + })) + } + } + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = { + Stitch.call(key, TspsGroup(view)) + } +} + +object TopicSocialProofBatchColumn { + val Path = "topic-signals/tsp/topic-social-proof-batched" +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/TopicSocialProofColumn.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/TopicSocialProofColumn.scala new file mode 100644 index 000000000..10425eccb --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/columns/TopicSocialProofColumn.scala @@ -0,0 +1,47 @@ +package com.twitter.tsp.columns + +import com.twitter.stitch +import com.twitter.stitch.Stitch +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.config._ +import com.twitter.strato.config.AllowAll +import com.twitter.strato.config.ContactInfo +import com.twitter.strato.config.Policy +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description.PlainText +import com.twitter.strato.data.Lifecycle.Production +import com.twitter.strato.fed.StratoFed +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.tsp.thriftscala.TopicSocialProofRequest +import com.twitter.tsp.thriftscala.TopicSocialProofResponse +import com.twitter.tsp.service.TopicSocialProofService +import javax.inject.Inject + +class TopicSocialProofColumn @Inject() ( + topicSocialProofService: TopicSocialProofService) + extends StratoFed.Column(TopicSocialProofColumn.Path) + with StratoFed.Fetch.Stitch { + + override type Key = TopicSocialProofRequest + override type View = Unit + override type Value = TopicSocialProofResponse + + override val keyConv: Conv[Key] = ScroogeConv.fromStruct[TopicSocialProofRequest] + override val viewConv: Conv[View] = Conv.ofType + override val valueConv: Conv[Value] = ScroogeConv.fromStruct[TopicSocialProofResponse] + override val metadata: OpMetadata = + OpMetadata(lifecycle = Some(Production), Some(PlainText("Topic Social Proof Federated Column"))) + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = { + topicSocialProofService + .topicSocialProofHandlerStoreStitch(key) + .map { result => found(result) } + .handle { + case stitch.NotFound => missing + } + } +} + +object TopicSocialProofColumn { + val Path = "topic-signals/tsp/topic-social-proof" +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/BUILD b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/BUILD new file mode 100644 index 000000000..7b5fda3b0 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/BUILD @@ -0,0 +1,23 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + ], + dependencies = [ + "configapi/configapi-abdecider", + "configapi/configapi-core", + "content-recommender/thrift/src/main/thrift:thrift-scala", + "decider/src/main/scala", + "discovery-common/src/main/scala/com/twitter/discovery/common/configapi", + "featureswitches/featureswitches-core", + "finatra/inject/inject-core/src/main/scala", + "frigate/frigate-common:base", + "frigate/frigate-common:util", + 
"frigate/frigate-common/src/main/scala/com/twitter/frigate/common/candidate", + "interests-service/thrift/src/main/thrift:thrift-scala", + "src/scala/com/twitter/simclusters_v2/common", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "stitch/stitch-storehaus", + "topic-social-proof/server/src/main/thrift:thrift-scala", + ], +) diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/DeciderConstants.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/DeciderConstants.scala new file mode 100644 index 000000000..de025128d --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/DeciderConstants.scala @@ -0,0 +1,19 @@ +package com.twitter.tsp.common + +import com.twitter.servo.decider.DeciderKeyEnum + +object DeciderConstants { + val enableTopicSocialProofScore = "enable_topic_social_proof_score" + val enableHealthSignalsScoreDeciderKey = "enable_tweet_health_score" + val enableUserAgathaScoreDeciderKey = "enable_user_agatha_score" +} + +object DeciderKey extends DeciderKeyEnum { + + val enableHealthSignalsScoreDeciderKey: Value = Value( + DeciderConstants.enableHealthSignalsScoreDeciderKey + ) + val enableUserAgathaScoreDeciderKey: Value = Value( + DeciderConstants.enableUserAgathaScoreDeciderKey + ) +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/FeatureSwitchesBuilder.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/FeatureSwitchesBuilder.scala new file mode 100644 index 000000000..a3b269cba --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/FeatureSwitchesBuilder.scala @@ -0,0 +1,34 @@ +package com.twitter.tsp.common + +import com.twitter.abdecider.LoggingABDecider +import com.twitter.featureswitches.v2.FeatureSwitches +import com.twitter.featureswitches.v2.builder.{FeatureSwitchesBuilder => FsBuilder} +import com.twitter.featureswitches.v2.experimentation.NullBucketImpressor +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.util.Duration + +case class FeatureSwitchesBuilder( + statsReceiver: StatsReceiver, + abDecider: LoggingABDecider, + featuresDirectory: String, + addServiceDetailsFromAurora: Boolean, + configRepoDirectory: String = "/usr/local/config", + fastRefresh: Boolean = false, + impressExperiments: Boolean = true) { + + def build(): FeatureSwitches = { + val featureSwitches = FsBuilder() + .abDecider(abDecider) + .statsReceiver(statsReceiver) + .configRepoAbsPath(configRepoDirectory) + .featuresDirectory(featuresDirectory) + .limitToReferencedExperiments(shouldLimit = true) + .experimentImpressionStatsEnabled(true) + + if (!impressExperiments) featureSwitches.experimentBucketImpressor(NullBucketImpressor) + if (addServiceDetailsFromAurora) featureSwitches.serviceDetailsFromAurora() + if (fastRefresh) featureSwitches.refreshPeriod(Duration.fromSeconds(10)) + + featureSwitches.build() + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/LoadShedder.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/LoadShedder.scala new file mode 100644 index 000000000..2071ea07e --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/LoadShedder.scala @@ -0,0 +1,44 @@ +package com.twitter.tsp.common + +import com.twitter.decider.Decider +import com.twitter.decider.RandomRecipient +import com.twitter.util.Future +import javax.inject.Inject +import scala.util.control.NoStackTrace + +/* + Provides deciders-controlled load 
shedding for a given displayLocation + The format of the decider keys is: + + enable_loadshedding_ + E.g.: + enable_loadshedding_HomeTimeline + + Deciders are fractional, so a value of 50.00 will drop 50% of responses. If a decider key is not + defined for a particular displayLocation, those requests will always be served. + + We should therefore aim to define keys for the locations we care most about in decider.yml, + so that we can control them during incidents. + */ +class LoadShedder @Inject() (decider: Decider) { + import LoadShedder._ + + // Fall back to False for any undefined key + private val deciderWithFalseFallback: Decider = decider.orElse(Decider.False) + private val keyPrefix = "enable_loadshedding" + + def apply[T](typeString: String)(serve: => Future[T]): Future[T] = { + /* + Per-typeString level load shedding: enable_loadshedding_HomeTimeline + Checks if per-typeString load shedding is enabled + */ + val keyTyped = s"${keyPrefix}_$typeString" + if (deciderWithFalseFallback.isAvailable(keyTyped, recipient = Some(RandomRecipient))) + Future.exception(LoadSheddingException) + else serve + } +} + +object LoadShedder { + object LoadSheddingException extends Exception with NoStackTrace +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/ParamsBuilder.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/ParamsBuilder.scala new file mode 100644 index 000000000..93fe9cbaf --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/ParamsBuilder.scala @@ -0,0 +1,98 @@ +package com.twitter.tsp.common + +import com.twitter.abdecider.LoggingABDecider +import com.twitter.abdecider.UserRecipient +import com.twitter.contentrecommender.thriftscala.DisplayLocation +import com.twitter.discovery.common.configapi.FeatureContextBuilder +import com.twitter.featureswitches.FSRecipient +import com.twitter.featureswitches.Recipient +import com.twitter.featureswitches.UserAgent +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.interests.thriftscala.TopicListingViewerContext +import com.twitter.timelines.configapi +import com.twitter.timelines.configapi.Params +import com.twitter.timelines.configapi.RequestContext +import com.twitter.timelines.configapi.abdecider.LoggingABDeciderExperimentContext + +case class ParamsBuilder( + featureContextBuilder: FeatureContextBuilder, + abDecider: LoggingABDecider, + overridesConfig: configapi.Config, + statsReceiver: StatsReceiver) { + + def buildFromTopicListingViewerContext( + topicListingViewerContext: Option[TopicListingViewerContext], + displayLocation: DisplayLocation, + userRoleOverride: Option[Set[String]] = None + ): Params = { + + topicListingViewerContext.flatMap(_.userId) match { + case Some(userId) => + val userRecipient = ParamsBuilder.toFeatureSwitchRecipientWithTopicContext( + userId, + userRoleOverride, + topicListingViewerContext, + Some(displayLocation) + ) + + overridesConfig( + requestContext = RequestContext( + userId = Some(userId), + experimentContext = LoggingABDeciderExperimentContext( + abDecider, + Some(UserRecipient(userId, Some(userId)))), + featureContext = featureContextBuilder( + Some(userId), + Some(userRecipient) + ) + ), + statsReceiver + ) + case _ => + throw new IllegalArgumentException( + s"${this.getClass.getSimpleName} tried to build Param for a request without a userId" + ) + } + } +} + +object ParamsBuilder { + + def toFeatureSwitchRecipientWithTopicContext( + userId: Long, + userRolesOverride: Option[Set[String]], + context: 
Option[TopicListingViewerContext], + displayLocationOpt: Option[DisplayLocation] + ): Recipient = { + val userRoles = userRolesOverride match { + case Some(overrides) => Some(overrides) + case _ => context.flatMap(_.userRoles.map(_.toSet)) + } + + val recipient = FSRecipient( + userId = Some(userId), + userRoles = userRoles, + deviceId = context.flatMap(_.deviceId), + guestId = context.flatMap(_.guestId), + languageCode = context.flatMap(_.languageCode), + countryCode = context.flatMap(_.countryCode), + userAgent = context.flatMap(_.userAgent).flatMap(UserAgent(_)), + isVerified = None, + isTwoffice = None, + tooClient = None, + highWaterMark = None + ) + displayLocationOpt match { + case Some(displayLocation) => + recipient.withCustomFields(displayLocationCustomFieldMap(displayLocation)) + case None => + recipient + } + } + + private val DisplayLocationCustomField = "display_location" + + def displayLocationCustomFieldMap(displayLocation: DisplayLocation): (String, String) = + DisplayLocationCustomField -> displayLocation.toString + +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/RecTargetFactory.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/RecTargetFactory.scala new file mode 100644 index 000000000..26eeda736 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/RecTargetFactory.scala @@ -0,0 +1,65 @@ +package com.twitter.tsp.common + +import com.twitter.abdecider.LoggingABDecider +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.base.TargetUser +import com.twitter.frigate.common.candidate.TargetABDecider +import com.twitter.frigate.common.util.ABDeciderWithOverride +import com.twitter.gizmoduck.thriftscala.User +import com.twitter.simclusters_v2.common.UserId +import com.twitter.storehaus.ReadableStore +import com.twitter.timelines.configapi.Params +import com.twitter.tsp.thriftscala.TopicSocialProofRequest +import com.twitter.util.Future + +case class DefaultRecTopicSocialProofTarget( + topicSocialProofRequest: TopicSocialProofRequest, + targetId: UserId, + user: Option[User], + abDecider: ABDeciderWithOverride, + params: Params +)( + implicit statsReceiver: StatsReceiver) + extends TargetUser + with TopicSocialProofRecRequest + with TargetABDecider { + override def globalStats: StatsReceiver = statsReceiver + override val targetUser: Future[Option[User]] = Future.value(user) +} + +trait TopicSocialProofRecRequest { + tuc: TargetUser => + + val topicSocialProofRequest: TopicSocialProofRequest +} + +case class RecTargetFactory( + abDecider: LoggingABDecider, + userStore: ReadableStore[UserId, User], + paramBuilder: ParamsBuilder, + statsReceiver: StatsReceiver) { + + type RecTopicSocialProofTarget = DefaultRecTopicSocialProofTarget + + def buildRecTopicSocialProofTarget( + request: TopicSocialProofRequest + ): Future[RecTopicSocialProofTarget] = { + val userId = request.userId + userStore.get(userId).map { userOpt => + val userRoles = userOpt.flatMap(_.roles.map(_.roles.toSet)) + + val context = request.context.copy(userId = Some(request.userId)) // override to make sure the context carries the request's userId + + val params = paramBuilder + .buildFromTopicListingViewerContext(Some(context), request.displayLocation, userRoles) + + DefaultRecTopicSocialProofTarget( + request, + userId, + userOpt, + ABDeciderWithOverride(abDecider, None)(statsReceiver), + params + )(statsReceiver) + } + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/TopicSocialProofDecider.scala 
b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/TopicSocialProofDecider.scala new file mode 100644 index 000000000..39a4acb89 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/TopicSocialProofDecider.scala @@ -0,0 +1,26 @@ +package com.twitter.tsp +package common + +import com.twitter.decider.Decider +import com.twitter.decider.RandomRecipient +import com.twitter.decider.Recipient +import com.twitter.simclusters_v2.common.DeciderGateBuilderWithIdHashing +import javax.inject.Inject + +case class TopicSocialProofDecider @Inject() (decider: Decider) { + + def isAvailable(feature: String, recipient: Option[Recipient]): Boolean = { + decider.isAvailable(feature, recipient) + } + + lazy val deciderGateBuilder = new DeciderGateBuilderWithIdHashing(decider) + + /** + * When useRandomRecipient is set to false, the decider is either completely on or off. + * When useRandomRecipient is set to true, the decider is on for the specified % of traffic. + */ + def isAvailable(feature: String, useRandomRecipient: Boolean = true): Boolean = { + if (useRandomRecipient) isAvailable(feature, Some(RandomRecipient)) + else isAvailable(feature, None) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/TopicSocialProofParams.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/TopicSocialProofParams.scala new file mode 100644 index 000000000..4effe1313 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/common/TopicSocialProofParams.scala @@ -0,0 +1,104 @@ +package com.twitter.tsp.common + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.logging.Logger +import com.twitter.timelines.configapi.BaseConfig +import com.twitter.timelines.configapi.BaseConfigBuilder +import com.twitter.timelines.configapi.FSBoundedParam +import com.twitter.timelines.configapi.FSParam +import com.twitter.timelines.configapi.FeatureSwitchOverrideUtil + +object TopicSocialProofParams { + + object TopicTweetsSemanticCoreVersionId + extends FSBoundedParam[Long]( + name = "topic_tweets_semantic_core_annotation_version_id", + default = 1433487161551032320L, + min = 0L, + max = Long.MaxValue + ) + object TopicTweetsSemanticCoreVersionIdsSet + extends FSParam[Set[Long]]( + name = "topic_tweets_semantic_core_annotation_version_id_allowed_set", + default = Set(TopicTweetsSemanticCoreVersionId.default)) + + /** + * Controls the Topic Social Proof cosine similarity threshold for the Topic Tweets. 
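+ * As applied downstream in TopicSocialProofHandler.buildTopicWithValidScore: the consumer- and
+ * producer-embedding cosine similarities are summed, and a topic proof is kept only when that
+ * combined score exceeds this threshold (or the proof has ignoreSimClusterFiltering set).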
+ */ + object TweetToTopicCosineSimilarityThreshold + extends FSBoundedParam[Double]( + name = "topic_tweets_cosine_similarity_threshold_tsp", + default = 0.0, + min = 0.0, + max = 1.0 + ) + + object EnablePersonalizedContextTopics // master feature switch to enable backfill + extends FSParam[Boolean]( + name = "topic_tweets_personalized_contexts_enable_personalized_contexts", + default = false + ) + + object EnableYouMightLikeTopic + extends FSParam[Boolean]( + name = "topic_tweets_personalized_contexts_enable_you_might_like", + default = false + ) + + object EnableRecentEngagementsTopic + extends FSParam[Boolean]( + name = "topic_tweets_personalized_contexts_enable_recent_engagements", + default = false + ) + + object EnableTopicTweetHealthFilterPersonalizedContexts + extends FSParam[Boolean]( + name = "topic_tweets_personalized_contexts_health_switch", + default = true + ) + + object EnableTweetToTopicScoreRanking + extends FSParam[Boolean]( + name = "topic_tweets_enable_tweet_to_topic_score_ranking", + default = true + ) + +} + +object FeatureSwitchConfig { + private val enumFeatureSwitchOverrides = FeatureSwitchOverrideUtil + .getEnumFSOverrides( + NullStatsReceiver, + Logger(getClass), + ) + + private val intFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getBoundedIntFSOverrides() + + private val longFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getBoundedLongFSOverrides( + TopicSocialProofParams.TopicTweetsSemanticCoreVersionId + ) + + private val doubleFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getBoundedDoubleFSOverrides( + TopicSocialProofParams.TweetToTopicCosineSimilarityThreshold, + ) + + private val longSetFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getLongSetFSOverrides( + TopicSocialProofParams.TopicTweetsSemanticCoreVersionIdsSet, + ) + + private val booleanFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getBooleanFSOverrides( + TopicSocialProofParams.EnablePersonalizedContextTopics, + TopicSocialProofParams.EnableYouMightLikeTopic, + TopicSocialProofParams.EnableRecentEngagementsTopic, + TopicSocialProofParams.EnableTopicTweetHealthFilterPersonalizedContexts, + TopicSocialProofParams.EnableTweetToTopicScoreRanking, + ) + val config: BaseConfig = BaseConfigBuilder() + .set(enumFeatureSwitchOverrides: _*) + .set(intFeatureSwitchOverrides: _*) + .set(longFeatureSwitchOverrides: _*) + .set(doubleFeatureSwitchOverrides: _*) + .set(longSetFeatureSwitchOverrides: _*) + .set(booleanFeatureSwitchOverrides: _*) + .build() +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/BUILD b/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/BUILD new file mode 100644 index 000000000..dc280e03d --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/BUILD @@ -0,0 +1,14 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + ], + dependencies = [ + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "stitch/stitch-storehaus", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/common", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/stores", + "topic-social-proof/server/src/main/thrift:thrift-scala", + "topiclisting/topiclisting-core/src/main/scala/com/twitter/topiclisting", + ], +) diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/TopicSocialProofHandler.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/TopicSocialProofHandler.scala new file mode 100644 index 
000000000..848ec1d72 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/TopicSocialProofHandler.scala @@ -0,0 +1,587 @@ +package com.twitter.tsp.handlers + +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.mux.ClientDiscardedRequestException +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.util.StatsUtil +import com.twitter.simclusters_v2.common.SemanticCoreEntityId +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.strato.response.Err +import com.twitter.storehaus.ReadableStore +import com.twitter.timelines.configapi.Params +import com.twitter.topic_recos.common.Configs.ConsumerTopicEmbeddingType +import com.twitter.topic_recos.common.Configs.DefaultModelVersion +import com.twitter.topic_recos.common.Configs.ProducerTopicEmbeddingType +import com.twitter.topic_recos.common.Configs.TweetEmbeddingType +import com.twitter.topiclisting.TopicListingViewerContext +import com.twitter.topic_recos.common.LocaleUtil +import com.twitter.topiclisting.AnnotationRuleProvider +import com.twitter.tsp.common.DeciderConstants +import com.twitter.tsp.common.LoadShedder +import com.twitter.tsp.common.RecTargetFactory +import com.twitter.tsp.common.TopicSocialProofDecider +import com.twitter.tsp.common.TopicSocialProofParams +import com.twitter.tsp.stores.TopicSocialProofStore +import com.twitter.tsp.stores.TopicSocialProofStore.TopicSocialProof +import com.twitter.tsp.stores.UttTopicFilterStore +import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore.ScoreKey +import com.twitter.tsp.thriftscala.MetricTag +import com.twitter.tsp.thriftscala.TopicFollowType +import com.twitter.tsp.thriftscala.TopicListingSetting +import com.twitter.tsp.thriftscala.TopicSocialProofRequest +import com.twitter.tsp.thriftscala.TopicSocialProofResponse +import com.twitter.tsp.thriftscala.TopicWithScore +import com.twitter.tsp.thriftscala.TspTweetInfo +import com.twitter.tsp.utils.HealthSignalsUtils +import com.twitter.util.Future +import com.twitter.util.Timer +import com.twitter.util.Duration +import com.twitter.util.TimeoutException + +import scala.util.Random + +class TopicSocialProofHandler( + topicSocialProofStore: ReadableStore[TopicSocialProofStore.Query, Seq[TopicSocialProof]], + tweetInfoStore: ReadableStore[TweetId, TspTweetInfo], + uttTopicFilterStore: UttTopicFilterStore, + recTargetFactory: RecTargetFactory, + decider: TopicSocialProofDecider, + statsReceiver: StatsReceiver, + loadShedder: LoadShedder, + timer: Timer) { + + import TopicSocialProofHandler._ + + def getTopicSocialProofResponse( + request: TopicSocialProofRequest + ): Future[TopicSocialProofResponse] = { + val scopedStats = statsReceiver.scope(request.displayLocation.toString) + scopedStats.counter("fanoutRequests").incr(request.tweetIds.size) + scopedStats.stat("numTweetsPerRequest").add(request.tweetIds.size) + StatsUtil.trackBlockStats(scopedStats) { + recTargetFactory + .buildRecTopicSocialProofTarget(request).flatMap { target => + val enableCosineSimilarityScoreCalculation = + decider.isAvailable(DeciderConstants.enableTopicSocialProofScore) + + val semanticCoreVersionId = + target.params(TopicSocialProofParams.TopicTweetsSemanticCoreVersionId) + + val semanticCoreVersionIdsSet = + target.params(TopicSocialProofParams.TopicTweetsSemanticCoreVersionIdsSet) + + val allowListWithTopicFollowTypeFut = 
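+ // Fetch the viewer's topic allowlist; on failure this degrades to an empty allowlist
+ // (see the rescue below) so a single store outage does not fail the whole request.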
uttTopicFilterStore + .getAllowListTopicsForUser( + request.userId, + request.topicListingSetting, + TopicListingViewerContext + .fromThrift(request.context).copy(languageCode = + LocaleUtil.getStandardLanguageCode(request.context.languageCode)), + request.bypassModes.map(_.toSet) + ).rescue { + case _ => + scopedStats.counter("uttTopicFilterStoreFailure").incr() + Future.value(Map.empty[SemanticCoreEntityId, Option[TopicFollowType]]) + } + + val tweetInfoMapFut: Future[Map[TweetId, Option[TspTweetInfo]]] = Future + .collect( + tweetInfoStore.multiGet(request.tweetIds.toSet) + ).raiseWithin(TweetInfoStoreTimeout)(timer).rescue { + case _: TimeoutException => + scopedStats.counter("tweetInfoStoreTimeout").incr() + Future.value(Map.empty[TweetId, Option[TspTweetInfo]]) + case _ => + scopedStats.counter("tweetInfoStoreFailure").incr() + Future.value(Map.empty[TweetId, Option[TspTweetInfo]]) + } + + val definedTweetInfoMapFut = + keepTweetsWithTweetInfoAndLanguage(tweetInfoMapFut, request.displayLocation.toString) + + Future + .join(definedTweetInfoMapFut, allowListWithTopicFollowTypeFut).map { + case (tweetInfoMap, allowListWithTopicFollowType) => + val tweetIdsToQuery = tweetInfoMap.keys.toSet + val topicProofQueries = + tweetIdsToQuery.map { tweetId => + TopicSocialProofStore.Query( + TopicSocialProofStore.CacheableQuery( + tweetId = tweetId, + tweetLanguage = LocaleUtil.getSupportedStandardLanguageCodeWithDefault( + tweetInfoMap.getOrElse(tweetId, None).flatMap { + _.language + }), + enableCosineSimilarityScoreCalculation = + enableCosineSimilarityScoreCalculation + ), + allowedSemanticCoreVersionIds = semanticCoreVersionIdsSet + ) + } + + val topicSocialProofsFut: Future[Map[TweetId, Seq[TopicSocialProof]]] = { + Future + .collect(topicSocialProofStore.multiGet(topicProofQueries)).map(_.map { + case (query, results) => + query.cacheableQuery.tweetId -> results.toSeq.flatten.filter( + _.semanticCoreVersionId == semanticCoreVersionId) + }) + }.raiseWithin(TopicSocialProofStoreTimeout)(timer).rescue { + case _: TimeoutException => + scopedStats.counter("topicSocialProofStoreTimeout").incr() + Future(Map.empty[TweetId, Seq[TopicSocialProof]]) + case _ => + scopedStats.counter("topicSocialProofStoreFailure").incr() + Future(Map.empty[TweetId, Seq[TopicSocialProof]]) + } + + val random = new Random(seed = request.userId.toInt) + + topicSocialProofsFut.map { topicSocialProofs => + val filteredTopicSocialProofs = filterByAllowedList( + topicSocialProofs, + request.topicListingSetting, + allowListWithTopicFollowType.keySet + ) + + val filteredTopicSocialProofsEmptyCount: Int = + filteredTopicSocialProofs.count { + case (_, topicSocialProofs: Seq[TopicSocialProof]) => + topicSocialProofs.isEmpty + } + + scopedStats + .counter("filteredTopicSocialProofsCount").incr(filteredTopicSocialProofs.size) + scopedStats + .counter("filteredTopicSocialProofsEmptyCount").incr( + filteredTopicSocialProofsEmptyCount) + + if (isCrTopicTweets(request)) { + val socialProofs = filteredTopicSocialProofs.mapValues(_.flatMap { topicProof => + val topicWithScores = buildTopicWithRandomScore( + topicProof, + allowListWithTopicFollowType, + random + ) + topicWithScores + }) + TopicSocialProofResponse(socialProofs) + } else { + val socialProofs = filteredTopicSocialProofs.mapValues(_.flatMap { topicProof => + getTopicProofScore( + topicProof = topicProof, + allowListWithTopicFollowType = allowListWithTopicFollowType, + params = target.params, + random = random, + statsReceiver = statsReceiver + ) + + 
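+ // Rank each tweet's topic proofs by descending score and keep at most MaxCandidates (10).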
}.sortBy(-_.score).take(MaxCandidates)) + + val personalizedContextSocialProofs = + if (target.params(TopicSocialProofParams.EnablePersonalizedContextTopics)) { + val personalizedContextEligibility = + checkPersonalizedContextsEligibility( + target.params, + allowListWithTopicFollowType) + val filteredTweets = + filterPersonalizedContexts(socialProofs, tweetInfoMap, target.params) + backfillPersonalizedContexts( + allowListWithTopicFollowType, + filteredTweets, + request.tags.getOrElse(Map.empty), + personalizedContextEligibility) + } else { + Map.empty[TweetId, Seq[TopicWithScore]] + } + + val mergedSocialProofs = socialProofs.map { + case (tweetId, proofs) => + ( + tweetId, + proofs + ++ personalizedContextSocialProofs.getOrElse(tweetId, Seq.empty)) + } + + // Note that we will NOT filter out tweets with no TSP in either case + TopicSocialProofResponse(mergedSocialProofs) + } + } + }.flatten.raiseWithin(Timeout)(timer).rescue { + case _: ClientDiscardedRequestException => + scopedStats.counter("ClientDiscardedRequestException").incr() + Future.value(DefaultResponse) + case err: Err if err.code == Err.Cancelled => + scopedStats.counter("CancelledErr").incr() + Future.value(DefaultResponse) + case _ => + scopedStats.counter("FailedRequests").incr() + Future.value(DefaultResponse) + } + } + } + + /** + * Fetches the score for each Topic Social Proof. + */ + private def getTopicProofScore( + topicProof: TopicSocialProof, + allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]], + params: Params, + random: Random, + statsReceiver: StatsReceiver + ): Option[TopicWithScore] = { + val scopedStats = statsReceiver.scope("getTopicProofScores") + val enableTweetToTopicScoreRanking = + params(TopicSocialProofParams.EnableTweetToTopicScoreRanking) + + val minTweetToTopicCosineSimilarityThreshold = + params(TopicSocialProofParams.TweetToTopicCosineSimilarityThreshold) + + val topicWithScore = + if (enableTweetToTopicScoreRanking) { + scopedStats.counter("enableTweetToTopicScoreRanking").incr() + buildTopicWithValidScore( + topicProof, + TweetEmbeddingType, + Some(ConsumerTopicEmbeddingType), + Some(ProducerTopicEmbeddingType), + allowListWithTopicFollowType, + DefaultModelVersion, + minTweetToTopicCosineSimilarityThreshold + ) + } else { + scopedStats.counter("buildTopicWithRandomScore").incr() + buildTopicWithRandomScore( + topicProof, + allowListWithTopicFollowType, + random + ) + } + topicWithScore + + } + + private[handlers] def isCrTopicTweets( + request: TopicSocialProofRequest + ): Boolean = { + // CrTopic (across a variety of DisplayLocations) is the only use case with TopicListingSetting.All + request.topicListingSetting == TopicListingSetting.All + } + + /** + * Consolidates the logic that decides whether only quality topics should be enabled for Implicit Follows. + */ + + /** + * Consolidates the logic that decides whether Personalized Contexts backfilling should be enabled. + */ + private[handlers] def checkPersonalizedContextsEligibility( + params: Params, + allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]] + ): PersonalizedContextEligibility = { + val scopedStats = statsReceiver.scope("checkPersonalizedContextsEligibility") + val isRecentFavInAllowlist = allowListWithTopicFollowType + .contains(AnnotationRuleProvider.recentFavTopicId) + + val isRecentFavEligible = + isRecentFavInAllowlist && params(TopicSocialProofParams.EnableRecentEngagementsTopic) + if (isRecentFavEligible) + scopedStats.counter("isRecentFavEligible").incr() + + val 
isRecentRetweetInAllowlist = allowListWithTopicFollowType + .contains(AnnotationRuleProvider.recentRetweetTopicId) + + val isRecentRetweetEligible = + isRecentRetweetInAllowlist && params(TopicSocialProofParams.EnableRecentEngagementsTopic) + if (isRecentRetweetEligible) + scopedStats.counter("isRecentRetweetEligible").incr() + + val isYMLInAllowlist = allowListWithTopicFollowType + .contains(AnnotationRuleProvider.youMightLikeTopicId) + + val isYMLEligible = + isYMLInAllowlist && params(TopicSocialProofParams.EnableYouMightLikeTopic) + if (isYMLEligible) + scopedStats.counter("isYMLEligible").incr() + + PersonalizedContextEligibility(isRecentFavEligible, isRecentRetweetEligible, isYMLEligible) + } + + private[handlers] def filterPersonalizedContexts( + socialProofs: Map[TweetId, Seq[TopicWithScore]], + tweetInfoMap: Map[TweetId, Option[TspTweetInfo]], + params: Params + ): Map[TweetId, Seq[TopicWithScore]] = { + val filters: Seq[(Option[TspTweetInfo], Params) => Boolean] = Seq( + healthSignalsFilter, + tweetLanguageFilter + ) + applyFilters(socialProofs, tweetInfoMap, params, filters) + } + + /** + * Filters out tweets with missing tweetInfo or an undefined language. + */ + private def keepTweetsWithTweetInfoAndLanguage( + tweetInfoMapFut: Future[Map[TweetId, Option[TspTweetInfo]]], + displayLocation: String + ): Future[Map[TweetId, Option[TspTweetInfo]]] = { + val scopedStats = statsReceiver.scope(displayLocation) + tweetInfoMapFut.map { tweetInfoMap => + val filteredTweetInfoMap = tweetInfoMap.filter { + case (_, optTweetInfo: Option[TspTweetInfo]) => + if (optTweetInfo.isEmpty) { + scopedStats.counter("undefinedTweetInfoCount").incr() + } + + optTweetInfo.exists { tweetInfo: TspTweetInfo => + { + if (tweetInfo.language.isEmpty) { + scopedStats.counter("undefinedLanguageCount").incr() + } + tweetInfo.language.isDefined + } + } + + } + val undefinedTweetInfoOrLangCount = tweetInfoMap.size - filteredTweetInfoMap.size + scopedStats.counter("undefinedTweetInfoOrLangCount").incr(undefinedTweetInfoOrLangCount) + + scopedStats.counter("TweetInfoCount").incr(tweetInfoMap.size) + + filteredTweetInfoMap + } + } + + /** + * Filters tweets that have NO evergreen topic social proofs by their health signal scores and tweet languages, + * i.e., tweets that could be converted into Personalized Context topic tweets. + * TBD: whether we are going to apply these filters to all topic tweet candidates. + */ + private def applyFilters( + socialProofs: Map[TweetId, Seq[TopicWithScore]], + tweetInfoMap: Map[TweetId, Option[TspTweetInfo]], + params: Params, + filters: Seq[(Option[TspTweetInfo], Params) => Boolean] + ): Map[TweetId, Seq[TopicWithScore]] = { + socialProofs.collect { + case (tweetId, socialProofs) if socialProofs.nonEmpty || filters.forall { filter => + filter(tweetInfoMap.getOrElse(tweetId, None), params) + } => + tweetId -> socialProofs + } + } + + private def healthSignalsFilter( + tweetInfoOpt: Option[TspTweetInfo], + params: Params + ): Boolean = { + !params( + TopicSocialProofParams.EnableTopicTweetHealthFilterPersonalizedContexts) || HealthSignalsUtils + .isHealthyTweet(tweetInfoOpt) + } + + private def tweetLanguageFilter( + tweetInfoOpt: Option[TspTweetInfo], + params: Params + ): Boolean = { + PersonalizedContextTopicsAllowedLanguageSet + .contains(tweetInfoOpt.flatMap(_.language).getOrElse(LocaleUtil.DefaultLanguage)) + } + + private[handlers] def backfillPersonalizedContexts( + allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]], + socialProofs: Map[TweetId, 
Seq[TopicWithScore]], + metricTagsMap: scala.collection.Map[TweetId, scala.collection.Set[MetricTag]], + personalizedContextEligibility: PersonalizedContextEligibility + ): Map[TweetId, Seq[TopicWithScore]] = { + val scopedStats = statsReceiver.scope("backfillPersonalizedContexts") + socialProofs.map { + case (tweetId, topicWithScores) => + if (topicWithScores.nonEmpty) { + tweetId -> Seq.empty + } else { + val metricTagContainsTweetFav = metricTagsMap + .getOrElse(tweetId, Set.empty[MetricTag]).contains(MetricTag.TweetFavorite) + val backfillRecentFav = + personalizedContextEligibility.isRecentFavEligible && metricTagContainsTweetFav + if (metricTagContainsTweetFav) + scopedStats.counter("MetricTag.TweetFavorite").incr() + if (backfillRecentFav) + scopedStats.counter("backfillRecentFav").incr() + + val metricTagContainsRetweet = metricTagsMap + .getOrElse(tweetId, Set.empty[MetricTag]).contains(MetricTag.Retweet) + val backfillRecentRetweet = + personalizedContextEligibility.isRecentRetweetEligible && metricTagContainsRetweet + if (metricTagContainsRetweet) + scopedStats.counter("MetricTag.Retweet").incr() + if (backfillRecentRetweet) + scopedStats.counter("backfillRecentRetweet").incr() + + val metricTagContainsRecentSearches = metricTagsMap + .getOrElse(tweetId, Set.empty[MetricTag]).contains( + MetricTag.InterestsRankerRecentSearches) + + val backfillYML = personalizedContextEligibility.isYMLEligible + if (backfillYML) + scopedStats.counter("backfillYML").incr() + + tweetId -> buildBackfillTopics( + allowListWithTopicFollowType, + backfillRecentFav, + backfillRecentRetweet, + backfillYML) + } + } + } + + private def buildBackfillTopics( + allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]], + backfillRecentFav: Boolean, + backfillRecentRetweet: Boolean, + backfillYML: Boolean + ): Seq[TopicWithScore] = { + Seq( + if (backfillRecentFav) { + Some( + TopicWithScore( + topicId = AnnotationRuleProvider.recentFavTopicId, + score = 1.0, + topicFollowType = allowListWithTopicFollowType + .getOrElse(AnnotationRuleProvider.recentFavTopicId, None) + )) + } else { None }, + if (backfillRecentRetweet) { + Some( + TopicWithScore( + topicId = AnnotationRuleProvider.recentRetweetTopicId, + score = 1.0, + topicFollowType = allowListWithTopicFollowType + .getOrElse(AnnotationRuleProvider.recentRetweetTopicId, None) + )) + } else { None }, + if (backfillYML) { + Some( + TopicWithScore( + topicId = AnnotationRuleProvider.youMightLikeTopicId, + score = 1.0, + topicFollowType = allowListWithTopicFollowType + .getOrElse(AnnotationRuleProvider.youMightLikeTopicId, None) + )) + } else { None } + ).flatten + } + + def toReadableStore: ReadableStore[TopicSocialProofRequest, TopicSocialProofResponse] = { + new ReadableStore[TopicSocialProofRequest, TopicSocialProofResponse] { + override def get(k: TopicSocialProofRequest): Future[Option[TopicSocialProofResponse]] = { + val displayLocation = k.displayLocation.toString + loadShedder(displayLocation) { + getTopicSocialProofResponse(k).map(Some(_)) + }.rescue { + case LoadShedder.LoadSheddingException => + statsReceiver.scope(displayLocation).counter("LoadSheddingException").incr() + Future.None + case _ => + statsReceiver.scope(displayLocation).counter("Exception").incr() + Future.None + } + } + } + } +} + +object TopicSocialProofHandler { + + private val MaxCandidates = 10 + // Currently we hardcode the set of allowed languages for PersonalizedContexts Topics + private val PersonalizedContextTopicsAllowedLanguageSet: Set[String] = 
Set("pt", "ko", "es", "ja", "tr", "id", "en", "hi", "ar", "fr", "ru") + + private val Timeout: Duration = 200.milliseconds + private val TopicSocialProofStoreTimeout: Duration = 40.milliseconds + private val TweetInfoStoreTimeout: Duration = 60.milliseconds + private val DefaultResponse: TopicSocialProofResponse = TopicSocialProofResponse(Map.empty) + + case class PersonalizedContextEligibility( + isRecentFavEligible: Boolean, + isRecentRetweetEligible: Boolean, + isYMLEligible: Boolean) + + /** + * Calculate the Topic Scores for each (tweet, topic), filter out topic proofs whose scores do not + * pass the minimum threshold + */ + private[handlers] def buildTopicWithValidScore( + topicProof: TopicSocialProof, + tweetEmbeddingType: EmbeddingType, + maybeConsumerEmbeddingType: Option[EmbeddingType], + maybeProducerEmbeddingType: Option[EmbeddingType], + allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]], + simClustersModelVersion: ModelVersion, + minTweetToTopicCosineSimilarityThreshold: Double + ): Option[TopicWithScore] = { + + val consumerScore = maybeConsumerEmbeddingType + .flatMap { consumerEmbeddingType => + topicProof.scores.get( + ScoreKey(consumerEmbeddingType, tweetEmbeddingType, simClustersModelVersion)) + }.getOrElse(0.0) + + val producerScore = maybeProducerEmbeddingType + .flatMap { producerEmbeddingType => + topicProof.scores.get( + ScoreKey(producerEmbeddingType, tweetEmbeddingType, simClustersModelVersion)) + }.getOrElse(0.0) + + val combinedScore = consumerScore + producerScore + if (combinedScore > minTweetToTopicCosineSimilarityThreshold || topicProof.ignoreSimClusterFiltering) { + Some( + TopicWithScore( + topicId = topicProof.topicId.entityId, + score = combinedScore, + topicFollowType = + allowListWithTopicFollowType.getOrElse(topicProof.topicId.entityId, None))) + } else { + None + } + } + + private[handlers] def buildTopicWithRandomScore( + topicSocialProof: TopicSocialProof, + allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]], + random: Random + ): Option[TopicWithScore] = { + + Some( + TopicWithScore( + topicId = topicSocialProof.topicId.entityId, + score = random.nextDouble(), + topicFollowType = + allowListWithTopicFollowType.getOrElse(topicSocialProof.topicId.entityId, None) + )) + } + + /** + * Filter all the non-qualified Topic Social Proof + */ + private[handlers] def filterByAllowedList( + topicProofs: Map[TweetId, Seq[TopicSocialProof]], + setting: TopicListingSetting, + allowList: Set[SemanticCoreEntityId] + ): Map[TweetId, Seq[TopicSocialProof]] = { + setting match { + case TopicListingSetting.All => + // Return all the topics + topicProofs + case _ => + topicProofs.mapValues( + _.filter(topicProof => allowList.contains(topicProof.topicId.entityId))) + } + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/UttChildrenWarmupHandler.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/UttChildrenWarmupHandler.scala new file mode 100644 index 000000000..b431685c8 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers/UttChildrenWarmupHandler.scala @@ -0,0 +1,40 @@ +package com.twitter.tsp.handlers + +import com.twitter.inject.utils.Handler +import com.twitter.topiclisting.FollowableTopicProductId +import com.twitter.topiclisting.ProductId +import com.twitter.topiclisting.TopicListingViewerContext +import com.twitter.topiclisting.utt.UttLocalization +import com.twitter.util.logging.Logging +import 
javax.inject.Inject +import javax.inject.Singleton + +/** + * We configure the Warmer to help warm up the cache hit rate under `CachedUttClient/get_utt_taxonomy/cache_hit_rate`. + * In uttLocalization.getRecommendableTopics, we fetch all topics that exist in UTT; this in fact fetches + * the complete UTT tree structure (by calling getUttChildren recursively), which can take about 1 second. + * Once we have the topics, we store them in an in-memory cache, and the cache hit rate is > 99%. + */ +@Singleton +class UttChildrenWarmupHandler @Inject() (uttLocalization: UttLocalization) + extends Handler + with Logging { + + /** Executes the function of this handler. */ + override def handle(): Unit = { + uttLocalization + .getRecommendableTopics( + productId = ProductId.Followable, + viewerContext = TopicListingViewerContext(languageCode = Some("en")), + enableInternationalTopics = true, + followableTopicProductId = FollowableTopicProductId.AllFollowable + ) + .onSuccess { result => + logger.info(s"successfully warmed up UttChildren. TopicId length = ${result.size}") + } + .onFailure { throwable => + logger.info(s"failed to warm up UttChildren. Throwable = ${throwable}") + } + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/BUILD b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/BUILD new file mode 100644 index 000000000..d68c9ad23 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/BUILD @@ -0,0 +1,30 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + ], + dependencies = [ + "3rdparty/jvm/com/twitter/bijection:scrooge", + "3rdparty/jvm/com/twitter/storehaus:memcache", + "escherbird/src/scala/com/twitter/escherbird/util/uttclient", + "escherbird/src/thrift/com/twitter/escherbird/utt:strato-columns-scala", + "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/authentication", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-thrift-client", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common", + "src/scala/com/twitter/storehaus_internal/memcache", + "src/scala/com/twitter/storehaus_internal/util", + "src/thrift/com/twitter/gizmoduck:thrift-scala", + "src/thrift/com/twitter/gizmoduck:user-thrift-scala", + "stitch/stitch-storehaus", + "stitch/stitch-tweetypie/src/main/scala", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/common", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/stores", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/utils", + "topic-social-proof/server/src/main/thrift:thrift-scala", + "topiclisting/common/src/main/scala/com/twitter/topiclisting/clients", + "topiclisting/topiclisting-utt/src/main/scala/com/twitter/topiclisting/utt", + ], +) diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/GizmoduckUserModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/GizmoduckUserModule.scala new file mode 100644 index 000000000..a700d9fef --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/GizmoduckUserModule.scala @@ -0,0 +1,35 @@ +package com.twitter.tsp.modules + +import com.google.inject.Module +import com.twitter.finagle.ThriftMux +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.mtls.client.MtlsStackClient._ 
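+// MtlsStackClient._ brings the withMutualTls syntax into scope for configureThriftMuxClient below.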
+import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finagle.thrift.ClientId +import com.twitter.finatra.mtls.thriftmux.modules.MtlsClient +import com.twitter.gizmoduck.thriftscala.UserService +import com.twitter.inject.Injector +import com.twitter.inject.thrift.modules.ThriftMethodBuilderClientModule + +object GizmoduckUserModule + extends ThriftMethodBuilderClientModule[ + UserService.ServicePerEndpoint, + UserService.MethodPerEndpoint + ] + with MtlsClient { + + override val label: String = "gizmoduck" + override val dest: String = "/s/gizmoduck/gizmoduck" + override val modules: Seq[Module] = Seq(TSPClientIdModule) + + override def configureThriftMuxClient( + injector: Injector, + client: ThriftMux.Client + ): ThriftMux.Client = { + super + .configureThriftMuxClient(injector, client) + .withMutualTls(injector.instance[ServiceIdentifier]) + .withClientId(injector.instance[ClientId]) + .withStatsReceiver(injector.instance[StatsReceiver].scope("giz")) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/RepresentationScorerStoreModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/RepresentationScorerStoreModule.scala new file mode 100644 index 000000000..329276d8d --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/RepresentationScorerStoreModule.scala @@ -0,0 +1,47 @@ +package com.twitter.tsp.modules + +import com.google.inject.Module +import com.google.inject.Provides +import com.google.inject.Singleton +import com.twitter.app.Flag +import com.twitter.bijection.scrooge.BinaryScalaCodec +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.memcached.{Client => MemClient} +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.hermit.store.common.ObservedMemcachedReadableStore +import com.twitter.inject.TwitterModule +import com.twitter.simclusters_v2.thriftscala.Score +import com.twitter.simclusters_v2.thriftscala.ScoreId +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.tsp.stores.RepresentationScorerStore + +object RepresentationScorerStoreModule extends TwitterModule { + override def modules: Seq[Module] = Seq(UnifiedCacheClient) + + private val tspRepresentationScoringColumnPath: Flag[String] = flag[String]( + name = "tsp.representationScoringColumnPath", + default = "recommendations/representation_scorer/score", + help = "Strato column path for Representation Scorer Store" + ) + + @Provides + @Singleton + def providesRepresentationScorerStore( + statsReceiver: StatsReceiver, + stratoClient: StratoClient, + tspUnifiedCacheClient: MemClient + ): ReadableStore[ScoreId, Score] = { + val underlyingStore = + RepresentationScorerStore(stratoClient, tspRepresentationScoringColumnPath(), statsReceiver) + ObservedMemcachedReadableStore.fromCacheClient( + backingStore = underlyingStore, + cacheClient = tspUnifiedCacheClient, + ttl = 2.hours + )( + valueInjection = BinaryScalaCodec(Score), + statsReceiver = statsReceiver.scope("RepresentationScorerStore"), + keyToString = { k: ScoreId => s"rsx/$k" } + ) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TSPClientIdModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TSPClientIdModule.scala new file mode 100644 index 000000000..d22ef500f --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TSPClientIdModule.scala @@ -0,0 +1,14 @@ +package 
com.twitter.tsp.modules + +import com.google.inject.Provides +import com.twitter.finagle.thrift.ClientId +import com.twitter.inject.TwitterModule +import javax.inject.Singleton + +object TSPClientIdModule extends TwitterModule { + private val clientIdFlag = flag("thrift.clientId", "topic-social-proof.prod", "Thrift client id") + + @Provides + @Singleton + def providesClientId: ClientId = ClientId(clientIdFlag()) +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicListingModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicListingModule.scala new file mode 100644 index 000000000..3f2768278 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicListingModule.scala @@ -0,0 +1,17 @@ +package com.twitter.tsp.modules + +import com.google.inject.Provides +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.topiclisting.TopicListing +import com.twitter.topiclisting.TopicListingBuilder +import javax.inject.Singleton + +object TopicListingModule extends TwitterModule { + + @Provides + @Singleton + def providesTopicListing(statsReceiver: StatsReceiver): TopicListing = { + new TopicListingBuilder(statsReceiver.scope(namespace = "TopicListingBuilder")).build + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicSocialProofStoreModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicSocialProofStoreModule.scala new file mode 100644 index 000000000..fe63b0e21 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicSocialProofStoreModule.scala @@ -0,0 +1,68 @@ +package com.twitter.tsp.modules + +import com.google.inject.Module +import com.google.inject.Provides +import com.google.inject.Singleton +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.memcached.{Client => MemClient} +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.hermit.store.common.ObservedCachedReadableStore +import com.twitter.hermit.store.common.ObservedMemcachedReadableStore +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.inject.TwitterModule +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.thriftscala.Score +import com.twitter.simclusters_v2.thriftscala.ScoreId +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.tsp.stores.SemanticCoreAnnotationStore +import com.twitter.tsp.stores.TopicSocialProofStore +import com.twitter.tsp.stores.TopicSocialProofStore.TopicSocialProof +import com.twitter.tsp.utils.LZ4Injection +import com.twitter.tsp.utils.SeqObjectInjection + +object TopicSocialProofStoreModule extends TwitterModule { + override def modules: Seq[Module] = Seq(UnifiedCacheClient) + + @Provides + @Singleton + def providesTopicSocialProofStore( + representationScorerStore: ReadableStore[ScoreId, Score], + statsReceiver: StatsReceiver, + stratoClient: StratoClient, + tspUnifiedCacheClient: MemClient, + ): ReadableStore[TopicSocialProofStore.Query, Seq[TopicSocialProof]] = { + val semanticCoreAnnotationStore: ReadableStore[TweetId, Seq[ + SemanticCoreAnnotationStore.TopicAnnotation + ]] = ObservedReadableStore( + SemanticCoreAnnotationStore(SemanticCoreAnnotationStore.getStratoStore(stratoClient)) + )(statsReceiver.scope("SemanticCoreAnnotationStore")) + + val underlyingStore = TopicSocialProofStore( + 
representationScorerStore, + semanticCoreAnnotationStore + )(statsReceiver.scope("TopicSocialProofStore")) + + val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient( + backingStore = underlyingStore, + cacheClient = tspUnifiedCacheClient, + ttl = 15.minutes, + asyncUpdate = true + )( + valueInjection = LZ4Injection.compose(SeqObjectInjection[TopicSocialProof]()), + statsReceiver = statsReceiver.scope("memCachedTopicSocialProofStore"), + keyToString = { k: TopicSocialProofStore.Query => s"tsps/${k.cacheableQuery}" } + ) + + val inMemoryCachedStore = + ObservedCachedReadableStore.from[TopicSocialProofStore.Query, Seq[TopicSocialProof]]( + memcachedStore, + ttl = 10.minutes, + maxKeys = 16777215, // ~ avg 160B, < 3000MB + cacheName = "topic_social_proof_cache", + windowSize = 10000L + )(statsReceiver.scope("InMemoryCachedTopicSocialProofStore")) + + inMemoryCachedStore + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicTweetCosineSimilarityAggregateStoreModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicTweetCosineSimilarityAggregateStoreModule.scala new file mode 100644 index 000000000..ac15b3746 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TopicTweetCosineSimilarityAggregateStoreModule.scala @@ -0,0 +1,26 @@ +package com.twitter.tsp.modules + +import com.google.inject.Provides +import com.google.inject.Singleton +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.thriftscala.Score +import com.twitter.simclusters_v2.thriftscala.ScoreId +import com.twitter.simclusters_v2.thriftscala.TopicId +import com.twitter.storehaus.ReadableStore +import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore +import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore.ScoreKey + +object TopicTweetCosineSimilarityAggregateStoreModule extends TwitterModule { + + @Provides + @Singleton + def providesTopicTweetCosineSimilarityAggregateStore( + representationScorerStore: ReadableStore[ScoreId, Score], + statsReceiver: StatsReceiver, + ): ReadableStore[(TopicId, TweetId, Seq[ScoreKey]), Map[ScoreKey, Double]] = { + TopicTweetsCosineSimilarityAggregateStore(representationScorerStore)( + statsReceiver.scope("topicTweetsCosineSimilarityAggregateStore")) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TweetInfoStoreModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TweetInfoStoreModule.scala new file mode 100644 index 000000000..1e08a9209 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TweetInfoStoreModule.scala @@ -0,0 +1,130 @@ +package com.twitter.tsp.modules + +import com.google.inject.Module +import com.google.inject.Provides +import com.google.inject.Singleton +import com.twitter.bijection.scrooge.BinaryScalaCodec +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.memcached.{Client => MemClient} +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.store.health.TweetHealthModelStore +import com.twitter.frigate.common.store.health.TweetHealthModelStore.TweetHealthModelStoreConfig +import com.twitter.frigate.common.store.health.UserHealthModelStore +import com.twitter.frigate.common.store.interests.UserId +import 
com.twitter.frigate.thriftscala.TweetHealthScores +import com.twitter.frigate.thriftscala.UserAgathaScores +import com.twitter.hermit.store.common.DeciderableReadableStore +import com.twitter.hermit.store.common.ObservedCachedReadableStore +import com.twitter.hermit.store.common.ObservedMemcachedReadableStore +import com.twitter.inject.TwitterModule +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.stitch.tweetypie.TweetyPie +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.tsp.common.DeciderKey +import com.twitter.tsp.common.TopicSocialProofDecider +import com.twitter.tsp.stores.TweetInfoStore +import com.twitter.tsp.stores.TweetyPieFieldsStore +import com.twitter.tweetypie.thriftscala.TweetService +import com.twitter.tsp.thriftscala.TspTweetInfo +import com.twitter.util.JavaTimer +import com.twitter.util.Timer + +object TweetInfoStoreModule extends TwitterModule { + override def modules: Seq[Module] = Seq(UnifiedCacheClient) + implicit val timer: Timer = new JavaTimer(true) + + @Provides + @Singleton + def providesTweetInfoStore( + decider: TopicSocialProofDecider, + serviceIdentifier: ServiceIdentifier, + statsReceiver: StatsReceiver, + stratoClient: StratoClient, + tspUnifiedCacheClient: MemClient, + tweetyPieService: TweetService.MethodPerEndpoint + ): ReadableStore[TweetId, TspTweetInfo] = { + val tweetHealthModelStore: ReadableStore[TweetId, TweetHealthScores] = { + val underlyingStore = TweetHealthModelStore.buildReadableStore( + stratoClient, + Some( + TweetHealthModelStoreConfig( + enablePBlock = true, + enableToxicity = true, + enablePSpammy = true, + enablePReported = true, + enableSpammyTweetContent = true, + enablePNegMultimodal = false)) + )(statsReceiver.scope("UnderlyingTweetHealthModelStore")) + + DeciderableReadableStore( + ObservedMemcachedReadableStore.fromCacheClient( + backingStore = underlyingStore, + cacheClient = tspUnifiedCacheClient, + ttl = 2.hours + )( + valueInjection = BinaryScalaCodec(TweetHealthScores), + statsReceiver = statsReceiver.scope("TweetHealthModelStore"), + keyToString = { k: TweetId => s"tHMS/$k" } + ), + decider.deciderGateBuilder.idGate(DeciderKey.enableHealthSignalsScoreDeciderKey), + statsReceiver.scope("TweetHealthModelStore") + ) + } + + val userHealthModelStore: ReadableStore[UserId, UserAgathaScores] = { + val underlyingStore = + UserHealthModelStore.buildReadableStore(stratoClient)( + statsReceiver.scope("UnderlyingUserHealthModelStore")) + + DeciderableReadableStore( + ObservedMemcachedReadableStore.fromCacheClient( + backingStore = underlyingStore, + cacheClient = tspUnifiedCacheClient, + ttl = 18.hours + )( + valueInjection = BinaryScalaCodec(UserAgathaScores), + statsReceiver = statsReceiver.scope("UserHealthModelStore"), + keyToString = { k: UserId => s"uHMS/$k" } + ), + decider.deciderGateBuilder.idGate(DeciderKey.enableUserAgathaScoreDeciderKey), + statsReceiver.scope("UserHealthModelStore") + ) + } + + val tweetInfoStore: ReadableStore[TweetId, TspTweetInfo] = { + val underlyingStore = TweetInfoStore( + TweetyPieFieldsStore.getStoreFromTweetyPie(TweetyPie(tweetyPieService, statsReceiver)), + tweetHealthModelStore: ReadableStore[TweetId, TweetHealthScores], + userHealthModelStore: ReadableStore[UserId, UserAgathaScores], + timer: Timer + )(statsReceiver.scope("tweetInfoStore")) + + val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient( + backingStore = underlyingStore, + cacheClient = tspUnifiedCacheClient, + ttl = 
15.minutes, + // Hydrating tweetInfo is now a required step for all candidates, + // hence we needed to tune these thresholds. + asyncUpdate = serviceIdentifier.environment == "prod" + )( + valueInjection = BinaryScalaCodec(TspTweetInfo), + statsReceiver = statsReceiver.scope("memCachedTweetInfoStore"), + keyToString = { k: TweetId => s"tIS/$k" } + ) + + val inMemoryStore = ObservedCachedReadableStore.from( + memcachedStore, + ttl = 15.minutes, + maxKeys = 8388607, // Check TweetInfo definition. size~92b. Around 736 MB + windowSize = 10000L, + cacheName = "tweet_info_cache", + maxMultiGetSize = 20 + )(statsReceiver.scope("inMemoryCachedTweetInfoStore")) + + inMemoryStore + } + tweetInfoStore + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TweetyPieClientModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TweetyPieClientModule.scala new file mode 100644 index 000000000..98d515dda --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/TweetyPieClientModule.scala @@ -0,0 +1,63 @@ +package com.twitter.tsp +package modules + +import com.google.inject.Module +import com.google.inject.Provides +import com.twitter.conversions.DurationOps.richDurationFromInt +import com.twitter.finagle.ThriftMux +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.mtls.client.MtlsStackClient.MtlsThriftMuxClientSyntax +import com.twitter.finagle.mux.ClientDiscardedRequestException +import com.twitter.finagle.service.ReqRep +import com.twitter.finagle.service.ResponseClass +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finagle.thrift.ClientId +import com.twitter.inject.Injector +import com.twitter.inject.thrift.modules.ThriftMethodBuilderClientModule +import com.twitter.tweetypie.thriftscala.TweetService +import com.twitter.util.Duration +import com.twitter.util.Throw +import com.twitter.stitch.tweetypie.{TweetyPie => STweetyPie} +import com.twitter.finatra.mtls.thriftmux.modules.MtlsClient +import javax.inject.Singleton + +object TweetyPieClientModule + extends ThriftMethodBuilderClientModule[ + TweetService.ServicePerEndpoint, + TweetService.MethodPerEndpoint + ] + with MtlsClient { + override val label = "tweetypie" + override val dest = "/s/tweetypie/tweetypie" + override val requestTimeout: Duration = 450.milliseconds + + override val modules: Seq[Module] = Seq(TSPClientIdModule) + + // We bump the success rate from the default of 0.8 to 0.9 since we're dropping the + // consecutive failures part of the default policy. 
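+ // A brief note on the semantics (standard Finagle failure-accrual behavior, stated here as an + // editorial assumption rather than from this module's docs): successRateFailureAccrual marks a + // TweetyPie session unhealthy once its success rate over the trailing 30-second window drops + // below 0.9, and responses classified as Ignorable (e.g. the ClientDiscardedRequestException + // case below) are excluded from that accounting.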
+ override def configureThriftMuxClient( + injector: Injector, + client: ThriftMux.Client + ): ThriftMux.Client = + super + .configureThriftMuxClient(injector, client) + .withMutualTls(injector.instance[ServiceIdentifier]) + .withStatsReceiver(injector.instance[StatsReceiver].scope("clnt")) + .withClientId(injector.instance[ClientId]) + .withResponseClassifier { + case ReqRep(_, Throw(_: ClientDiscardedRequestException)) => ResponseClass.Ignorable + } + .withSessionQualifier + .successRateFailureAccrual(successRate = 0.9, window = 30.seconds) + + @Provides + @Singleton + def providesTweetyPie( + tweetyPieService: TweetService.MethodPerEndpoint + ): STweetyPie = { + STweetyPie(tweetyPieService) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UnifiedCacheClient.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UnifiedCacheClient.scala new file mode 100644 index 000000000..8fe65fc73 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UnifiedCacheClient.scala @@ -0,0 +1,33 @@ +package com.twitter.tsp.modules + +import com.google.inject.Provides +import com.google.inject.Singleton +import com.twitter.app.Flag +import com.twitter.finagle.memcached.Client +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.storehaus_internal.memcache.MemcacheStore +import com.twitter.storehaus_internal.util.ClientName +import com.twitter.storehaus_internal.util.ZkEndPoint + +object UnifiedCacheClient extends TwitterModule { + val tspUnifiedCacheDest: Flag[String] = flag[String]( + name = "tsp.unifiedCacheDest", + default = "/srv#/prod/local/cache/topic_social_proof_unified", + help = "Wily path to topic social proof unified cache" + ) + + @Provides + @Singleton + def provideUnifiedCacheClient( + serviceIdentifier: ServiceIdentifier, + statsReceiver: StatsReceiver, + ): Client = + MemcacheStore.memcachedClient( + name = ClientName("topic-social-proof-unified-memcache"), + dest = ZkEndPoint(tspUnifiedCacheDest()), + statsReceiver = statsReceiver.scope("cache_client"), + serviceIdentifier = serviceIdentifier + ) +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UttClientModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UttClientModule.scala new file mode 100644 index 000000000..ae0099b8b --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UttClientModule.scala @@ -0,0 +1,41 @@ +package com.twitter.tsp.modules + +import com.google.inject.Provides +import com.twitter.escherbird.util.uttclient.CacheConfigV2 +import com.twitter.escherbird.util.uttclient.CachedUttClientV2 +import com.twitter.escherbird.util.uttclient.UttClientCacheConfigsV2 +import com.twitter.escherbird.utt.strato.thriftscala.Environment +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.strato.client.Client +import com.twitter.topiclisting.clients.utt.UttClient +import javax.inject.Singleton + +object UttClientModule extends TwitterModule { + + @Provides + @Singleton + def providesUttClient( + stratoClient: Client, + statsReceiver: StatsReceiver + ): UttClient = { + + // Cache up to 2^18 - 1 (262143) UTT entities, which should give a near-100% cache hit rate + lazy val defaultCacheConfigV2: CacheConfigV2 = CacheConfigV2(262143) + lazy val uttClientCacheConfigsV2: UttClientCacheConfigsV2 = UttClientCacheConfigsV2( + getTaxonomyConfig = defaultCacheConfigV2, + getUttTaxonomyConfig = defaultCacheConfigV2, + getLeafIds = defaultCacheConfigV2, + getLeafUttEntities = defaultCacheConfigV2 + ) + + // CachedUttClient backed by the StratoClient + lazy val cachedUttClientV2: CachedUttClientV2 = new CachedUttClientV2( + stratoClient = stratoClient, + env = Environment.Prod, + cacheConfigs = uttClientCacheConfigsV2, + statsReceiver = statsReceiver.scope("CachedUttClient") + ) + new UttClient(cachedUttClientV2, statsReceiver) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UttLocalizationModule.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UttLocalizationModule.scala new file mode 100644 index 000000000..7d8844b98 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/modules/UttLocalizationModule.scala @@ -0,0 +1,27 @@ +package com.twitter.tsp.modules + +import com.google.inject.Provides +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.topiclisting.TopicListing +import com.twitter.topiclisting.clients.utt.UttClient +import com.twitter.topiclisting.utt.UttLocalization +import com.twitter.topiclisting.utt.UttLocalizationImpl +import javax.inject.Singleton + +object UttLocalizationModule extends TwitterModule { + + @Provides + @Singleton + def providesUttLocalization( + topicListing: TopicListing, + uttClient: UttClient, + statsReceiver: StatsReceiver + ): UttLocalization = { + new UttLocalizationImpl( + topicListing, + uttClient, + statsReceiver + ) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/service/BUILD b/topic-social-proof/server/src/main/scala/com/twitter/tsp/service/BUILD new file mode 100644 index 000000000..372962922 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/service/BUILD @@ -0,0 +1,23 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + ], + dependencies = [ + "3rdparty/jvm/javax/inject:javax.inject", + "abdecider/src/main/scala", + "content-recommender/thrift/src/main/thrift:thrift-scala", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/gizmoduck", + "src/scala/com/twitter/topic_recos/stores", + "src/thrift/com/twitter/gizmoduck:thrift-scala", + "src/thrift/com/twitter/gizmoduck:user-thrift-scala", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "stitch/stitch-storehaus", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/common", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/modules", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/stores", + "topic-social-proof/server/src/main/thrift:thrift-scala", + ], +) diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/service/TopicSocialProofService.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/service/TopicSocialProofService.scala new file mode 100644 index 000000000..f123e819f --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/service/TopicSocialProofService.scala @@ -0,0 +1,182 @@ +package com.twitter.tsp.service + +import 
com.twitter.abdecider.ABDeciderFactory +import com.twitter.abdecider.LoggingABDecider +import com.twitter.tsp.thriftscala.TspTweetInfo +import com.twitter.discovery.common.configapi.FeatureContextBuilder +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.gizmoduck.thriftscala.LookupContext +import com.twitter.gizmoduck.thriftscala.QueryFields +import com.twitter.gizmoduck.thriftscala.User +import com.twitter.gizmoduck.thriftscala.UserService +import com.twitter.hermit.store.gizmoduck.GizmoduckUserStore +import com.twitter.logging.Logger +import com.twitter.simclusters_v2.common.SemanticCoreEntityId +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.common.UserId +import com.twitter.spam.rtf.thriftscala.SafetyLevel +import com.twitter.stitch.storehaus.StitchOfReadableStore +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.{Client => StratoClient} +import com.twitter.timelines.configapi +import com.twitter.timelines.configapi.CompositeConfig +import com.twitter.tsp.common.FeatureSwitchConfig +import com.twitter.tsp.common.FeatureSwitchesBuilder +import com.twitter.tsp.common.LoadShedder +import com.twitter.tsp.common.ParamsBuilder +import com.twitter.tsp.common.RecTargetFactory +import com.twitter.tsp.common.TopicSocialProofDecider +import com.twitter.tsp.handlers.TopicSocialProofHandler +import com.twitter.tsp.stores.LocalizedUttRecommendableTopicsStore +import com.twitter.tsp.stores.LocalizedUttTopicNameRequest +import com.twitter.tsp.stores.TopicResponses +import com.twitter.tsp.stores.TopicSocialProofStore +import com.twitter.tsp.stores.TopicSocialProofStore.TopicSocialProof +import com.twitter.tsp.stores.TopicStore +import com.twitter.tsp.stores.UttTopicFilterStore +import com.twitter.tsp.thriftscala.TopicSocialProofRequest +import com.twitter.tsp.thriftscala.TopicSocialProofResponse +import com.twitter.util.JavaTimer +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton +import com.twitter.topiclisting.TopicListing +import com.twitter.topiclisting.utt.UttLocalization + +@Singleton +class TopicSocialProofService @Inject() ( + topicSocialProofStore: ReadableStore[TopicSocialProofStore.Query, Seq[TopicSocialProof]], + tweetInfoStore: ReadableStore[TweetId, TspTweetInfo], + serviceIdentifier: ServiceIdentifier, + stratoClient: StratoClient, + gizmoduck: UserService.MethodPerEndpoint, + topicListing: TopicListing, + uttLocalization: UttLocalization, + decider: TopicSocialProofDecider, + loadShedder: LoadShedder, + stats: StatsReceiver) { + + import TopicSocialProofService._ + + private val statsReceiver = stats.scope("topic-social-proof-management") + + private val isProd: Boolean = serviceIdentifier.environment == "prod" + + private val optOutStratoStorePath: String = + if (isProd) "interests/optOutInterests" else "interests/staging/optOutInterests" + + private val notInterestedInStorePath: String = + if (isProd) "interests/notInterestedTopicsGetter" + else "interests/staging/notInterestedTopicsGetter" + + private val userOptOutTopicsStore: ReadableStore[UserId, TopicResponses] = + TopicStore.userOptOutTopicStore(stratoClient, optOutStratoStorePath)( + statsReceiver.scope("ints_interests_opt_out_store")) + private val explicitFollowingTopicsStore: ReadableStore[UserId, TopicResponses] = + TopicStore.explicitFollowingTopicStore(stratoClient)( + statsReceiver.scope("ints_explicit_following_interests_store")) + 
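+ // The per-user topic stores above and below (opt-outs, explicit follows, not-interested-in + // signals) all feed UttTopicFilterStore further down, which combines them to decide which + // topics may be shown to this user.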
private val userNotInterestedInTopicsStore: ReadableStore[UserId, TopicResponses] = + TopicStore.notInterestedInTopicsStore(stratoClient, notInterestedInStorePath)( + statsReceiver.scope("ints_not_interested_in_store")) + + private lazy val localizedUttRecommendableTopicsStore: ReadableStore[ + LocalizedUttTopicNameRequest, + Set[ + SemanticCoreEntityId + ] + ] = new LocalizedUttRecommendableTopicsStore(uttLocalization) + + implicit val timer: Timer = new JavaTimer(true) + + private lazy val uttTopicFilterStore = new UttTopicFilterStore( + topicListing = topicListing, + userOptOutTopicsStore = userOptOutTopicsStore, + explicitFollowingTopicsStore = explicitFollowingTopicsStore, + notInterestedTopicsStore = userNotInterestedInTopicsStore, + localizedUttRecommendableTopicsStore = localizedUttRecommendableTopicsStore, + timer = timer, + stats = statsReceiver.scope("UttTopicFilterStore") + ) + + private lazy val scribeLogger: Option[Logger] = Some(Logger.get("client_event")) + + private lazy val abDecider: LoggingABDecider = + ABDeciderFactory( + abDeciderYmlPath = configRepoDirectory + "/abdecider/abdecider.yml", + scribeLogger = scribeLogger, + decider = None, + environment = Some("production"), + ).buildWithLogging() + + private val builder: FeatureSwitchesBuilder = FeatureSwitchesBuilder( + statsReceiver = statsReceiver.scope("featureswitches-v2"), + abDecider = abDecider, + featuresDirectory = "features/topic-social-proof/main", + configRepoDirectory = configRepoDirectory, + addServiceDetailsFromAurora = !serviceIdentifier.isLocal, + fastRefresh = !isProd + ) + + private lazy val overridesConfig: configapi.Config = { + new CompositeConfig( + Seq( + FeatureSwitchConfig.config + ) + ) + } + + private val featureContextBuilder: FeatureContextBuilder = FeatureContextBuilder(builder.build()) + + private val paramsBuilder: ParamsBuilder = ParamsBuilder( + featureContextBuilder, + abDecider, + overridesConfig, + statsReceiver.scope("params") + ) + + private val userStore: ReadableStore[UserId, User] = { + val queryFields: Set[QueryFields] = Set( + QueryFields.Profile, + QueryFields.Account, + QueryFields.Roles, + QueryFields.Discoverability, + QueryFields.Safety, + QueryFields.Takedowns + ) + val context: LookupContext = LookupContext(safetyLevel = Some(SafetyLevel.Recommendations)) + + GizmoduckUserStore( + client = gizmoduck, + queryFields = queryFields, + context = context, + statsReceiver = statsReceiver.scope("gizmoduck") + ) + } + + private val recTargetFactory: RecTargetFactory = RecTargetFactory( + abDecider, + userStore, + paramsBuilder, + statsReceiver + ) + + private val topicSocialProofHandler = + new TopicSocialProofHandler( + topicSocialProofStore, + tweetInfoStore, + uttTopicFilterStore, + recTargetFactory, + decider, + statsReceiver.scope("TopicSocialProofHandler"), + loadShedder, + timer) + + val topicSocialProofHandlerStoreStitch: TopicSocialProofRequest => com.twitter.stitch.Stitch[ + TopicSocialProofResponse + ] = StitchOfReadableStore(topicSocialProofHandler.toReadableStore) +} + +object TopicSocialProofService { + private val configRepoDirectory = "/usr/local/config" +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/BUILD b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/BUILD new file mode 100644 index 000000000..a933b3782 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/BUILD @@ -0,0 +1,32 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + ], + 
dependencies = [ + "3rdparty/jvm/com/twitter/storehaus:core", + "content-recommender/thrift/src/main/thrift:thrift-scala", + "escherbird/src/thrift/com/twitter/escherbird/topicannotation:topicannotation-thrift-scala", + "frigate/frigate-common:util", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/health", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/interests", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common", + "mediaservices/commons/src/main/thrift:thrift-scala", + "src/scala/com/twitter/simclusters_v2/common", + "src/scala/com/twitter/simclusters_v2/score", + "src/scala/com/twitter/topic_recos/common", + "src/scala/com/twitter/topic_recos/stores", + "src/thrift/com/twitter/frigate:frigate-common-thrift-scala", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "src/thrift/com/twitter/spam/rtf:safety-level-scala", + "src/thrift/com/twitter/tweetypie:service-scala", + "src/thrift/com/twitter/tweetypie:tweet-scala", + "stitch/stitch-storehaus", + "stitch/stitch-tweetypie/src/main/scala", + "strato/src/main/scala/com/twitter/strato/client", + "topic-social-proof/server/src/main/scala/com/twitter/tsp/utils", + "topic-social-proof/server/src/main/thrift:thrift-scala", + "topiclisting/topiclisting-core/src/main/scala/com/twitter/topiclisting", + ], +) diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/LocalizedUttRecommendableTopicsStore.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/LocalizedUttRecommendableTopicsStore.scala new file mode 100644 index 000000000..bcac9d5f6 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/LocalizedUttRecommendableTopicsStore.scala @@ -0,0 +1,30 @@ +package com.twitter.tsp.stores + +import com.twitter.storehaus.ReadableStore +import com.twitter.topiclisting.FollowableTopicProductId +import com.twitter.topiclisting.ProductId +import com.twitter.topiclisting.SemanticCoreEntityId +import com.twitter.topiclisting.TopicListingViewerContext +import com.twitter.topiclisting.utt.UttLocalization +import com.twitter.util.Future + +case class LocalizedUttTopicNameRequest( + productId: ProductId.Value, + viewerContext: TopicListingViewerContext, + enableInternationalTopics: Boolean) + +class LocalizedUttRecommendableTopicsStore(uttLocalization: UttLocalization) + extends ReadableStore[LocalizedUttTopicNameRequest, Set[SemanticCoreEntityId]] { + + override def get( + request: LocalizedUttTopicNameRequest + ): Future[Option[Set[SemanticCoreEntityId]]] = { + uttLocalization + .getRecommendableTopics( + productId = request.productId, + viewerContext = request.viewerContext, + enableInternationalTopics = request.enableInternationalTopics, + followableTopicProductId = FollowableTopicProductId.AllFollowable + ).map { response => Some(response) } + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/RepresentationScorerStore.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/RepresentationScorerStore.scala new file mode 100644 index 000000000..7d5095ca6 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/RepresentationScorerStore.scala @@ -0,0 +1,31 @@ +package com.twitter.tsp.stores + +import com.twitter.contentrecommender.thriftscala.ScoringResponse +import com.twitter.finagle.stats.StatsReceiver +import 
com.twitter.frigate.common.store.strato.StratoFetchableStore +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.simclusters_v2.thriftscala.Score +import com.twitter.simclusters_v2.thriftscala.ScoreId +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.Client +import com.twitter.strato.thrift.ScroogeConvImplicits._ +import com.twitter.tsp.utils.ReadableStoreWithMapOptionValues + +object RepresentationScorerStore { + + def apply( + stratoClient: Client, + scoringColumnPath: String, + stats: StatsReceiver + ): ReadableStore[ScoreId, Score] = { + val stratoFetchableStore = StratoFetchableStore + .withUnitView[ScoreId, ScoringResponse](stratoClient, scoringColumnPath) + + val enrichedStore = new ReadableStoreWithMapOptionValues[ScoreId, ScoringResponse, Score]( + stratoFetchableStore).mapOptionValues(_.score) + + ObservedReadableStore( + enrichedStore + )(stats.scope("representation_scorer_store")) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/SemanticCoreAnnotationStore.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/SemanticCoreAnnotationStore.scala new file mode 100644 index 000000000..cfeb7722b --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/SemanticCoreAnnotationStore.scala @@ -0,0 +1,64 @@ +package com.twitter.tsp.stores + +import com.twitter.escherbird.topicannotation.strato.thriftscala.TopicAnnotationValue +import com.twitter.escherbird.topicannotation.strato.thriftscala.TopicAnnotationView +import com.twitter.frigate.common.store.strato.StratoFetchableStore +import com.twitter.simclusters_v2.common.TopicId +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.Client +import com.twitter.strato.thrift.ScroogeConvImplicits._ +import com.twitter.util.Future + +/** + * This is copied from `src/scala/com/twitter/topic_recos/stores/SemanticCoreAnnotationStore.scala` + * Unfortunately their version assumes (incorrectly) that there is no View which causes warnings. + * While these warnings may not cause any problems in practice, better safe than sorry. + */ +object SemanticCoreAnnotationStore { + private val column = "semanticCore/topicannotation/topicAnnotation.Tweet" + + def getStratoStore(stratoClient: Client): ReadableStore[TweetId, TopicAnnotationValue] = { + StratoFetchableStore + .withView[TweetId, TopicAnnotationView, TopicAnnotationValue]( + stratoClient, + column, + TopicAnnotationView()) + } + + case class TopicAnnotation( + topicId: TopicId, + ignoreSimClustersFilter: Boolean, + modelVersionId: Long) +} + +/** + * Given a tweet Id, return the list of annotations defined by the TSIG team. 
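 * For example, a Tweet annotated under two model versions yields one TopicAnnotation per + * (entityId, modelVersionId) pair; ignoreSimClustersFilter falls back to false when the + * underlying ignoreQualityFilter field is unset.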
+ */ +case class SemanticCoreAnnotationStore(stratoStore: ReadableStore[TweetId, TopicAnnotationValue]) + extends ReadableStore[TweetId, Seq[SemanticCoreAnnotationStore.TopicAnnotation]] { + import SemanticCoreAnnotationStore._ + + override def multiGet[K1 <: TweetId]( + ks: Set[K1] + ): Map[K1, Future[Option[Seq[TopicAnnotation]]]] = { + stratoStore + .multiGet(ks) + .mapValues(_.map(_.map { topicAnnotationValue => + topicAnnotationValue.annotationsPerModel match { + case Some(annotationWithVersions) => + annotationWithVersions.flatMap { annotations => + annotations.annotations.map { annotation => + TopicAnnotation( + annotation.entityId, + annotation.ignoreQualityFilter.getOrElse(false), + annotations.modelVersionId + ) + } + } + case _ => + Nil + } + })) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicSocialProofStore.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicSocialProofStore.scala new file mode 100644 index 000000000..6ed71ca14 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicSocialProofStore.scala @@ -0,0 +1,127 @@ +package com.twitter.tsp.stores + +import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore.ScoreKey +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.util.StatsUtil +import com.twitter.simclusters_v2.thriftscala._ +import com.twitter.storehaus.ReadableStore +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.tsp.stores.SemanticCoreAnnotationStore._ +import com.twitter.tsp.stores.TopicSocialProofStore.TopicSocialProof +import com.twitter.util.Future + +/** + * Provides session-less Topic Social Proof information that doesn't rely on any User Info. + * This store is fronted by Memcache and an in-memory cache to achieve higher performance. + * A topic embedding and a tweet embedding are used to calculate the raw score.
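 * The read path, as implemented below: fetch the Tweet's topic annotations, apply the + * version-id allowlist, de-dup topic ids, then (if enabled in the query) score each + * (topic, tweet) pair via SimClusters-embedding cosine similarity.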
+ */ +case class TopicSocialProofStore( + representationScorerStore: ReadableStore[ScoreId, Score], + semanticCoreAnnotationStore: ReadableStore[TweetId, Seq[TopicAnnotation]] +)( + statsReceiver: StatsReceiver) + extends ReadableStore[TopicSocialProofStore.Query, Seq[TopicSocialProof]] { + import TopicSocialProofStore._ + + // Fetches the tweet's topic annotations from SemanticCore's Annotation API + override def get(query: TopicSocialProofStore.Query): Future[Option[Seq[TopicSocialProof]]] = { + StatsUtil.trackOptionStats(statsReceiver) { + for { + annotations <- + StatsUtil.trackItemsStats(statsReceiver.scope("semanticCoreAnnotationStore")) { + semanticCoreAnnotationStore.get(query.cacheableQuery.tweetId).map(_.getOrElse(Nil)) + } + + filteredAnnotations = filterAnnotationsByAllowList(annotations, query) + + scoredTopics <- + StatsUtil.trackItemMapStats(statsReceiver.scope("scoreTopicTweetsTweetLanguage")) { + // de-dup identical topicIds + val uniqueTopicIds = filteredAnnotations.map { annotation => + TopicId(annotation.topicId, Some(query.cacheableQuery.tweetLanguage), country = None) + }.toSet + + if (query.cacheableQuery.enableCosineSimilarityScoreCalculation) { + scoreTopicTweets(query.cacheableQuery.tweetId, uniqueTopicIds) + } else { + Future.value(uniqueTopicIds.map(id => id -> Map.empty[ScoreKey, Double]).toMap) + } + } + + } yield { + if (scoredTopics.nonEmpty) { + val versionedTopicProofs = filteredAnnotations.map { annotation => + val topicId = + TopicId(annotation.topicId, Some(query.cacheableQuery.tweetLanguage), country = None) + + TopicSocialProof( + topicId, + scores = scoredTopics.getOrElse(topicId, Map.empty), + annotation.ignoreSimClustersFilter, + annotation.modelVersionId + ) + } + Some(versionedTopicProofs) + } else { + None + } + } + } + } + + /** + * When the allowList is not empty (e.g., TSP handler call, CrTopic handler call), + * the filter will be enabled and we will only keep annotations that have versionIds existing + * in the input allowedSemanticCoreVersionIds set. + * But when the allowList is empty (e.g., some debugger calls), + * we do not filter anything and let every annotation through. 
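 * For example, given a hypothetical allowlist of Set(1L, 3L), only annotations whose + * modelVersionId is 1 or 3 are kept.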
+ * We limit the number of versionIds to be K = MaxNumberVersionIds + */ + private def filterAnnotationsByAllowList( + annotations: Seq[TopicAnnotation], + query: TopicSocialProofStore.Query + ): Seq[TopicAnnotation] = { + + val trimmedVersionIds = query.allowedSemanticCoreVersionIds.take(MaxNumberVersionIds) + annotations.filter { annotation => + trimmedVersionIds.isEmpty || trimmedVersionIds.contains(annotation.modelVersionId) + } + } + + private def scoreTopicTweets( + tweetId: TweetId, + topicIds: Set[TopicId] + ): Future[Map[TopicId, Map[ScoreKey, Double]]] = { + Future.collect { + topicIds.map { topicId => + val scoresFut = TopicTweetsCosineSimilarityAggregateStore.getRawScoresMap( + topicId, + tweetId, + TopicTweetsCosineSimilarityAggregateStore.DefaultScoreKeys, + representationScorerStore + ) + topicId -> scoresFut + }.toMap + } + } +} + +object TopicSocialProofStore { + + private val MaxNumberVersionIds = 9 + + case class Query( + cacheableQuery: CacheableQuery, + allowedSemanticCoreVersionIds: Set[Long] = Set.empty) // overridden by FS + + case class CacheableQuery( + tweetId: TweetId, + tweetLanguage: String, + enableCosineSimilarityScoreCalculation: Boolean = true) + + case class TopicSocialProof( + topicId: TopicId, + scores: Map[ScoreKey, Double], + ignoreSimClusterFiltering: Boolean, + semanticCoreVersionId: Long) +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicStore.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicStore.scala new file mode 100644 index 000000000..61fae8c6a --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicStore.scala @@ -0,0 +1,135 @@ +package com.twitter.tsp.stores + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.store.InterestedInInterestsFetchKey +import com.twitter.frigate.common.store.strato.StratoFetchableStore +import com.twitter.hermit.store.common.ObservedReadableStore +import com.twitter.interests.thriftscala.InterestId +import com.twitter.interests.thriftscala.InterestLabel +import com.twitter.interests.thriftscala.InterestRelationship +import com.twitter.interests.thriftscala.InterestRelationshipV1 +import com.twitter.interests.thriftscala.InterestedInInterestLookupContext +import com.twitter.interests.thriftscala.InterestedInInterestModel +import com.twitter.interests.thriftscala.OptOutInterestLookupContext +import com.twitter.interests.thriftscala.UserInterest +import com.twitter.interests.thriftscala.UserInterestData +import com.twitter.interests.thriftscala.UserInterestsResponse +import com.twitter.simclusters_v2.common.UserId +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.Client +import com.twitter.strato.thrift.ScroogeConvImplicits._ + +case class TopicResponse( + entityId: Long, + interestedInData: Seq[InterestedInInterestModel], + scoreOverride: Option[Double] = None, + notInterestedInTimestamp: Option[Long] = None, + topicFollowTimestamp: Option[Long] = None) + +case class TopicResponses(responses: Seq[TopicResponse]) + +object TopicStore { + + private val InterestedInInterestsColumn = "interests/interestedInInterests" + private lazy val ExplicitInterestsContext: InterestedInInterestLookupContext = + InterestedInInterestLookupContext( + explicitContext = None, + inferredContext = None, + disableImplicit = Some(true) + ) + + private def userInterestsResponseToTopicResponse( + userInterestsResponse: UserInterestsResponse + ): TopicResponses = { + val responses = 
userInterestsResponse.interests.interests.toSeq.flatMap { userInterests => + userInterests.collect { + case UserInterest( + InterestId.SemanticCore(semanticCoreEntity), + Some(UserInterestData.InterestedIn(data))) => + val topicFollowingTimestampOpt = data.collect { + case InterestedInInterestModel.ExplicitModel( + InterestRelationship.V1(interestRelationshipV1)) => + interestRelationshipV1.timestampMs + }.lastOption + + TopicResponse(semanticCoreEntity.id, data, None, None, topicFollowingTimestampOpt) + } + } + TopicResponses(responses) + } + + def explicitFollowingTopicStore( + stratoClient: Client + )( + implicit statsReceiver: StatsReceiver + ): ReadableStore[UserId, TopicResponses] = { + val stratoStore = + StratoFetchableStore + .withUnitView[InterestedInInterestsFetchKey, UserInterestsResponse]( + stratoClient, + InterestedInInterestsColumn) + .composeKeyMapping[UserId](uid => + InterestedInInterestsFetchKey( + userId = uid, + labels = None, + lookupContext = Some(ExplicitInterestsContext) + )) + .mapValues(userInterestsResponseToTopicResponse) + + ObservedReadableStore(stratoStore) + } + + def userOptOutTopicStore( + stratoClient: Client, + optOutStratoStorePath: String + )( + implicit statsReceiver: StatsReceiver + ): ReadableStore[UserId, TopicResponses] = { + val stratoStore = + StratoFetchableStore + .withUnitView[ + (Long, Option[Seq[InterestLabel]], Option[OptOutInterestLookupContext]), + UserInterestsResponse](stratoClient, optOutStratoStorePath) + .composeKeyMapping[UserId](uid => (uid, None, None)) + .mapValues { userInterestsResponse => + val responses = userInterestsResponse.interests.interests.toSeq.flatMap { userInterests => + userInterests.collect { + case UserInterest( + InterestId.SemanticCore(semanticCoreEntity), + Some(UserInterestData.InterestedIn(data))) => + TopicResponse(semanticCoreEntity.id, data, None) + } + } + TopicResponses(responses) + } + ObservedReadableStore(stratoStore) + } + + def notInterestedInTopicsStore( + stratoClient: Client, + notInterestedInStorePath: String + )( + implicit statsReceiver: StatsReceiver + ): ReadableStore[UserId, TopicResponses] = { + val stratoStore = + StratoFetchableStore + .withUnitView[Long, Seq[UserInterest]](stratoClient, notInterestedInStorePath) + .composeKeyMapping[UserId](identity) + .mapValues { notInterestedInInterests => + val responses = notInterestedInInterests.collect { + case UserInterest( + InterestId.SemanticCore(semanticCoreEntity), + Some(UserInterestData.NotInterested(notInterestedInData))) => + val notInterestedInTimestampOpt = notInterestedInData.collect { + case InterestRelationship.V1(interestRelationshipV1: InterestRelationshipV1) => + interestRelationshipV1.timestampMs + }.lastOption + + TopicResponse(semanticCoreEntity.id, Seq.empty, None, notInterestedInTimestampOpt) + } + TopicResponses(responses) + } + ObservedReadableStore(stratoStore) + } + +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicTweetsCosineSimilarityAggregateStore.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicTweetsCosineSimilarityAggregateStore.scala new file mode 100644 index 000000000..3fb65d8ac --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TopicTweetsCosineSimilarityAggregateStore.scala @@ -0,0 +1,99 @@ +package com.twitter.tsp.stores + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.thriftscala.EmbeddingType +import 
com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.simclusters_v2.thriftscala.ModelVersion +import com.twitter.simclusters_v2.thriftscala.ScoreInternalId +import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm +import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId +import com.twitter.simclusters_v2.thriftscala.{ + SimClustersEmbeddingPairScoreId => ThriftSimClustersEmbeddingPairScoreId +} +import com.twitter.simclusters_v2.thriftscala.TopicId +import com.twitter.simclusters_v2.thriftscala.{Score => ThriftScore} +import com.twitter.simclusters_v2.thriftscala.{ScoreId => ThriftScoreId} +import com.twitter.storehaus.ReadableStore +import com.twitter.topic_recos.common._ +import com.twitter.topic_recos.common.Configs.DefaultModelVersion +import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore.ScoreKey +import com.twitter.util.Future + +object TopicTweetsCosineSimilarityAggregateStore { + + val TopicEmbeddingTypes: Seq[EmbeddingType] = + Seq( + EmbeddingType.FavTfgTopic, + EmbeddingType.LogFavBasedKgoApeTopic + ) + + // Add new embedding types here if you want to test new Tweet embedding performance. + val TweetEmbeddingTypes: Seq[EmbeddingType] = Seq(EmbeddingType.LogFavBasedTweet) + + val ModelVersions: Seq[ModelVersion] = + Seq(DefaultModelVersion) + + val DefaultScoreKeys: Seq[ScoreKey] = { + for { + modelVersion <- ModelVersions + topicEmbeddingType <- TopicEmbeddingTypes + tweetEmbeddingType <- TweetEmbeddingTypes + } yield { + ScoreKey( + topicEmbeddingType = topicEmbeddingType, + tweetEmbeddingType = tweetEmbeddingType, + modelVersion = modelVersion + ) + } + } + + case class ScoreKey( + topicEmbeddingType: EmbeddingType, + tweetEmbeddingType: EmbeddingType, + modelVersion: ModelVersion) + + def getRawScoresMap( + topicId: TopicId, + tweetId: TweetId, + scoreKeys: Seq[ScoreKey], + representationScorerStore: ReadableStore[ThriftScoreId, ThriftScore] + ): Future[Map[ScoreKey, Double]] = { + val scoresMapFut = scoreKeys.map { key => + val scoreInternalId = ScoreInternalId.SimClustersEmbeddingPairScoreId( + ThriftSimClustersEmbeddingPairScoreId( + buildTopicEmbedding(topicId, key.topicEmbeddingType, key.modelVersion), + SimClustersEmbeddingId( + key.tweetEmbeddingType, + key.modelVersion, + InternalId.TweetId(tweetId)) + )) + val scoreFut = representationScorerStore + .get( + ThriftScoreId( + algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity, // Hard-coded to cosine similarity + internalId = scoreInternalId + )) + key -> scoreFut + }.toMap + + Future + .collect(scoresMapFut).map(_.collect { + case (key, Some(ThriftScore(score))) => + (key, score) + }) + } +} + +case class TopicTweetsCosineSimilarityAggregateStore( + representationScorerStore: ReadableStore[ThriftScoreId, ThriftScore] +)( + statsReceiver: StatsReceiver) + extends ReadableStore[(TopicId, TweetId, Seq[ScoreKey]), Map[ScoreKey, Double]] { + import TopicTweetsCosineSimilarityAggregateStore._ + + override def get(k: (TopicId, TweetId, Seq[ScoreKey])): Future[Option[Map[ScoreKey, Double]]] = { + statsReceiver.counter("topicTweetsCosineSimilarityAggregateStore").incr() + getRawScoresMap(k._1, k._2, k._3, representationScorerStore).map(Some(_)) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TweetInfoStore.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TweetInfoStore.scala new file mode 100644 index 000000000..70cc00451 --- /dev/null +++ 
b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/TweetInfoStore.scala @@ -0,0 +1,230 @@ +package com.twitter.tsp.stores + +import com.twitter.conversions.DurationOps._ +import com.twitter.tsp.thriftscala.TspTweetInfo +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.thriftscala.TweetHealthScores +import com.twitter.frigate.thriftscala.UserAgathaScores +import com.twitter.logging.Logger +import com.twitter.mediaservices.commons.thriftscala.MediaCategory +import com.twitter.mediaservices.commons.tweetmedia.thriftscala.MediaInfo +import com.twitter.mediaservices.commons.tweetmedia.thriftscala.MediaSizeType +import com.twitter.simclusters_v2.common.TweetId +import com.twitter.simclusters_v2.common.UserId +import com.twitter.spam.rtf.thriftscala.SafetyLevel +import com.twitter.stitch.Stitch +import com.twitter.stitch.storehaus.ReadableStoreOfStitch +import com.twitter.stitch.tweetypie.TweetyPie +import com.twitter.stitch.tweetypie.TweetyPie.TweetyPieException +import com.twitter.storehaus.ReadableStore +import com.twitter.topiclisting.AnnotationRuleProvider +import com.twitter.tsp.utils.HealthSignalsUtils +import com.twitter.tweetypie.thriftscala.TweetInclude +import com.twitter.tweetypie.thriftscala.{Tweet => TTweet} +import com.twitter.tweetypie.thriftscala._ +import com.twitter.util.Duration +import com.twitter.util.Future +import com.twitter.util.TimeoutException +import com.twitter.util.Timer + +object TweetyPieFieldsStore { + + // Tweet fields options. Only fields specified here will be hydrated in the tweet + private val CoreTweetFields: Set[TweetInclude] = Set[TweetInclude]( + TweetInclude.TweetFieldId(TTweet.IdField.id), + TweetInclude.TweetFieldId(TTweet.CoreDataField.id), // needed for the authorId + TweetInclude.TweetFieldId(TTweet.LanguageField.id), + TweetInclude.CountsFieldId(StatusCounts.FavoriteCountField.id), + TweetInclude.CountsFieldId(StatusCounts.RetweetCountField.id), + TweetInclude.TweetFieldId(TTweet.QuotedTweetField.id), + TweetInclude.TweetFieldId(TTweet.MediaKeysField.id), + TweetInclude.TweetFieldId(TTweet.EscherbirdEntityAnnotationsField.id), + TweetInclude.TweetFieldId(TTweet.MediaField.id), + TweetInclude.TweetFieldId(TTweet.UrlsField.id) + ) + + private val gtfo: GetTweetFieldsOptions = GetTweetFieldsOptions( + tweetIncludes = CoreTweetFields, + safetyLevel = Some(SafetyLevel.Recommendations) + ) + + def getStoreFromTweetyPie( + tweetyPie: TweetyPie, + convertExceptionsToNotFound: Boolean = true + ): ReadableStore[Long, GetTweetFieldsResult] = { + val log = Logger("TweetyPieFieldsStore") + + ReadableStoreOfStitch { tweetId: Long => + tweetyPie + .getTweetFields(tweetId, options = gtfo) + .rescue { + case ex: TweetyPieException if convertExceptionsToNotFound => + log.error(ex, s"Error while hitting tweetypie ${ex.result}") + Stitch.NotFound + } + } + } +} + +object TweetInfoStore { + + case class IsPassTweetHealthFilters(tweetStrictest: Option[Boolean]) + + case class IsPassAgathaHealthFilters(agathaStrictest: Option[Boolean]) + + private val HealthStoreTimeout: Duration = 40.milliseconds + private val isPassTweetHealthFilters: IsPassTweetHealthFilters = IsPassTweetHealthFilters(None) + private val isPassAgathaHealthFilters: IsPassAgathaHealthFilters = IsPassAgathaHealthFilters(None) +} + +case class TweetInfoStore( + tweetFieldsStore: ReadableStore[TweetId, GetTweetFieldsResult], + tweetHealthModelStore: ReadableStore[TweetId, TweetHealthScores], + userHealthModelStore: ReadableStore[UserId, UserAgathaScores], + timer: 
Timer +)( + statsReceiver: StatsReceiver) + extends ReadableStore[TweetId, TspTweetInfo] { + + import TweetInfoStore._ + + private[this] def toTweetInfo( + tweetFieldsResult: GetTweetFieldsResult + ): Future[Option[TspTweetInfo]] = { + tweetFieldsResult.tweetResult match { + case result: TweetFieldsResultState.Found if result.found.suppressReason.isEmpty => + val tweet = result.found.tweet + + val authorIdOpt = tweet.coreData.map(_.userId) + val favCountOpt = tweet.counts.flatMap(_.favoriteCount) + + val languageOpt = tweet.language.map(_.language) + val hasImageOpt = + tweet.mediaKeys.map(_.map(_.mediaCategory).exists(_ == MediaCategory.TweetImage)) + val hasGifOpt = + tweet.mediaKeys.map(_.map(_.mediaCategory).exists(_ == MediaCategory.TweetGif)) + val isNsfwAuthorOpt = Some( + tweet.coreData.exists(_.nsfwUser) || tweet.coreData.exists(_.nsfwAdmin)) + val isTweetReplyOpt = tweet.coreData.map(_.reply.isDefined) + val hasMultipleMediaOpt = + tweet.mediaKeys.map(_.map(_.mediaCategory).size > 1) + + val isKGODenylist = Some( + tweet.escherbirdEntityAnnotations + .exists(_.entityAnnotations.exists(AnnotationRuleProvider.isSuppressedTopicsDenylist))) + + val isNullcastOpt = tweet.coreData.map(_.nullcast) // These are Ads. go/nullcast + + val videoDurationOpt = tweet.media.flatMap(_.flatMap { + _.mediaInfo match { + case Some(MediaInfo.VideoInfo(info)) => + Some((info.durationMillis + 999) / 1000) // video playtime is always rounded up + case _ => None + } + }.headOption) + + // There are many different types of videos. To be robust to new types being added, we just use + // the videoDurationOpt to keep track of whether the item has a video or not. + val hasVideo = videoDurationOpt.isDefined + + val mediaDimensionsOpt = + tweet.media.flatMap(_.headOption.flatMap( + _.sizes.find(_.sizeType == MediaSizeType.Orig).map(size => (size.width, size.height)))) + + val mediaWidth = mediaDimensionsOpt.map(_._1).getOrElse(1) + val mediaHeight = mediaDimensionsOpt.map(_._2).getOrElse(1) + // media is considered high resolution when both width and height are greater than 480px + val isHighMediaResolution = mediaHeight > 480 && mediaWidth > 480 + val isVerticalAspectRatio = mediaHeight >= mediaWidth && mediaWidth > 1 + val hasUrlOpt = tweet.urls.map(_.nonEmpty) + + (authorIdOpt, favCountOpt) match { + case (Some(authorId), Some(favCount)) => + hydrateHealthScores(tweet.id, authorId).map { + case (isPassAgathaHealthFilters, isPassTweetHealthFilters) => + Some( + TspTweetInfo( + authorId = authorId, + favCount = favCount, + language = languageOpt, + hasImage = hasImageOpt, + hasVideo = Some(hasVideo), + hasGif = hasGifOpt, + isNsfwAuthor = isNsfwAuthorOpt, + isKGODenylist = isKGODenylist, + isNullcast = isNullcastOpt, + videoDurationSeconds = videoDurationOpt, + isHighMediaResolution = Some(isHighMediaResolution), + isVerticalAspectRatio = Some(isVerticalAspectRatio), + isPassAgathaHealthFilterStrictest = isPassAgathaHealthFilters.agathaStrictest, + isPassTweetHealthFilterStrictest = isPassTweetHealthFilters.tweetStrictest, + isReply = isTweetReplyOpt, + hasMultipleMedia = hasMultipleMediaOpt, + hasUrl = hasUrlOpt + )) + } + case _ => + statsReceiver.counter("missingFields").incr() + Future.None // These values should always exist. + } 
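+ // All remaining result states are mapped to Future.None below; each one bumps a dedicated + // counter so drop rates at hydration time stay observable.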
+        }
+      case _: TweetFieldsResultState.NotFound =>
+        statsReceiver.counter("notFound").incr()
+        Future.None
+      case _: TweetFieldsResultState.Failed =>
+        statsReceiver.counter("failed").incr()
+        Future.None
+      case _: TweetFieldsResultState.Filtered =>
+        statsReceiver.counter("filtered").incr()
+        Future.None
+      case _ =>
+        statsReceiver.counter("unknown").incr()
+        Future.None
+    }
+  }
+
+  private[this] def hydrateHealthScores(
+    tweetId: TweetId,
+    authorId: Long
+  ): Future[(IsPassAgathaHealthFilters, IsPassTweetHealthFilters)] = {
+    Future
+      .join(
+        tweetHealthModelStore
+          .multiGet(Set(tweetId))(tweetId),
+        userHealthModelStore
+          .multiGet(Set(authorId))(authorId)
+      ).map {
+        case (tweetHealthScoresOpt, userAgathaScoresOpt) =>
+          // These stats help us understand the empty rate for AgathaCalibratedNsfw / NsfwTextUserScore.
+          statsReceiver.counter("totalCountAgathaScore").incr()
+          if (userAgathaScoresOpt.getOrElse(UserAgathaScores()).agathaCalibratedNsfw.isEmpty)
+            statsReceiver.counter("emptyCountAgathaCalibratedNsfw").incr()
+          if (userAgathaScoresOpt.getOrElse(UserAgathaScores()).nsfwTextUserScore.isEmpty)
+            statsReceiver.counter("emptyCountNsfwTextUserScore").incr()
+
+          val isPassAgathaHealthFilters = IsPassAgathaHealthFilters(
+            agathaStrictest =
+              Some(HealthSignalsUtils.isTweetAgathaModelQualified(userAgathaScoresOpt)),
+          )
+
+          val isPassTweetHealthFilters = IsPassTweetHealthFilters(
+            tweetStrictest =
+              Some(HealthSignalsUtils.isTweetHealthModelQualified(tweetHealthScoresOpt))
+          )
+
+          (isPassAgathaHealthFilters, isPassTweetHealthFilters)
+      }.raiseWithin(HealthStoreTimeout)(timer).rescue {
+        case _: TimeoutException =>
+          statsReceiver.counter("hydrateHealthScoreTimeout").incr()
+          Future.value((isPassAgathaHealthFilters, isPassTweetHealthFilters))
+        case _ =>
+          statsReceiver.counter("hydrateHealthScoreFailure").incr()
+          Future.value((isPassAgathaHealthFilters, isPassTweetHealthFilters))
+      }
+  }
+
+  override def multiGet[K1 <: TweetId](ks: Set[K1]): Map[K1, Future[Option[TspTweetInfo]]] = {
+    statsReceiver.counter("tweetFieldsStore").incr(ks.size)
+    tweetFieldsStore
+      .multiGet(ks).mapValues(_.flatMap { _.map { v => toTweetInfo(v) }.getOrElse(Future.None) })
+  }
+}
diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/UttTopicFilterStore.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/UttTopicFilterStore.scala
new file mode 100644
index 000000000..89a502008
--- /dev/null
+++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/stores/UttTopicFilterStore.scala
@@ -0,0 +1,248 @@
+package com.twitter.tsp.stores
+
+import com.twitter.conversions.DurationOps._
+import com.twitter.finagle.FailureFlags.flagsOf
+import com.twitter.finagle.mux.ClientDiscardedRequestException
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.frigate.common.store.interests
+import com.twitter.simclusters_v2.common.UserId
+import com.twitter.storehaus.ReadableStore
+import com.twitter.topiclisting.ProductId
+import com.twitter.topiclisting.TopicListing
+import com.twitter.topiclisting.TopicListingViewerContext
+import com.twitter.topiclisting.{SemanticCoreEntityId => ScEntityId}
+import com.twitter.tsp.thriftscala.TopicFollowType
+import com.twitter.tsp.thriftscala.TopicListingSetting
+import com.twitter.tsp.thriftscala.TopicSocialProofFilteringBypassMode
+import com.twitter.util.Duration
+import com.twitter.util.Future
+import com.twitter.util.TimeoutException
+import com.twitter.util.Timer
+
+class UttTopicFilterStore(
+  topicListing: TopicListing,
userOptOutTopicsStore: ReadableStore[interests.UserId, TopicResponses], + explicitFollowingTopicsStore: ReadableStore[interests.UserId, TopicResponses], + notInterestedTopicsStore: ReadableStore[interests.UserId, TopicResponses], + localizedUttRecommendableTopicsStore: ReadableStore[LocalizedUttTopicNameRequest, Set[Long]], + timer: Timer, + stats: StatsReceiver) { + import UttTopicFilterStore._ + + // Set of blacklisted SemanticCore IDs that are paused. + private[this] def getPausedTopics(topicCtx: TopicListingViewerContext): Set[ScEntityId] = { + topicListing.getPausedTopics(topicCtx) + } + + private[this] def getOptOutTopics(userId: Long): Future[Set[ScEntityId]] = { + stats.counter("getOptOutTopicsCount").incr() + userOptOutTopicsStore + .get(userId).map { responseOpt => + responseOpt + .map { responses => responses.responses.map(_.entityId) }.getOrElse(Seq.empty).toSet + }.raiseWithin(DefaultOptOutTimeout)(timer).rescue { + case err: TimeoutException => + stats.counter("getOptOutTopicsTimeout").incr() + Future.exception(err) + case err: ClientDiscardedRequestException + if flagsOf(err).contains("interrupted") && flagsOf(err) + .contains("ignorable") => + stats.counter("getOptOutTopicsDiscardedBackupRequest").incr() + Future.exception(err) + case err => + stats.counter("getOptOutTopicsFailure").incr() + Future.exception(err) + } + } + + private[this] def getNotInterestedIn(userId: Long): Future[Set[ScEntityId]] = { + stats.counter("getNotInterestedInCount").incr() + notInterestedTopicsStore + .get(userId).map { responseOpt => + responseOpt + .map { responses => responses.responses.map(_.entityId) }.getOrElse(Seq.empty).toSet + }.raiseWithin(DefaultNotInterestedInTimeout)(timer).rescue { + case err: TimeoutException => + stats.counter("getNotInterestedInTimeout").incr() + Future.exception(err) + case err: ClientDiscardedRequestException + if flagsOf(err).contains("interrupted") && flagsOf(err) + .contains("ignorable") => + stats.counter("getNotInterestedInDiscardedBackupRequest").incr() + Future.exception(err) + case err => + stats.counter("getNotInterestedInFailure").incr() + Future.exception(err) + } + } + + private[this] def getFollowedTopics(userId: Long): Future[Set[TopicResponse]] = { + stats.counter("getFollowedTopicsCount").incr() + + explicitFollowingTopicsStore + .get(userId).map { responseOpt => + responseOpt.map(_.responses.toSet).getOrElse(Set.empty) + }.raiseWithin(DefaultInterestedInTimeout)(timer).rescue { + case _: TimeoutException => + stats.counter("getFollowedTopicsTimeout").incr() + Future(Set.empty) + case _ => + stats.counter("getFollowedTopicsFailure").incr() + Future(Set.empty) + } + } + + private[this] def getFollowedTopicIds(userId: Long): Future[Set[ScEntityId]] = { + getFollowedTopics(userId: Long).map(_.map(_.entityId)) + } + + private[this] def getWhitelistTopicIds( + normalizedContext: TopicListingViewerContext, + enableInternationalTopics: Boolean + ): Future[Set[ScEntityId]] = { + stats.counter("getWhitelistTopicIdsCount").incr() + + val uttRequest = LocalizedUttTopicNameRequest( + productId = ProductId.Followable, + viewerContext = normalizedContext, + enableInternationalTopics = enableInternationalTopics + ) + localizedUttRecommendableTopicsStore + .get(uttRequest).map { response => + response.getOrElse(Set.empty) + }.rescue { + case _ => + stats.counter("getWhitelistTopicIdsFailure").incr() + Future(Set.empty) + } + } + + private[this] def getDenyListTopicIdsForUser( + userId: UserId, + topicListingSetting: TopicListingSetting, + context: 
TopicListingViewerContext,
+    bypassModes: Option[Set[TopicSocialProofFilteringBypassMode]]
+  ): Future[Set[ScEntityId]] = {
+
+    val denyListTopicIdsFuture = topicListingSetting match {
+      case TopicListingSetting.ImplicitFollow =>
+        getFollowedTopicIds(userId)
+      case _ =>
+        Future(Set.empty[ScEntityId])
+    }
+
+    // We don't filter opt-out topics for the ImplicitFollow topic listing setting.
+    val optOutTopicIdsFuture = topicListingSetting match {
+      case TopicListingSetting.ImplicitFollow => Future(Set.empty[ScEntityId])
+      case _ => getOptOutTopics(userId)
+    }
+
+    val notInterestedTopicIdsFuture =
+      if (bypassModes.exists(_.contains(TopicSocialProofFilteringBypassMode.NotInterested))) {
+        Future(Set.empty[ScEntityId])
+      } else {
+        getNotInterestedIn(userId)
+      }
+    val pausedTopicIdsFuture = Future.value(getPausedTopics(context))
+
+    Future
+      .collect(
+        List(
+          denyListTopicIdsFuture,
+          optOutTopicIdsFuture,
+          notInterestedTopicIdsFuture,
+          pausedTopicIdsFuture)).map { list => list.reduce(_ ++ _) }
+  }
+
+  private[this] def getDiff(
+    aFut: Future[Set[ScEntityId]],
+    bFut: Future[Set[ScEntityId]]
+  ): Future[Set[ScEntityId]] = {
+    Future.join(aFut, bFut).map {
+      case (a, b) => a.diff(b)
+    }
+  }
+
+  /**
+   * Calculates the diff of the allowlisted IDs against the denylisted IDs, and returns either the
+   * set of IDs we may recommend from or the user's followed topics, depending on the client's
+   * TopicListingSetting.
+   */
+  def getAllowListTopicsForUser(
+    userId: UserId,
+    topicListingSetting: TopicListingSetting,
+    context: TopicListingViewerContext,
+    bypassModes: Option[Set[TopicSocialProofFilteringBypassMode]]
+  ): Future[Map[ScEntityId, Option[TopicFollowType]]] = {
+
+    /**
+     * An illustrative table of how the allow list is composed:
+     * AllowList = WhiteList - DenyList - OptOutTopics - PausedTopics - NotInterestedInTopics
+     *
+     * TopicListingSetting:  Following             ImplicitFollow        All   Followable
+     * Whitelist:            FollowedTopics(user)  AllWhitelistedTopics  Nil   AllWhitelistedTopics
+     * DenyList:             Nil                   FollowedTopics(user)  Nil   Nil
+     *
+     * P.S. For TopicListingSetting.All, the returned allow list is Nil, because an allow list is
+     * not required when TopicListingSetting == 'All'.
+     * See TopicSocialProofHandler.filterByAllowedList() for more details.
+     */
+
+    topicListingSetting match {
+      // "All" means every UTT entity is qualified, so there is no need to fetch the allow list.
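+      // For intuition, with hypothetical IDs: if the whitelist is Set(10L, 20L, 30L) and the
+      // combined deny/opt-out/paused/not-interested set is Set(20L), getDiff returns
+      // Set(10L, 30L) as the allow list.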
+ case TopicListingSetting.All => Future.value(Map.empty) + case TopicListingSetting.Following => + getFollowingTopicsForUserWithTimestamp(userId, context, bypassModes).map { + _.mapValues(_ => Some(TopicFollowType.Following)) + } + case TopicListingSetting.ImplicitFollow => + getDiff( + getWhitelistTopicIds(context, enableInternationalTopics = true), + getDenyListTopicIdsForUser(userId, topicListingSetting, context, bypassModes)).map { + _.map { scEntityId => + scEntityId -> Some(TopicFollowType.ImplicitFollow) + }.toMap + } + case _ => + val followedTopicIdsFut = getFollowedTopicIds(userId) + val allowListTopicIdsFut = getDiff( + getWhitelistTopicIds(context, enableInternationalTopics = true), + getDenyListTopicIdsForUser(userId, topicListingSetting, context, bypassModes)) + Future.join(allowListTopicIdsFut, followedTopicIdsFut).map { + case (allowListTopicId, followedTopicIds) => + allowListTopicId.map { scEntityId => + if (followedTopicIds.contains(scEntityId)) + scEntityId -> Some(TopicFollowType.Following) + else scEntityId -> Some(TopicFollowType.ImplicitFollow) + }.toMap + } + } + } + + private[this] def getFollowingTopicsForUserWithTimestamp( + userId: UserId, + context: TopicListingViewerContext, + bypassModes: Option[Set[TopicSocialProofFilteringBypassMode]] + ): Future[Map[ScEntityId, Option[Long]]] = { + + val followedTopicIdToTimestampFut = getFollowedTopics(userId).map(_.map { followedTopic => + followedTopic.entityId -> followedTopic.topicFollowTimestamp + }.toMap) + + followedTopicIdToTimestampFut.flatMap { followedTopicIdToTimestamp => + getDiff( + Future(followedTopicIdToTimestamp.keySet), + getDenyListTopicIdsForUser(userId, TopicListingSetting.Following, context, bypassModes) + ).map { + _.map { scEntityId => + scEntityId -> followedTopicIdToTimestamp.get(scEntityId).flatten + }.toMap + } + } + } +} + +object UttTopicFilterStore { + val DefaultNotInterestedInTimeout: Duration = 60.milliseconds + val DefaultOptOutTimeout: Duration = 60.milliseconds + val DefaultInterestedInTimeout: Duration = 60.milliseconds +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/BUILD b/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/BUILD new file mode 100644 index 000000000..3f4c6f42c --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/BUILD @@ -0,0 +1,14 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + ], + dependencies = [ + "3rdparty/jvm/org/lz4:lz4-java", + "content-recommender/thrift/src/main/thrift:thrift-scala", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/health", + "stitch/stitch-storehaus", + "topic-social-proof/server/src/main/thrift:thrift-scala", + ], +) diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/LZ4Injection.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/LZ4Injection.scala new file mode 100644 index 000000000..c72b6032f --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/LZ4Injection.scala @@ -0,0 +1,19 @@ +package com.twitter.tsp.utils + +import com.twitter.bijection.Injection +import scala.util.Try +import net.jpountz.lz4.LZ4CompressorWithLength +import net.jpountz.lz4.LZ4DecompressorWithLength +import net.jpountz.lz4.LZ4Factory + +object LZ4Injection extends Injection[Array[Byte], Array[Byte]] { + private val lz4Factory = LZ4Factory.fastestInstance() + private 
val fastCompressor = new LZ4CompressorWithLength(lz4Factory.fastCompressor()) + private val decompressor = new LZ4DecompressorWithLength(lz4Factory.fastDecompressor()) + + override def apply(a: Array[Byte]): Array[Byte] = LZ4Injection.fastCompressor.compress(a) + + override def invert(b: Array[Byte]): Try[Array[Byte]] = Try { + LZ4Injection.decompressor.decompress(b) + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/ReadableStoreWithMapOptionValues.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/ReadableStoreWithMapOptionValues.scala new file mode 100644 index 000000000..ddae5a310 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/ReadableStoreWithMapOptionValues.scala @@ -0,0 +1,20 @@ +package com.twitter.tsp.utils + +import com.twitter.storehaus.AbstractReadableStore +import com.twitter.storehaus.ReadableStore +import com.twitter.util.Future + +class ReadableStoreWithMapOptionValues[K, V1, V2](rs: ReadableStore[K, V1]) { + + def mapOptionValues( + fn: V1 => Option[V2] + ): ReadableStore[K, V2] = { + val self = rs + new AbstractReadableStore[K, V2] { + override def get(k: K): Future[Option[V2]] = self.get(k).map(_.flatMap(fn)) + + override def multiGet[K1 <: K](ks: Set[K1]): Map[K1, Future[Option[V2]]] = + self.multiGet(ks).mapValues(_.map(_.flatMap(fn))) + } + } +} diff --git a/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/SeqObjectInjection.scala b/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/SeqObjectInjection.scala new file mode 100644 index 000000000..96a0740e4 --- /dev/null +++ b/topic-social-proof/server/src/main/scala/com/twitter/tsp/utils/SeqObjectInjection.scala @@ -0,0 +1,32 @@ +package com.twitter.tsp.utils + +import com.twitter.bijection.Injection +import java.io.ByteArrayInputStream +import java.io.ByteArrayOutputStream +import java.io.ObjectInputStream +import java.io.ObjectOutputStream +import java.io.Serializable +import scala.util.Try + +/** + * @tparam T must be a serializable class + */ +case class SeqObjectInjection[T <: Serializable]() extends Injection[Seq[T], Array[Byte]] { + + override def apply(seq: Seq[T]): Array[Byte] = { + val byteStream = new ByteArrayOutputStream() + val outputStream = new ObjectOutputStream(byteStream) + outputStream.writeObject(seq) + outputStream.close() + byteStream.toByteArray + } + + override def invert(bytes: Array[Byte]): Try[Seq[T]] = { + Try { + val inputStream = new ObjectInputStream(new ByteArrayInputStream(bytes)) + val seq = inputStream.readObject().asInstanceOf[Seq[T]] + inputStream.close() + seq + } + } +} diff --git a/topic-social-proof/server/src/main/thrift/BUILD b/topic-social-proof/server/src/main/thrift/BUILD new file mode 100644 index 000000000..9bdbb71e0 --- /dev/null +++ b/topic-social-proof/server/src/main/thrift/BUILD @@ -0,0 +1,21 @@ +create_thrift_libraries( + base_name = "thrift", + sources = ["*.thrift"], + platform = "java8", + tags = [ + "bazel-compatible", + ], + dependency_roots = [ + "content-recommender/thrift/src/main/thrift", + "content-recommender/thrift/src/main/thrift:content-recommender-common", + "interests-service/thrift/src/main/thrift", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift", + ], + generate_languages = [ + "java", + "scala", + "strato", + ], + provides_java_name = "tsp-thrift-java", + provides_scala_name = "tsp-thrift-scala", +) diff --git a/topic-social-proof/server/src/main/thrift/service.thrift 
b/topic-social-proof/server/src/main/thrift/service.thrift
new file mode 100644
index 000000000..70f3c5398
--- /dev/null
+++ b/topic-social-proof/server/src/main/thrift/service.thrift
@@ -0,0 +1,104 @@
+namespace java com.twitter.tsp.thriftjava
+namespace py gen.twitter.tsp
+#@namespace scala com.twitter.tsp.thriftscala
+#@namespace strato com.twitter.tsp.strato
+
+include "com/twitter/contentrecommender/common.thrift"
+include "com/twitter/simclusters_v2/identifier.thrift"
+include "com/twitter/simclusters_v2/online_store.thrift"
+include "topic_listing.thrift"
+
+enum TopicListingSetting {
+  All = 0 // All existing Semantic Core entities/topics, i.e., all topics on Twitter, whether or not they have launched yet.
+  Followable = 1 // All topics the user is allowed to follow, i.e., topics that have shipped, whether or not the user follows them.
+  Following = 2 // Only topics the user is explicitly following.
+  ImplicitFollow = 3 // Topics the user has not followed but may implicitly follow, i.e., only topics the user has not followed.
+} (hasPersonalData='false')
+
+
+// Used to tell the Topic Social Proof endpoint which specific filtering can be bypassed.
+enum TopicSocialProofFilteringBypassMode {
+  NotInterested = 0
+} (hasPersonalData='false')
+
+struct TopicSocialProofRequest {
+  1: required i64 userId(personalDataType = "UserId")
+  2: required set<i64> tweetIds(personalDataType = 'TweetId')
+  3: required common.DisplayLocation displayLocation
+  4: required TopicListingSetting topicListingSetting
+  5: required topic_listing.TopicListingViewerContext context
+  6: optional set<TopicSocialProofFilteringBypassMode> bypassModes
+  7: optional map<i64, set<MetricTag>> tags
+}
+
+struct TopicSocialProofOptions {
+  1: required i64 userId(personalDataType = "UserId")
+  2: required common.DisplayLocation displayLocation
+  3: required TopicListingSetting topicListingSetting
+  4: required topic_listing.TopicListingViewerContext context
+  5: optional set<TopicSocialProofFilteringBypassMode> bypassModes
+  6: optional map<i64, set<MetricTag>> tags
+}
+
+struct TopicSocialProofResponse {
+  1: required map<i64, list<TopicWithScore>> socialProofs
+}(hasPersonalData='false')
+
+// Distinguishes how a topic tweet is generated. Useful for metric tracking and debugging.
+enum TopicTweetType {
+  // CrOON candidates
+  UserInterestedIn = 1
+  Twistly = 2
+  // crTopic candidates
+  SkitConsumerEmbeddings = 100
+  SkitProducerEmbeddings = 101
+  SkitHighPrecision = 102
+  SkitInterestBrowser = 103
+  Certo = 104
+}(persisted='true')
+
+struct TopicWithScore {
+  1: required i64 topicId
+  2: required double score // score used to rank topics relative to one another
+  3: optional TopicTweetType algorithmType // how the topic is generated
+  4: optional TopicFollowType topicFollowType // whether the topic is explicitly or implicitly followed
+}(persisted='true', hasPersonalData='false')
+
+
+struct ScoreKey {
+  1: required identifier.EmbeddingType userEmbeddingType
+  2: required identifier.EmbeddingType topicEmbeddingType
+  3: required online_store.ModelVersion modelVersion
+}(persisted='true', hasPersonalData='false')
+
+struct UserTopicScore {
+  1: required map<ScoreKey, double> scores
+}(persisted='true', hasPersonalData='false')
+
+
+enum TopicFollowType {
+  Following = 1
+  ImplicitFollow = 2
+}(persisted='true')
+
+// Provides the tags that carry the Recommended Tweets source signal and other context.
+// Warning: please don't use these tags in any ML features or business logic.
+enum MetricTag {
+  // Source Signal Tags
+  TweetFavorite = 0
+  Retweet = 1
+
+  UserFollow = 101
+  PushOpenOrNtabClick = 201
+
+  HomeTweetClick = 301
+  HomeVideoView = 302
+  HomeSongbirdShowMore = 303
+
+
+  InterestsRankerRecentSearches = 401 // For Interests Candidate Expansion
+
+  UserInterestedIn = 501
+  MBCG = 503
+  // Other Metric Tags
+} (persisted='true', hasPersonalData='true')
diff --git a/topic-social-proof/server/src/main/thrift/tweet_info.thrift b/topic-social-proof/server/src/main/thrift/tweet_info.thrift
new file mode 100644
index 000000000..d32b1aeac
--- /dev/null
+++ b/topic-social-proof/server/src/main/thrift/tweet_info.thrift
@@ -0,0 +1,26 @@
+namespace java com.twitter.tsp.thriftjava
+namespace py gen.twitter.tsp
+#@namespace scala com.twitter.tsp.thriftscala
+#@namespace strato com.twitter.tsp.strato
+
+struct TspTweetInfo {
+  1: required i64 authorId
+  2: required i64 favCount
+  3: optional string language
+  6: optional bool hasImage
+  7: optional bool hasVideo
+  8: optional bool hasGif
+  9: optional bool isNsfwAuthor
+  10: optional bool isKGODenylist
+  11: optional bool isNullcast
+  // available if the tweet contains a video
+  12: optional i32 videoDurationSeconds
+  13: optional bool isHighMediaResolution
+  14: optional bool isVerticalAspectRatio
+  // health signal scores
+  15: optional bool isPassAgathaHealthFilterStrictest
+  16: optional bool isPassTweetHealthFilterStrictest
+  17: optional bool isReply
+  18: optional bool hasMultipleMedia
+  23: optional bool hasUrl
+}(persisted='false', hasPersonalData='true')
diff --git a/unified_user_actions/.gitignore b/unified_user_actions/.gitignore
new file mode 100644
index 000000000..e98c1bb78
--- /dev/null
+++ b/unified_user_actions/.gitignore
@@ -0,0 +1,4 @@
+.DS_Store
+CONFIG.ini
+PROJECT
+docs
diff --git a/unified_user_actions/BUILD.bazel b/unified_user_actions/BUILD.bazel
new file mode 100644
index 000000000..1624a57d4
--- /dev/null
+++ b/unified_user_actions/BUILD.bazel
@@ -0,0 +1 @@
+# This prevents SQ query from grabbing //:all since it traverses up once to find a BUILD
diff --git a/unified_user_actions/README.md b/unified_user_actions/README.md
new file mode 100644
index 000000000..4211e7ade
--- /dev/null
+++ b/unified_user_actions/README.md
@@ -0,0 +1,10 @@
+# Unified User Actions (UUA)
+
+**Unified User Actions** (UUA) is a centralized, real-time stream of user actions on Twitter, consumed by various product, ML, and marketing teams. UUA reads the client-side and server-side event streams that contain the user's actions and generates a unified, real-time user-action Kafka stream. The Kafka stream is replicated to HDFS, GCP Pub/Sub, GCP GCS, and GCP BigQuery. The user actions include public actions such as favorites, retweets, and replies, as well as implicit actions like bookmarks, impressions, and video views.
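+
+As a rough sketch of what a downstream consumer of this stream can look like (the broker address, consumer group, topic name, and raw-byte deserialization below are illustrative assumptions, not values taken from this repo):
+
+```scala
+import java.time.Duration
+import java.util.{Collections, Properties}
+import org.apache.kafka.clients.consumer.KafkaConsumer
+import org.apache.kafka.common.serialization.ByteArrayDeserializer
+
+object UuaStreamSketch {
+  def main(args: Array[String]): Unit = {
+    val props = new Properties()
+    props.put("bootstrap.servers", "localhost:9092") // placeholder broker
+    props.put("group.id", "uua-example-reader") // hypothetical consumer group
+    props.put("key.deserializer", classOf[ByteArrayDeserializer].getName)
+    props.put("value.deserializer", classOf[ByteArrayDeserializer].getName)
+
+    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
+    consumer.subscribe(Collections.singletonList("unified_user_actions")) // hypothetical topic name
+
+    // Each record value would be a serialized UnifiedUserAction Thrift payload;
+    // Thrift deserialization is omitted to keep the sketch self-contained.
+    while (true) {
+      consumer.poll(Duration.ofSeconds(1)).forEach { record =>
+        println(s"partition=${record.partition} offset=${record.offset} size=${record.value.length}")
+      }
+    }
+  }
+}
+```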
+
+## Components
+
+- adapter: transforms raw inputs into the UUA Thrift output
+- client: Kafka client-related utils
+- kafka: more specific Kafka utils, such as customized serdes
+- service: deployment, modules, and services
\ No newline at end of file
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/AbstractAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/AbstractAdapter.scala
new file mode 100644
index 000000000..385a3d23d
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/AbstractAdapter.scala
@@ -0,0 +1,19 @@
+package com.twitter.unified_user_actions.adapter
+
+import com.twitter.finagle.stats.NullStatsReceiver
+import com.twitter.finagle.stats.StatsReceiver
+
+trait AbstractAdapter[INPUT, OUTK, OUTV] extends Serializable {
+
+  /**
+   * The basic input -> Seq[output] adapter that concrete adapters should extend.
+   * @param input a single INPUT
+   * @return A list of (OUTK, OUTV) tuples. The OUTK is the output key, mainly for publishing to Kafka (or Pubsub).
+   *         If downstream processing (e.g. offline batch processing) doesn't require the output key, it can drop it
+   *         via source.adaptOneToKeyedMany.map(_._2)
+   */
+  def adaptOneToKeyedMany(
+    input: INPUT,
+    statsReceiver: StatsReceiver = NullStatsReceiver
+  ): Seq[(OUTK, OUTV)]
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/BUILD
new file mode 100644
index 000000000..a6ef069c4
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/BUILD
@@ -0,0 +1,11 @@
+scala_library(
+    name = "base",
+    sources = [
+        "AbstractAdapter.scala",
+    ],
+    compiler_option_sets = ["fatal_warnings"],
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "util/util-stats/src/main/scala/com/twitter/finagle/stats",
+    ],
+)
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/AdsCallbackEngagement.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/AdsCallbackEngagement.scala
new file mode 100644
index 000000000..41db74b4b
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/AdsCallbackEngagement.scala
@@ -0,0 +1,125 @@
+package com.twitter.unified_user_actions.adapter.ads_callback_engagements
+
+import com.twitter.ads.spendserver.thriftscala.SpendServerEvent
+import com.twitter.unified_user_actions.thriftscala._
+
+object AdsCallbackEngagement {
+  object PromotedTweetFav extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetFav)
+
+  object PromotedTweetUnfav extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetUnfav)
+
+  object PromotedTweetReply extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetReply)
+
+  object PromotedTweetRetweet
+      extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetRetweet)
+
+  object PromotedTweetBlockAuthor
+      extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetBlockAuthor)
+
+  object PromotedTweetUnblockAuthor
+      extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetUnblockAuthor)
+
+  object PromotedTweetComposeTweet
+      extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetComposeTweet)
+
+  object PromotedTweetClick
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetClick) + + object PromotedTweetReport extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetReport) + + object PromotedProfileFollow + extends ProfileAdsCallbackEngagement(ActionType.ServerPromotedProfileFollow) + + object PromotedProfileUnfollow + extends ProfileAdsCallbackEngagement(ActionType.ServerPromotedProfileUnfollow) + + object PromotedTweetMuteAuthor + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetMuteAuthor) + + object PromotedTweetClickProfile + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetClickProfile) + + object PromotedTweetClickHashtag + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetClickHashtag) + + object PromotedTweetOpenLink + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetOpenLink) { + override def getItem(input: SpendServerEvent): Option[Item] = { + input.engagementEvent.flatMap { e => + e.impressionData.flatMap { i => + getPromotedTweetInfo( + i.promotedTweetId, + i.advertiserId, + tweetActionInfoOpt = Some( + TweetActionInfo.ServerPromotedTweetOpenLink( + ServerPromotedTweetOpenLink(url = e.url)))) + } + } + } + } + + object PromotedTweetCarouselSwipeNext + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetCarouselSwipeNext) + + object PromotedTweetCarouselSwipePrevious + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetCarouselSwipePrevious) + + object PromotedTweetLingerImpressionShort + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetLingerImpressionShort) + + object PromotedTweetLingerImpressionMedium + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetLingerImpressionMedium) + + object PromotedTweetLingerImpressionLong + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetLingerImpressionLong) + + object PromotedTweetClickSpotlight + extends BaseTrendAdsCallbackEngagement(ActionType.ServerPromotedTweetClickSpotlight) + + object PromotedTweetViewSpotlight + extends BaseTrendAdsCallbackEngagement(ActionType.ServerPromotedTweetViewSpotlight) + + object PromotedTrendView + extends BaseTrendAdsCallbackEngagement(ActionType.ServerPromotedTrendView) + + object PromotedTrendClick + extends BaseTrendAdsCallbackEngagement(ActionType.ServerPromotedTrendClick) + + object PromotedTweetVideoPlayback25 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoPlayback25) + + object PromotedTweetVideoPlayback50 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoPlayback50) + + object PromotedTweetVideoPlayback75 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoPlayback75) + + object PromotedTweetVideoAdPlayback25 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoAdPlayback25) + + object PromotedTweetVideoAdPlayback50 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoAdPlayback50) + + object PromotedTweetVideoAdPlayback75 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoAdPlayback75) + + object TweetVideoAdPlayback25 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerTweetVideoAdPlayback25) + + object TweetVideoAdPlayback50 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerTweetVideoAdPlayback50) + + object TweetVideoAdPlayback75 + extends BaseVideoAdsCallbackEngagement(ActionType.ServerTweetVideoAdPlayback75) + + object PromotedTweetDismissWithoutReason + extends 
BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetDismissWithoutReason) + + object PromotedTweetDismissUninteresting + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetDismissUninteresting) + + object PromotedTweetDismissRepetitive + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetDismissRepetitive) + + object PromotedTweetDismissSpam + extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetDismissSpam) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/AdsCallbackEngagementsAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/AdsCallbackEngagementsAdapter.scala new file mode 100644 index 000000000..f59ee9e48 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/AdsCallbackEngagementsAdapter.scala @@ -0,0 +1,28 @@ +package com.twitter.unified_user_actions.adapter.ads_callback_engagements + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.ads.spendserver.thriftscala.SpendServerEvent +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + +class AdsCallbackEngagementsAdapter + extends AbstractAdapter[SpendServerEvent, UnKeyed, UnifiedUserAction] { + + import AdsCallbackEngagementsAdapter._ + + override def adaptOneToKeyedMany( + input: SpendServerEvent, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(UnKeyed, UnifiedUserAction)] = + adaptEvent(input).map { e => (UnKeyed, e) } +} + +object AdsCallbackEngagementsAdapter { + def adaptEvent(input: SpendServerEvent): Seq[UnifiedUserAction] = { + val baseEngagements: Seq[BaseAdsCallbackEngagement] = + EngagementTypeMappings.getEngagementMappings(Option(input).flatMap(_.engagementEvent)) + baseEngagements.flatMap(_.getUUA(input)) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BUILD new file mode 100644 index 000000000..e945f872a --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BUILD @@ -0,0 +1,18 @@ +scala_library( + sources = [ + "*.scala", + ], + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "src/thrift/com/twitter/ads/billing/spendserver:spendserver_thrift-scala", + "src/thrift/com/twitter/ads/eventstream:eventstream-scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseAdsCallbackEngagement.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseAdsCallbackEngagement.scala new file mode 100644 index 000000000..2cefd7af3 
--- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseAdsCallbackEngagement.scala @@ -0,0 +1,68 @@ +package com.twitter.unified_user_actions.adapter.ads_callback_engagements + +import com.twitter.ads.spendserver.thriftscala.SpendServerEvent +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.AuthorInfo +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.SourceLineage +import com.twitter.unified_user_actions.thriftscala.TweetInfo +import com.twitter.unified_user_actions.thriftscala.TweetActionInfo +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala.UserIdentifier + +abstract class BaseAdsCallbackEngagement(actionType: ActionType) { + + protected def getItem(input: SpendServerEvent): Option[Item] = { + input.engagementEvent.flatMap { e => + e.impressionData.flatMap { i => + getPromotedTweetInfo(i.promotedTweetId, i.advertiserId) + } + } + } + + protected def getPromotedTweetInfo( + promotedTweetIdOpt: Option[Long], + advertiserId: Long, + tweetActionInfoOpt: Option[TweetActionInfo] = None + ): Option[Item] = { + promotedTweetIdOpt.map { promotedTweetId => + Item.TweetInfo( + TweetInfo( + actionTweetId = promotedTweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(advertiserId))), + tweetActionInfo = tweetActionInfoOpt) + ) + } + } + + def getUUA(input: SpendServerEvent): Option[UnifiedUserAction] = { + val userIdentifier: UserIdentifier = + UserIdentifier( + userId = input.engagementEvent.flatMap(e => e.clientInfo.flatMap(_.userId64)), + guestIdMarketing = input.engagementEvent.flatMap(e => e.clientInfo.flatMap(_.guestId)), + ) + + getItem(input).map { item => + UnifiedUserAction( + userIdentifier = userIdentifier, + item = item, + actionType = actionType, + eventMetadata = getEventMetadata(input), + ) + } + } + + protected def getEventMetadata(input: SpendServerEvent): EventMetadata = + EventMetadata( + sourceTimestampMs = input.engagementEvent + .map { e => e.engagementEpochTimeMilliSec }.getOrElse(AdapterUtils.currentTimestampMs), + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerAdsCallbackEngagements, + language = input.engagementEvent.flatMap { e => e.clientInfo.flatMap(_.languageCode) }, + countryCode = input.engagementEvent.flatMap { e => e.clientInfo.flatMap(_.countryCode) }, + clientAppId = + input.engagementEvent.flatMap { e => e.clientInfo.flatMap(_.clientId) }.map { _.toLong }, + ) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseTrendAdsCallbackEngagement.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseTrendAdsCallbackEngagement.scala new file mode 100644 index 000000000..494e2ba10 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseTrendAdsCallbackEngagement.scala @@ -0,0 +1,18 @@ +package com.twitter.unified_user_actions.adapter.ads_callback_engagements + +import com.twitter.ads.spendserver.thriftscala.SpendServerEvent +import com.twitter.unified_user_actions.thriftscala._ + 
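+// Note: trend engagements carry a promotedTrendId rather than a promotedTweetId, so getItem
+// below emits an Item.TrendInfo instead of the Item.TweetInfo produced by the base class.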
+abstract class BaseTrendAdsCallbackEngagement(actionType: ActionType) + extends BaseAdsCallbackEngagement(actionType = actionType) { + + override protected def getItem(input: SpendServerEvent): Option[Item] = { + input.engagementEvent.flatMap { e => + e.impressionData.flatMap { i => + i.promotedTrendId.map { promotedTrendId => + Item.TrendInfo(TrendInfo(actionTrendId = promotedTrendId)) + } + } + } + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseVideoAdsCallbackEngagement.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseVideoAdsCallbackEngagement.scala new file mode 100644 index 000000000..8fead0888 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/BaseVideoAdsCallbackEngagement.scala @@ -0,0 +1,54 @@ +package com.twitter.unified_user_actions.adapter.ads_callback_engagements + +import com.twitter.ads.spendserver.thriftscala.SpendServerEvent +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.AuthorInfo +import com.twitter.unified_user_actions.thriftscala.TweetVideoWatch +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.TweetActionInfo +import com.twitter.unified_user_actions.thriftscala.TweetInfo + +abstract class BaseVideoAdsCallbackEngagement(actionType: ActionType) + extends BaseAdsCallbackEngagement(actionType = actionType) { + + override def getItem(input: SpendServerEvent): Option[Item] = { + input.engagementEvent.flatMap { e => + e.impressionData.flatMap { i => + getTweetInfo(i.promotedTweetId, i.organicTweetId, i.advertiserId, input) + } + } + } + + private def getTweetInfo( + promotedTweetId: Option[Long], + organicTweetId: Option[Long], + advertiserId: Long, + input: SpendServerEvent + ): Option[Item] = { + val actionedTweetIdOpt: Option[Long] = + if (promotedTweetId.isEmpty) organicTweetId else promotedTweetId + actionedTweetIdOpt.map { actionTweetId => + Item.TweetInfo( + TweetInfo( + actionTweetId = actionTweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(advertiserId))), + tweetActionInfo = Some( + TweetActionInfo.TweetVideoWatch( + TweetVideoWatch( + isMonetizable = Some(true), + videoOwnerId = input.engagementEvent + .flatMap(e => e.cardEngagement).flatMap(_.amplifyDetails).flatMap(_.videoOwnerId), + videoUuid = input.engagementEvent + .flatMap(_.cardEngagement).flatMap(_.amplifyDetails).flatMap(_.videoUuid), + prerollOwnerId = input.engagementEvent + .flatMap(e => e.cardEngagement).flatMap(_.amplifyDetails).flatMap( + _.prerollOwnerId), + prerollUuid = input.engagementEvent + .flatMap(_.cardEngagement).flatMap(_.amplifyDetails).flatMap(_.prerollUuid) + )) + ) + ), + ) + } + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/EngagementTypeMappings.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/EngagementTypeMappings.scala new file mode 100644 index 000000000..9700a1ef1 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/EngagementTypeMappings.scala @@ -0,0 +1,69 @@ +package com.twitter.unified_user_actions.adapter.ads_callback_engagements + +import 
com.twitter.ads.eventstream.thriftscala.EngagementEvent +import com.twitter.adserver.thriftscala.EngagementType +import com.twitter.unified_user_actions.adapter.ads_callback_engagements.AdsCallbackEngagement._ + +object EngagementTypeMappings { + + /** + * Ads could be Tweets or non-Tweets. Since UUA explicitly sets the item type, it is + * possible that one Ads Callback engagement type maps to multiple UUA action types. + */ + def getEngagementMappings( + engagementEvent: Option[EngagementEvent] + ): Seq[BaseAdsCallbackEngagement] = { + val promotedTweetId: Option[Long] = + engagementEvent.flatMap(_.impressionData).flatMap(_.promotedTweetId) + engagementEvent + .map(event => + event.engagementType match { + case EngagementType.Fav => Seq(PromotedTweetFav) + case EngagementType.Unfav => Seq(PromotedTweetUnfav) + case EngagementType.Reply => Seq(PromotedTweetReply) + case EngagementType.Retweet => Seq(PromotedTweetRetweet) + case EngagementType.Block => Seq(PromotedTweetBlockAuthor) + case EngagementType.Unblock => Seq(PromotedTweetUnblockAuthor) + case EngagementType.Send => Seq(PromotedTweetComposeTweet) + case EngagementType.Detail => Seq(PromotedTweetClick) + case EngagementType.Report => Seq(PromotedTweetReport) + case EngagementType.Follow => Seq(PromotedProfileFollow) + case EngagementType.Unfollow => Seq(PromotedProfileUnfollow) + case EngagementType.Mute => Seq(PromotedTweetMuteAuthor) + case EngagementType.ProfilePic => Seq(PromotedTweetClickProfile) + case EngagementType.ScreenName => Seq(PromotedTweetClickProfile) + case EngagementType.UserName => Seq(PromotedTweetClickProfile) + case EngagementType.Hashtag => Seq(PromotedTweetClickHashtag) + case EngagementType.Url => Seq(PromotedTweetOpenLink) + case EngagementType.CarouselSwipeNext => Seq(PromotedTweetCarouselSwipeNext) + case EngagementType.CarouselSwipePrevious => Seq(PromotedTweetCarouselSwipePrevious) + case EngagementType.DwellShort => Seq(PromotedTweetLingerImpressionShort) + case EngagementType.DwellMedium => Seq(PromotedTweetLingerImpressionMedium) + case EngagementType.DwellLong => Seq(PromotedTweetLingerImpressionLong) + case EngagementType.SpotlightClick => Seq(PromotedTweetClickSpotlight) + case EngagementType.SpotlightView => Seq(PromotedTweetViewSpotlight) + case EngagementType.TrendView => Seq(PromotedTrendView) + case EngagementType.TrendClick => Seq(PromotedTrendClick) + case EngagementType.VideoContentPlayback25 => Seq(PromotedTweetVideoPlayback25) + case EngagementType.VideoContentPlayback50 => Seq(PromotedTweetVideoPlayback50) + case EngagementType.VideoContentPlayback75 => Seq(PromotedTweetVideoPlayback75) + case EngagementType.VideoAdPlayback25 if promotedTweetId.isDefined => + Seq(PromotedTweetVideoAdPlayback25) + case EngagementType.VideoAdPlayback25 if promotedTweetId.isEmpty => + Seq(TweetVideoAdPlayback25) + case EngagementType.VideoAdPlayback50 if promotedTweetId.isDefined => + Seq(PromotedTweetVideoAdPlayback50) + case EngagementType.VideoAdPlayback50 if promotedTweetId.isEmpty => + Seq(TweetVideoAdPlayback50) + case EngagementType.VideoAdPlayback75 if promotedTweetId.isDefined => + Seq(PromotedTweetVideoAdPlayback75) + case EngagementType.VideoAdPlayback75 if promotedTweetId.isEmpty => + Seq(TweetVideoAdPlayback75) + case EngagementType.DismissRepetitive => Seq(PromotedTweetDismissRepetitive) + case EngagementType.DismissSpam => Seq(PromotedTweetDismissSpam) + case EngagementType.DismissUninteresting => Seq(PromotedTweetDismissUninteresting) + case EngagementType.DismissWithoutReason => 
Seq(PromotedTweetDismissWithoutReason) + case _ => Nil + }).toSeq.flatten + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/ProfileAdsCallbackEngagement.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/ProfileAdsCallbackEngagement.scala new file mode 100644 index 000000000..86633d3db --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements/ProfileAdsCallbackEngagement.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.adapter.ads_callback_engagements + +import com.twitter.ads.spendserver.thriftscala.SpendServerEvent +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.ProfileInfo + +abstract class ProfileAdsCallbackEngagement(actionType: ActionType) + extends BaseAdsCallbackEngagement(actionType) { + + override protected def getItem(input: SpendServerEvent): Option[Item] = { + input.engagementEvent.flatMap { e => + e.impressionData.flatMap { i => + getProfileInfo(i.advertiserId) + } + } + } + + protected def getProfileInfo(advertiserId: Long): Option[Item] = { + Some( + Item.ProfileInfo( + ProfileInfo( + actionProfileId = advertiserId + ))) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BUILD new file mode 100644 index 000000000..e8f741e78 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BUILD @@ -0,0 +1,16 @@ +scala_library( + sources = [ + "*.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + "common-internal/analytics/client-analytics-data-layer/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "src/scala/com/twitter/loggedout/analytics/common", + "src/thrift/com/twitter/clientapp/gen:clientapp-scala", + "twadoop_config/configuration/log_categories/group/scribelib:client_event-scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseCTAClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseCTAClientEvent.scala new file mode 100644 index 000000000..d1a47db26 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseCTAClientEvent.scala @@ -0,0 +1,46 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.logbase.thriftscala.LogBase +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} + +abstract class 
BaseCTAClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + + override def toUnifiedUserAction(logEvent: LogEvent): Seq[UnifiedUserAction] = { + val logBase: Option[LogBase] = logEvent.logBase + val userIdentifier: UserIdentifier = UserIdentifier( + userId = logBase.flatMap(_.userId), + guestIdMarketing = logBase.flatMap(_.guestIdMarketing)) + val uuaItem: Item = Item.CtaInfo(CTAInfo()) + val eventTimestamp = logBase.flatMap(getSourceTimestamp).getOrElse(0L) + val ceItem = LogEventItem.unsafeEmpty + + val productSurface: Option[ProductSurface] = ProductSurfaceUtils + .getProductSurface(logEvent.eventNamespace) + + val eventMetaData: EventMetadata = ClientEventCommonUtils + .getEventMetadata( + eventTimestamp = eventTimestamp, + logEvent = logEvent, + ceItem = ceItem, + productSurface = productSurface + ) + + Seq( + UnifiedUserAction( + userIdentifier = userIdentifier, + item = uuaItem, + actionType = actionType, + eventMetadata = eventMetaData, + productSurface = productSurface, + productSurfaceInfo = + ProductSurfaceUtils.getProductSurfaceInfo(productSurface, ceItem, logEvent) + )) + } + +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseCardClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseCardClientEvent.scala new file mode 100644 index 000000000..63235304e --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseCardClientEvent.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.CardInfo +import com.twitter.unified_user_actions.thriftscala.Item + +abstract class BaseCardClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + + override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean = + ItemTypeFilterPredicates.ignoreItemType(itemTypeOpt) + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = Some( + Item.CardInfo( + CardInfo( + id = ceItem.id, + itemType = ceItem.itemType, + actionTweetAuthorInfo = ClientEventCommonUtils.getAuthorInfo(ceItem), + )) + ) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseClientEvent.scala new file mode 100644 index 000000000..a2df60aab --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseClientEvent.scala @@ -0,0 +1,68 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.logbase.thriftscala.ClientEventReceiver +import com.twitter.logbase.thriftscala.LogBase +import com.twitter.unified_user_actions.thriftscala._ + +abstract class BaseClientEvent(actionType: ActionType) { + def toUnifiedUserAction(logEvent: LogEvent): Seq[UnifiedUserAction] = { + val logBase: 
Option[LogBase] = logEvent.logBase + + for { + ed <- logEvent.eventDetails.toSeq + items <- ed.items.toSeq + ceItem <- items + eventTimestamp <- logBase.flatMap(getSourceTimestamp) + uuaItem <- getUuaItem(ceItem, logEvent) + if isItemTypeValid(ceItem.itemType) + } yield { + val userIdentifier: UserIdentifier = UserIdentifier( + userId = logBase.flatMap(_.userId), + guestIdMarketing = logBase.flatMap(_.guestIdMarketing)) + + val productSurface: Option[ProductSurface] = ProductSurfaceUtils + .getProductSurface(logEvent.eventNamespace) + + val eventMetaData: EventMetadata = ClientEventCommonUtils + .getEventMetadata( + eventTimestamp = eventTimestamp, + logEvent = logEvent, + ceItem = ceItem, + productSurface = productSurface + ) + + UnifiedUserAction( + userIdentifier = userIdentifier, + item = uuaItem, + actionType = actionType, + eventMetadata = eventMetaData, + productSurface = productSurface, + productSurfaceInfo = + ProductSurfaceUtils.getProductSurfaceInfo(productSurface, ceItem, logEvent) + ) + } + } + + def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = for (actionTweetId <- ceItem.id) + yield Item.TweetInfo( + ClientEventCommonUtils + .getBasicTweetInfo(actionTweetId, ceItem, logEvent.eventNamespace)) + + // default implementation filters items of type tweet + // override in the subclass implementation to filter items of other types + def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean = + ItemTypeFilterPredicates.isItemTypeTweet(itemTypeOpt) + + def getSourceTimestamp(logBase: LogBase): Option[Long] = + logBase.clientEventReceiver match { + case Some(ClientEventReceiver.CesHttp) | Some(ClientEventReceiver.CesThrift) => + logBase.driftAdjustedEventCreatedAtMs + case _ => Some(logBase.driftAdjustedEventCreatedAtMs.getOrElse(logBase.timestamp)) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseFeedbackSubmitClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseFeedbackSubmitClientEvent.scala new file mode 100644 index 000000000..83388bd0d --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseFeedbackSubmitClientEvent.scala @@ -0,0 +1,46 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.unified_user_actions.thriftscala._ + +abstract class BaseFeedbackSubmitClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = { + logEvent.eventNamespace.flatMap(_.page).flatMap { + case "search" => + val searchInfoUtil = new SearchInfoUtils(ceItem) + searchInfoUtil.getQueryOptFromItem(logEvent).flatMap { query => + val isRelevant: Boolean = logEvent.eventNamespace + .flatMap(_.element) + .contains("is_relevant") + logEvent.eventNamespace.flatMap(_.component).flatMap { + case "relevance_prompt_module" => + for (actionTweetId <- ceItem.id) + yield Item.FeedbackPromptInfo( + FeedbackPromptInfo( + feedbackPromptActionInfo = FeedbackPromptActionInfo.TweetRelevantToSearch( + TweetRelevantToSearch( + searchQuery = query, + tweetId = actionTweetId, + isRelevant = Some(isRelevant))))) + case "did_you_find_it_module" => + Some( + 
Item.FeedbackPromptInfo(FeedbackPromptInfo(feedbackPromptActionInfo = + FeedbackPromptActionInfo.DidYouFindItSearch( + DidYouFindItSearch(searchQuery = query, isRelevant = Some(isRelevant)))))) + } + } + case _ => None + } + + } + + override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean = + ItemTypeFilterPredicates.isItemTypeForSearchResultsPageFeedbackSubmit(itemTypeOpt) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseNotificationTabClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseNotificationTabClientEvent.scala new file mode 100644 index 000000000..37737f017 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseNotificationTabClientEvent.scala @@ -0,0 +1,48 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.unified_user_actions.thriftscala._ + +abstract class BaseNotificationTabClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + + // itemType is `None` for Notification Tab events + override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean = + ItemTypeFilterPredicates.ignoreItemType(itemTypeOpt) + + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = for { + notificationTabDetails <- ceItem.notificationTabDetails + clientEventMetadata <- notificationTabDetails.clientEventMetadata + notificationId <- NotificationClientEventUtils.getNotificationIdForNotificationTab(ceItem) + } yield { + clientEventMetadata.tweetIds match { + // if `tweetIds` contain more than one Tweet id, create `MultiTweetNotification` + case Some(tweetIds) if tweetIds.size > 1 => + Item.NotificationInfo( + NotificationInfo( + actionNotificationId = notificationId, + content = NotificationContent.MultiTweetNotification( + MultiTweetNotification(tweetIds = tweetIds)) + )) + // if `tweetIds` contain exactly one Tweet id, create `TweetNotification` + case Some(tweetIds) if tweetIds.size == 1 => + Item.NotificationInfo( + NotificationInfo( + actionNotificationId = notificationId, + content = + NotificationContent.TweetNotification(TweetNotification(tweetId = tweetIds.head)))) + // if `tweetIds` are missing, create `UnknownNotification` + case _ => + Item.NotificationInfo( + NotificationInfo( + actionNotificationId = notificationId, + content = NotificationContent.UnknownNotification(UnknownNotification()) + )) + } + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseProfileClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseProfileClientEvent.scala new file mode 100644 index 000000000..35e122dcd --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseProfileClientEvent.scala @@ -0,0 +1,25 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import 
com.twitter.unified_user_actions.adapter.client_event.ClientEventCommonUtils.getProfileIdFromUserItem +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.ProfileInfo + +abstract class BaseProfileClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean = + ItemTypeFilterPredicates.isItemTypeProfile(itemTypeOpt) + + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = + getProfileIdFromUserItem(ceItem).map { id => + Item.ProfileInfo( + ProfileInfo(actionProfileId = id) + ) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BasePushNotificationClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BasePushNotificationClientEvent.scala new file mode 100644 index 000000000..be3af9dde --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BasePushNotificationClientEvent.scala @@ -0,0 +1,22 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.unified_user_actions.thriftscala._ + +abstract class BasePushNotificationClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = for { + itemId <- ceItem.id + notificationId <- NotificationClientEventUtils.getNotificationIdForPushNotification(logEvent) + } yield { + Item.NotificationInfo( + NotificationInfo( + actionNotificationId = notificationId, + content = NotificationContent.TweetNotification(TweetNotification(tweetId = itemId)))) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseSearchTypeaheadEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseSearchTypeaheadEvent.scala new file mode 100644 index 000000000..b00745d7f --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseSearchTypeaheadEvent.scala @@ -0,0 +1,87 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.logbase.thriftscala.LogBase +import com.twitter.unified_user_actions.adapter.client_event.ClientEventCommonUtils.getProfileIdFromUserItem +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.ProductSurface +import com.twitter.unified_user_actions.thriftscala.TopicQueryResult +import com.twitter.unified_user_actions.thriftscala.TypeaheadActionInfo +import com.twitter.unified_user_actions.thriftscala.TypeaheadInfo +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala.UserIdentifier +import 
com.twitter.unified_user_actions.thriftscala.UserResult + +abstract class BaseSearchTypeaheadEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + + override def toUnifiedUserAction(logEvent: LogEvent): Seq[UnifiedUserAction] = { + val logBase: Option[LogBase] = logEvent.logBase + + for { + ed <- logEvent.eventDetails.toSeq + targets <- ed.targets.toSeq + ceTarget <- targets + eventTimestamp <- logBase.flatMap(getSourceTimestamp) + uuaItem <- getUuaItem(ceTarget, logEvent) + if isItemTypeValid(ceTarget.itemType) + } yield { + val userIdentifier: UserIdentifier = UserIdentifier( + userId = logBase.flatMap(_.userId), + guestIdMarketing = logBase.flatMap(_.guestIdMarketing)) + + val productSurface: Option[ProductSurface] = ProductSurfaceUtils + .getProductSurface(logEvent.eventNamespace) + + val eventMetaData: EventMetadata = ClientEventCommonUtils + .getEventMetadata( + eventTimestamp = eventTimestamp, + logEvent = logEvent, + ceItem = ceTarget, + productSurface = productSurface + ) + + UnifiedUserAction( + userIdentifier = userIdentifier, + item = uuaItem, + actionType = actionType, + eventMetadata = eventMetaData, + productSurface = productSurface, + productSurfaceInfo = + ProductSurfaceUtils.getProductSurfaceInfo(productSurface, ceTarget, logEvent) + ) + } + } + override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean = + ItemTypeFilterPredicates.isItemTypeTypeaheadResult(itemTypeOpt) + + override def getUuaItem( + ceTarget: LogEventItem, + logEvent: LogEvent + ): Option[Item] = + logEvent.searchDetails.flatMap(_.query).flatMap { query => + ceTarget.itemType match { + case Some(ItemType.User) => + getProfileIdFromUserItem(ceTarget).map { profileId => + Item.TypeaheadInfo( + TypeaheadInfo( + actionQuery = query, + typeaheadActionInfo = + TypeaheadActionInfo.UserResult(UserResult(profileId = profileId)))) + } + case Some(ItemType.Search) => + ceTarget.name.map { name => + Item.TypeaheadInfo( + TypeaheadInfo( + actionQuery = query, + typeaheadActionInfo = TypeaheadActionInfo.TopicQueryResult( + TopicQueryResult(suggestedTopicQuery = name)))) + } + case _ => None + } + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseTopicClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseTopicClientEvent.scala new file mode 100644 index 000000000..b74a56ace --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseTopicClientEvent.scala @@ -0,0 +1,23 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.TopicInfo + +abstract class BaseTopicClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean = + ItemTypeFilterPredicates.isItemTypeTopic(itemTypeOpt) + + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = + for (actionTopicId <- ClientEventCommonUtils.getTopicId( + ceItem = ceItem, + ceNamespaceOpt = logEvent.eventNamespace)) + yield Item.TopicInfo(TopicInfo(actionTopicId = 
actionTopicId)) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseUASClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseUASClientEvent.scala new file mode 100644 index 000000000..de16de786 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseUASClientEvent.scala @@ -0,0 +1,62 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.logbase.thriftscala.LogBase +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala._ + +abstract class BaseUASClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + + override def toUnifiedUserAction(logEvent: LogEvent): Seq[UnifiedUserAction] = { + val logBase: Option[LogBase] = logEvent.logBase + val ceItem = LogEventItem.unsafeEmpty + + val uuaOpt: Option[UnifiedUserAction] = for { + eventTimestamp <- logBase.flatMap(getSourceTimestamp) + uuaItem <- getUuaItem(ceItem, logEvent) + } yield { + val userIdentifier: UserIdentifier = UserIdentifier( + userId = logBase.flatMap(_.userId), + guestIdMarketing = logBase.flatMap(_.guestIdMarketing)) + + val productSurface: Option[ProductSurface] = ProductSurfaceUtils + .getProductSurface(logEvent.eventNamespace) + + val eventMetaData: EventMetadata = ClientEventCommonUtils + .getEventMetadata( + eventTimestamp = eventTimestamp, + logEvent = logEvent, + ceItem = ceItem, + productSurface = productSurface + ) + + UnifiedUserAction( + userIdentifier = userIdentifier, + item = uuaItem, + actionType = actionType, + eventMetadata = eventMetaData, + productSurface = productSurface, + productSurfaceInfo = + ProductSurfaceUtils.getProductSurfaceInfo(productSurface, ceItem, logEvent) + ) + } + + uuaOpt match { + case Some(uua) => Seq(uua) + case _ => Nil + } + } + + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = for { + performanceDetails <- logEvent.performanceDetails + duration <- performanceDetails.durationMs + } yield { + Item.UasInfo(UASInfo(timeSpentMs = duration)) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseVideoClientEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseVideoClientEvent.scala new file mode 100644 index 000000000..7d6cdbb2e --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/BaseVideoClientEvent.scala @@ -0,0 +1,34 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.unified_user_actions.thriftscala._ + +abstract class BaseVideoClientEvent(actionType: ActionType) + extends BaseClientEvent(actionType = actionType) { + + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = for { + actionTweetId <- ceItem.id + clientMediaEvent <- ceItem.clientMediaEvent + sessionState <- clientMediaEvent.sessionState + mediaIdentifier <- sessionState.contentVideoIdentifier + mediaId <- 
VideoClientEventUtils.videoIdFromMediaIdentifier(mediaIdentifier) + mediaDetails <- ceItem.mediaDetailsV2 + mediaItems <- mediaDetails.mediaItems + videoMetadata <- VideoClientEventUtils.getVideoMetadata( + mediaId, + mediaItems, + ceItem.cardDetails.flatMap(_.amplifyDetails)) + } yield { + Item.TweetInfo( + ClientEventCommonUtils + .getBasicTweetInfo( + actionTweetId = actionTweetId, + ceItem = ceItem, + ceNamespaceOpt = logEvent.eventNamespace) + .copy(tweetActionInfo = Some(videoMetadata))) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventAdapter.scala new file mode 100644 index 000000000..3bfde0c36 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventAdapter.scala @@ -0,0 +1,272 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.clientapp.thriftscala.EventNamespace +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.adapter.client_event.ClientEventImpression._ +import com.twitter.unified_user_actions.adapter.client_event.ClientEventEngagement._ +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import scala.util.matching.Regex + +class ClientEventAdapter extends AbstractAdapter[LogEvent, UnKeyed, UnifiedUserAction] { + import ClientEventAdapter._ + + override def adaptOneToKeyedMany( + input: LogEvent, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(UnKeyed, UnifiedUserAction)] = + adaptEvent(input).map { e => (UnKeyed, e) } +} + +object ClientEventAdapter { + // Refer to go/cme-scribing and go/interaction-event-spec for details + def isVideoEvent(element: String): Boolean = Seq[String]( + "gif_player", + "periscope_player", + "platform_amplify_card", + "video_player", + "vine_player").contains(element) + + /** + * Tweet clicks on the Notification Tab on iOS are a special case because the `element` is different + * from Tweet clicks everywhere else on the platform. + * + * For Notification Tab on iOS, `element` could be one of `user_mentioned_you`, + * `user_mentioned_you_in_a_quote_tweet`, `user_replied_to_your_tweet`, or `user_quoted_your_tweet`. + * + * In other places, `element` = `tweet`. 
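+   *
+   * For example (a quick sketch of how this predicate behaves, using elements named above):
+   * {{{
+   *   isTweetClickEvent("tweet")              // true: Tweet clicks everywhere else
+   *   isTweetClickEvent("user_mentioned_you") // true: Notification Tab on iOS
+   *   isTweetClickEvent("video_player")       // false: a video element, not a Tweet click
+   * }}}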
+ */ + def isTweetClickEvent(element: String): Boolean = + Seq[String]( + "tweet", + "user_mentioned_you", + "user_mentioned_you_in_a_quote_tweet", + "user_replied_to_your_tweet", + "user_quoted_your_tweet" + ).contains(element) + + final val validUASIosClientIds = Seq[Long]( + 129032L, // Twitter for iPhone + 191841L // Twitter for iPad + ) + // Twitter for Android + final val validUASAndroidClientIds = Seq[Long](258901L) + + def adaptEvent(inputLogEvent: LogEvent): Seq[UnifiedUserAction] = + Option(inputLogEvent).toSeq + .filterNot { logEvent: LogEvent => + shouldIgnoreClientEvent(logEvent.eventNamespace) + } + .flatMap { logEvent: LogEvent => + val actionTypesPerEvent: Seq[BaseClientEvent] = logEvent.eventNamespace.toSeq.flatMap { + name => + (name.page, name.section, name.component, name.element, name.action) match { + case (_, _, _, _, Some("favorite")) => Seq(TweetFav) + case (_, _, _, _, Some("unfavorite")) => Seq(TweetUnfav) + case (_, _, Some("stream"), Some("linger"), Some("results")) => + Seq(TweetLingerImpression) + case (_, _, Some("stream"), None, Some("results")) => + Seq(TweetRenderImpression) + case (_, _, _, _, Some("send_reply")) => Seq(TweetReply) + // Different clients may have different actions of the same "send quote" + // but it turns out that both send_quote and retweet_with_comment should correspond to + // "send quote" + case (_, _, _, _, Some("send_quote_tweet")) | + (_, _, _, _, Some("retweet_with_comment")) => + Seq(TweetQuote) + case (_, _, _, _, Some("retweet")) => Seq(TweetRetweet) + case (_, _, _, _, Some("unretweet")) => Seq(TweetUnretweet) + case (_, _, _, _, Some("reply")) => Seq(TweetClickReply) + case (_, _, _, _, Some("quote")) => Seq(TweetClickQuote) + case (_, _, _, Some(element), Some("playback_start")) if isVideoEvent(element) => + Seq(TweetVideoPlaybackStart) + case (_, _, _, Some(element), Some("playback_complete")) if isVideoEvent(element) => + Seq(TweetVideoPlaybackComplete) + case (_, _, _, Some(element), Some("playback_25")) if isVideoEvent(element) => + Seq(TweetVideoPlayback25) + case (_, _, _, Some(element), Some("playback_50")) if isVideoEvent(element) => + Seq(TweetVideoPlayback50) + case (_, _, _, Some(element), Some("playback_75")) if isVideoEvent(element) => + Seq(TweetVideoPlayback75) + case (_, _, _, Some(element), Some("playback_95")) if isVideoEvent(element) => + Seq(TweetVideoPlayback95) + case (_, _, _, Some(element), Some("play_from_tap")) if isVideoEvent(element) => + Seq(TweetVideoPlayFromTap) + case (_, _, _, Some(element), Some("video_quality_view")) if isVideoEvent(element) => + Seq(TweetVideoQualityView) + case (_, _, _, Some(element), Some("video_view")) if isVideoEvent(element) => + Seq(TweetVideoView) + case (_, _, _, Some(element), Some("video_mrc_view")) if isVideoEvent(element) => + Seq(TweetVideoMrcView) + case (_, _, _, Some(element), Some("view_threshold")) if isVideoEvent(element) => + Seq(TweetVideoViewThreshold) + case (_, _, _, Some(element), Some("cta_url_click")) if isVideoEvent(element) => + Seq(TweetVideoCtaUrlClick) + case (_, _, _, Some(element), Some("cta_watch_click")) if isVideoEvent(element) => + Seq(TweetVideoCtaWatchClick) + case (_, _, _, Some("platform_photo_card"), Some("click")) => Seq(TweetPhotoExpand) + case (_, _, _, Some("platform_card"), Some("click")) => Seq(CardClick) + case (_, _, _, _, Some("open_app")) => Seq(CardOpenApp) + case (_, _, _, _, Some("install_app")) => Seq(CardAppInstallAttempt) + case (_, _, _, Some("platform_card"), Some("vote")) | + (_, _, _, 
Some("platform_forward_card"), Some("vote")) => + Seq(PollCardVote) + case (_, _, _, Some("mention"), Some("click")) | + (_, _, _, _, Some("mention_click")) => + Seq(TweetClickMentionScreenName) + case (_, _, _, Some(element), Some("click")) if isTweetClickEvent(element) => + Seq(TweetClick) + case // Follow from the Topic page (or so-called landing page) + (_, _, _, Some("topic"), Some("follow")) | + // Actually not sure how this is generated ... but saw quite some events in BQ + (_, _, _, Some("social_proof"), Some("follow")) | + // Click on Tweet's caret menu of "Follow (the topic)", it needs to be: + // 1) user follows the Topic already, 2) and clicked on the "Unfollow Topic" first. + (_, _, _, Some("feedback_follow_topic"), Some("click")) => + Seq(TopicFollow) + case (_, _, _, Some("topic"), Some("unfollow")) | + (_, _, _, Some("social_proof"), Some("unfollow")) | + (_, _, _, Some("feedback_unfollow_topic"), Some("click")) => + Seq(TopicUnfollow) + case (_, _, _, Some("topic"), Some("not_interested")) | + (_, _, _, Some("feedback_not_interested_in_topic"), Some("click")) => + Seq(TopicNotInterestedIn) + case (_, _, _, Some("topic"), Some("un_not_interested")) | + (_, _, _, Some("feedback_not_interested_in_topic"), Some("undo")) => + Seq(TopicUndoNotInterestedIn) + case (_, _, _, Some("feedback_givefeedback"), Some("click")) => + Seq(TweetNotHelpful) + case (_, _, _, Some("feedback_givefeedback"), Some("undo")) => + Seq(TweetUndoNotHelpful) + case (_, _, _, Some("report_tweet"), Some("click")) | + (_, _, _, Some("report_tweet"), Some("done")) => + Seq(TweetReport) + case (_, _, _, Some("feedback_dontlike"), Some("click")) => + Seq(TweetNotInterestedIn) + case (_, _, _, Some("feedback_dontlike"), Some("undo")) => + Seq(TweetUndoNotInterestedIn) + case (_, _, _, Some("feedback_notabouttopic"), Some("click")) => + Seq(TweetNotAboutTopic) + case (_, _, _, Some("feedback_notabouttopic"), Some("undo")) => + Seq(TweetUndoNotAboutTopic) + case (_, _, _, Some("feedback_notrecent"), Some("click")) => + Seq(TweetNotRecent) + case (_, _, _, Some("feedback_notrecent"), Some("undo")) => + Seq(TweetUndoNotRecent) + case (_, _, _, Some("feedback_seefewer"), Some("click")) => + Seq(TweetSeeFewer) + case (_, _, _, Some("feedback_seefewer"), Some("undo")) => + Seq(TweetUndoSeeFewer) + // Only when action = "submit" we get all fields in ReportDetails, such as reportType + // See https://confluence.twitter.biz/pages/viewpage.action?spaceKey=HEALTH&title=Understanding+ReportDetails + case (Some(page), _, _, Some("ticket"), Some("submit")) + if page.startsWith("report_") => + Seq(TweetReportServer) + case (Some("profile"), _, _, _, Some("block")) => + Seq(ProfileBlock) + case (Some("profile"), _, _, _, Some("unblock")) => + Seq(ProfileUnblock) + case (Some("profile"), _, _, _, Some("mute_user")) => + Seq(ProfileMute) + case (Some("profile"), _, _, _, Some("report")) => + Seq(ProfileReport) + case (Some("profile"), _, _, _, Some("show")) => + Seq(ProfileShow) + case (_, _, _, Some("follow"), Some("click")) => Seq(TweetFollowAuthor) + case (_, _, _, _, Some("follow")) => Seq(TweetFollowAuthor, ProfileFollow) + case (_, _, _, Some("unfollow"), Some("click")) => Seq(TweetUnfollowAuthor) + case (_, _, _, _, Some("unfollow")) => Seq(TweetUnfollowAuthor) + case (_, _, _, Some("block"), Some("click")) => Seq(TweetBlockAuthor) + case (_, _, _, Some("unblock"), Some("click")) => Seq(TweetUnblockAuthor) + case (_, _, _, Some("mute"), Some("click")) => Seq(TweetMuteAuthor) + case (_, _, _, Some(element), Some("click")) 
if isTweetClickEvent(element) =>
+              Seq(TweetClick)
+            case (_, _, _, _, Some("profile_click")) => Seq(TweetClickProfile, ProfileClick)
+            case (_, _, _, _, Some("share_menu_click")) => Seq(TweetClickShare)
+            case (_, _, _, _, Some("copy_link")) => Seq(TweetShareViaCopyLink)
+            case (_, _, _, _, Some("share_via_dm")) => Seq(TweetClickSendViaDirectMessage)
+            case (_, _, _, _, Some("bookmark")) => Seq(TweetShareViaBookmark, TweetBookmark)
+            case (_, _, _, _, Some("unbookmark")) => Seq(TweetUnbookmark)
+            case (_, _, _, _, Some("hashtag_click")) |
+                // This scribe is triggered on mobile platforms (android/iphone) when a user clicks on a hashtag in a tweet.
+                (_, _, _, Some("hashtag"), Some("search")) =>
+              Seq(TweetClickHashtag)
+            case (_, _, _, _, Some("open_link")) => Seq(TweetOpenLink)
+            case (_, _, _, _, Some("take_screenshot")) => Seq(TweetTakeScreenshot)
+            case (_, _, _, Some("feedback_notrelevant"), Some("click")) =>
+              Seq(TweetNotRelevant)
+            case (_, _, _, Some("feedback_notrelevant"), Some("undo")) =>
+              Seq(TweetUndoNotRelevant)
+            case (_, _, _, _, Some("follow_attempt")) => Seq(ProfileFollowAttempt)
+            case (_, _, _, _, Some("favorite_attempt")) => Seq(TweetFavoriteAttempt)
+            case (_, _, _, _, Some("retweet_attempt")) => Seq(TweetRetweetAttempt)
+            case (_, _, _, _, Some("reply_attempt")) => Seq(TweetReplyAttempt)
+            case (_, _, _, _, Some("login")) => Seq(CTALoginClick)
+            case (Some("login"), _, _, _, Some("show")) => Seq(CTALoginStart)
+            case (Some("login"), _, _, _, Some("success")) => Seq(CTALoginSuccess)
+            case (_, _, _, _, Some("signup")) => Seq(CTASignupClick)
+            case (Some("signup"), _, _, _, Some("success")) => Seq(CTASignupSuccess)
+            case // Android app running in the background
+                (Some("notification"), Some("status_bar"), None, _, Some("background_open")) |
+                // Android app running in the foreground
+                (Some("notification"), Some("status_bar"), None, _, Some("open")) |
+                // iOS app running in the background
+                (Some("notification"), Some("notification_center"), None, _, Some("open")) |
+                // iOS app running in the foreground
+                (None, Some("toasts"), Some("social"), Some("favorite"), Some("open")) |
+                // m5
+                (Some("app"), Some("push"), _, _, Some("open")) =>
+              Seq(NotificationOpen)
+            case (Some("ntab"), Some("all"), Some("urt"), _, Some("navigate")) =>
+              Seq(NotificationClick)
+            case (Some("ntab"), Some("all"), Some("urt"), _, Some("see_less_often")) =>
+              Seq(NotificationSeeLessOften)
+            case (Some("notification"), Some("status_bar"), None, _, Some("background_dismiss")) |
+                (Some("notification"), Some("status_bar"), None, _, Some("dismiss")) | (
+                  Some("notification"),
+                  Some("notification_center"),
+                  None,
+                  _,
+                  Some("dismiss")
+                ) =>
+              Seq(NotificationDismiss)
+            case (_, _, _, Some("typeahead"), Some("click")) => Seq(TypeaheadClick)
+            case (Some("search"), _, Some(component), _, Some("click"))
+                if component == "relevance_prompt_module" || component == "did_you_find_it_module" =>
+              Seq(FeedbackPromptSubmit)
+            case (Some("app"), Some("enter_background"), _, _, Some("become_inactive"))
+                if logEvent.logBase
+                  .flatMap(_.clientAppId)
+                  .exists(validUASIosClientIds.contains(_)) =>
+              Seq(AppExit)
+            case (Some("app"), _, _, _, Some("become_inactive"))
+                if logEvent.logBase
+                  .flatMap(_.clientAppId)
+                  .exists(validUASAndroidClientIds.contains(_)) =>
+              Seq(AppExit)
+            case (_, _, Some("gallery"), Some("photo"), Some("impression")) =>
+              Seq(TweetGalleryImpression)
+            case (_, _, _, _, _)
+                if TweetDetailsImpression.isTweetDetailsImpression(logEvent.eventNamespace) =>
+              Seq(TweetDetailsImpression)
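+            // For example, a namespace of (page = Some("profile"), action = Some("block"))
+            // is matched to Seq(ProfileBlock) above, while a namespace that no case
+            // recognizes falls through to Nil below and produces no UnifiedUserAction.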
+ case _ => Nil + } + } + actionTypesPerEvent.map(_.toUnifiedUserAction(logEvent)) + }.flatten + + def shouldIgnoreClientEvent(eventNamespace: Option[EventNamespace]): Boolean = + eventNamespace.exists { name => + (name.page, name.section, name.component, name.element, name.action) match { + case (Some("ddg"), _, _, _, Some("experiment")) => true + case (Some("qig_ranker"), _, _, _, _) => true + case (Some("timelinemixer"), _, _, _, _) => true + case (Some("timelineservice"), _, _, _, _) => true + case (Some("tweetconvosvc"), _, _, _, _) => true + case _ => false + } + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventCommonUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventCommonUtils.scala new file mode 100644 index 000000000..f81060ad9 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventCommonUtils.scala @@ -0,0 +1,169 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.EventNamespace +import com.twitter.clientapp.thriftscala.Item +import com.twitter.clientapp.thriftscala.ItemType.User +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala.AuthorInfo +import com.twitter.unified_user_actions.thriftscala.ClientEventNamespace +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.ProductSurface +import com.twitter.unified_user_actions.thriftscala.SourceLineage +import com.twitter.unified_user_actions.thriftscala.TweetAuthorFollowClickSource +import com.twitter.unified_user_actions.thriftscala.TweetAuthorUnfollowClickSource +import com.twitter.unified_user_actions.thriftscala.TweetInfo + +/** + * Comprises helper methods that: + * 1. need not be overridden by subclasses of `BaseClientEvent` + * 2. need not be invoked by instances of subclasses of `BaseClientEvent` + * 3. 
need to be accessible to subclasses of `BaseClientEvent` and other utils
+ */
+object ClientEventCommonUtils {
+
+  def getBasicTweetInfo(
+    actionTweetId: Long,
+    ceItem: LogEventItem,
+    ceNamespaceOpt: Option[EventNamespace]
+  ): TweetInfo = TweetInfo(
+    actionTweetId = actionTweetId,
+    actionTweetTopicSocialProofId = getTopicId(ceItem, ceNamespaceOpt),
+    retweetingTweetId = ceItem.tweetDetails.flatMap(_.retweetingTweetId),
+    quotedTweetId = ceItem.tweetDetails.flatMap(_.quotedTweetId),
+    inReplyToTweetId = ceItem.tweetDetails.flatMap(_.inReplyToTweetId),
+    quotingTweetId = ceItem.tweetDetails.flatMap(_.quotingTweetId),
+    // only set AuthorInfo when authorId is present
+    actionTweetAuthorInfo = getAuthorInfo(ceItem),
+    retweetingAuthorId = ceItem.tweetDetails.flatMap(_.retweetAuthorId),
+    quotedAuthorId = ceItem.tweetDetails.flatMap(_.quotedAuthorId),
+    inReplyToAuthorId = ceItem.tweetDetails.flatMap(_.inReplyToAuthorId),
+    tweetPosition = ceItem.position,
+    promotedId = ceItem.promotedId
+  )
+
+  def getTopicId(
+    ceItem: LogEventItem,
+    ceNamespaceOpt: Option[EventNamespace] = None,
+  ): Option[Long] =
+    ceNamespaceOpt.flatMap {
+      TopicIdUtils.getTopicId(item = ceItem, _)
+    }
+
+  def getAuthorInfo(
+    ceItem: LogEventItem,
+  ): Option[AuthorInfo] =
+    ceItem.tweetDetails.flatMap(_.authorId).map { authorId =>
+      AuthorInfo(
+        authorId = Some(authorId),
+        isFollowedByActingUser = ceItem.isViewerFollowsTweetAuthor,
+        isFollowingActingUser = ceItem.isTweetAuthorFollowsViewer,
+      )
+    }
+
+  def getEventMetadata(
+    eventTimestamp: Long,
+    logEvent: LogEvent,
+    ceItem: LogEventItem,
+    productSurface: Option[ProductSurface] = None
+  ): EventMetadata = EventMetadata(
+    sourceTimestampMs = eventTimestamp,
+    receivedTimestampMs = AdapterUtils.currentTimestampMs,
+    sourceLineage = SourceLineage.ClientEvents,
+    // Client UI language, or what the user set in the Twitter App (from Gizmoduck).
+    // Please see more at https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/finatra-internal/international/src/main/scala/com/twitter/finatra/international/LanguageIdentifier.scala
+    // The format should be ISO 639-1.
+    language = logEvent.logBase.flatMap(_.language).map(AdapterUtils.normalizeLanguageCode),
+    // The country code could come from the IP address (geoduck) or the user's registration
+    // country (gizmoduck); the former takes precedence.
+    // We don’t know exactly which one is applied, unfortunately,
+    // see https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/finatra-internal/international/src/main/scala/com/twitter/finatra/international/CountryIdentifier.scala
+    // The format should be ISO_3166-1_alpha-2.
+    countryCode = logEvent.logBase.flatMap(_.country).map(AdapterUtils.normalizeCountryCode),
+    clientAppId = logEvent.logBase.flatMap(_.clientAppId),
+    clientVersion = logEvent.clientVersion,
+    clientEventNamespace = logEvent.eventNamespace.map(en => toClientEventNamespace(en)),
+    traceId = getTraceId(productSurface, ceItem),
+    requestJoinId = getRequestJoinId(productSurface, ceItem),
+    clientEventTriggeredOn = logEvent.eventDetails.flatMap(_.triggeredOn)
+  )
+
+  def toClientEventNamespace(eventNamespace: EventNamespace): ClientEventNamespace =
+    ClientEventNamespace(
+      page = eventNamespace.page,
+      section = eventNamespace.section,
+      component = eventNamespace.component,
+      element = eventNamespace.element,
+      action = eventNamespace.action
+    )
+
+  /**
+   * Get the profileId from Item.id, whose itemType = 'USER'.
+   *
+   * The profileId can also be found in event_details.profile_id.
+   * However, the item.id is more reliable than event_details.profile_id:
+   * in particular, 45% of the client events with USER items have
+   * Null for event_details.profile_id, while only 0.13% have a Null item.id.
+   * As such, we only use item.id to populate the profile_id.
+   */
+  def getProfileIdFromUserItem(item: Item): Option[Long] =
+    if (item.itemType.contains(User))
+      item.id
+    else None
+
+  /**
+   * TraceId is going to be deprecated and replaced by requestJoinId.
+   *
+   * Get the traceId from LogEventItem based on productSurface.
+   *
+   * The traceId is hydrated in controller data from the backend. Different product surfaces
+   * populate different controller data. Thus, the product surface is checked first to decide
+   * which controller data should be read to get the traceId.
+   */
+  def getTraceId(productSurface: Option[ProductSurface], ceItem: LogEventItem): Option[Long] =
+    productSurface match {
+      case Some(ProductSurface.HomeTimeline) => HomeInfoUtils.getTraceId(ceItem)
+      case Some(ProductSurface.SearchResultsPage) => { new SearchInfoUtils(ceItem) }.getTraceId
+      case _ => None
+    }
+
+  /**
+   * Get the requestJoinId from LogEventItem based on productSurface.
+   *
+   * The requestJoinId is hydrated in controller data from the backend. Different product surfaces
+   * populate different controller data. Thus, the product surface is checked first to decide
+   * which controller data should be read to get the requestJoinId.
+   *
+   * Supports Home / Home_latest / SearchResults for now; other surfaces can be added as required.
+   */
+  def getRequestJoinId(productSurface: Option[ProductSurface], ceItem: LogEventItem): Option[Long] =
+    productSurface match {
+      case Some(ProductSurface.HomeTimeline) => HomeInfoUtils.getRequestJoinId(ceItem)
+      case Some(ProductSurface.SearchResultsPage) => {
+        new SearchInfoUtils(ceItem)
+      }.getRequestJoinId
+      case _ => None
+    }
+
+  def getTweetAuthorFollowSource(
+    eventNamespace: Option[EventNamespace]
+  ): TweetAuthorFollowClickSource = {
+    eventNamespace
+      .map(ns => (ns.element, ns.action)).map {
+        case (Some("follow"), Some("click")) => TweetAuthorFollowClickSource.CaretMenu
+        case (_, Some("follow")) => TweetAuthorFollowClickSource.ProfileImage
+        case _ => TweetAuthorFollowClickSource.Unknown
+      }.getOrElse(TweetAuthorFollowClickSource.Unknown)
+  }
+
+  def getTweetAuthorUnfollowSource(
+    eventNamespace: Option[EventNamespace]
+  ): TweetAuthorUnfollowClickSource = {
+    eventNamespace
+      .map(ns => (ns.element, ns.action)).map {
+        case (Some("unfollow"), Some("click")) => TweetAuthorUnfollowClickSource.CaretMenu
+        case (_, Some("unfollow")) => TweetAuthorUnfollowClickSource.ProfileImage
+        case _ => TweetAuthorUnfollowClickSource.Unknown
+      }.getOrElse(TweetAuthorUnfollowClickSource.Unknown)
+  }
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventEngagement.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventEngagement.scala
new file mode 100644
index 000000000..0a2e59e0e
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventEngagement.scala
@@ -0,0 +1,687 @@
+package com.twitter.unified_user_actions.adapter.client_event
+
+import com.twitter.clientapp.thriftscala.ItemType
+import com.twitter.clientapp.thriftscala.LogEvent
+import com.twitter.clientapp.thriftscala.{Item => LogEventItem}
+import com.twitter.unified_user_actions.thriftscala._
+
+object ClientEventEngagement {
+  object TweetFav extends BaseClientEvent(ActionType.ClientTweetFav)
+
+  /**
+   * This is fired when a user unlikes a liked (favorited) tweet
+   */
+  object TweetUnfav extends BaseClientEvent(ActionType.ClientTweetUnfav)
+
+  /**
+   * This is the "Send Reply" event to indicate publishing of a reply Tweet, as opposed to clicking
+   * on the reply button to initiate a reply Tweet (captured in ClientTweetClickReply).
+   * The differences between this and ServerTweetReply are:
+   * 1) ServerTweetReply already has the new Tweet Id, 2) a sent reply may be lost during transfer
+   * over the wire and thus may not end up with a follow-up ServerTweetReply.
+   */
+  object TweetReply extends BaseClientEvent(ActionType.ClientTweetReply)
+
+  /**
+   * This is the "send quote" event to indicate publishing of a quote tweet, as opposed to clicking
+   * on the quote button to initiate a quote tweet (captured in ClientTweetClickQuote).
+   * The differences between this and ServerTweetQuote are:
+   * 1) ServerTweetQuote already has the new Tweet Id, 2) a sent quote may be lost during transfer
+   * over the wire and thus may not end up with a follow-up ServerTweetQuote.
+   */
+  object TweetQuote extends BaseClientEvent(ActionType.ClientTweetQuote)
+
+  /**
+   * This is the "retweet" event to indicate publishing of a retweet.
+   */
+  object TweetRetweet extends BaseClientEvent(ActionType.ClientTweetRetweet)
+
+  /**
+   * "action = reply" indicates that a user expressed the intention to reply to a Tweet by clicking
+   * the reply button. No new tweet is created in this event.
+   */
+  object TweetClickReply extends BaseClientEvent(ActionType.ClientTweetClickReply)
+
+  /**
+   * Please note that "action == quote" is NOT the create-quote-Tweet event like what
+   * we can get from TweetyPie.
+   * It is just a click on "quote tweet" (after clicking on the retweet button there are 2 options,
+   * one is "retweet" and the other is "quote tweet").
+   *
+   * Also checked the CE (BQ Table): the `item.tweet_details.quoting_tweet_id` is always NULL, but
+   * `item.tweet_details.retweeting_tweet_id`, `item.tweet_details.in_reply_to_tweet_id`, `item.tweet_details.quoted_tweet_id`
+   * could be NON-NULL, and UUA would just include these NON-NULL fields as is. This is also checked in the unit test.
+   */
+  object TweetClickQuote extends BaseClientEvent(ActionType.ClientTweetClickQuote)
+
+  /**
+   * Refer to go/cme-scribing and go/interaction-event-spec for details.
+   * Fired on the first tick of a track regardless of where in the video it is playing.
+   * For looping playback, this is only fired once and does not reset at loop boundaries.
+   */
+  object TweetVideoPlaybackStart
+    extends BaseVideoClientEvent(ActionType.ClientTweetVideoPlaybackStart)
+
+  /**
+   * Refer to go/cme-scribing and go/interaction-event-spec for details.
+   * Fired when playback reaches 100% of total track duration.
+   * Not valid for live videos.
+   * For looping playback, this is only fired once and does not reset at loop boundaries.
+   */
+  object TweetVideoPlaybackComplete
+    extends BaseVideoClientEvent(ActionType.ClientTweetVideoPlaybackComplete)
+
+  /**
+   * Refer to go/cme-scribing and go/interaction-event-spec for details.
+   * This is fired when playback reaches 25% of total track duration. Not valid for live videos.
+   * For looping playback, this is only fired once and does not reset at loop boundaries. 
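+   *
+   * For example, for a 60-second video this fires once playback passes the 15-second
+   * mark; the 50/75/95 variants below fire at the 30s, 45s, and 57s marks respectively.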
+   */
+  object TweetVideoPlayback25 extends BaseVideoClientEvent(ActionType.ClientTweetVideoPlayback25)
+  object TweetVideoPlayback50 extends BaseVideoClientEvent(ActionType.ClientTweetVideoPlayback50)
+  object TweetVideoPlayback75 extends BaseVideoClientEvent(ActionType.ClientTweetVideoPlayback75)
+  object TweetVideoPlayback95 extends BaseVideoClientEvent(ActionType.ClientTweetVideoPlayback95)
+
+  /**
+   * Refer to go/cme-scribing and go/interaction-event-spec for details.
+   * This is fired when the video has been played in non-preview
+   * (i.e. not autoplaying in the timeline) mode, and was not started via auto-advance.
+   * For looping playback, this is only fired once and does not reset at loop boundaries.
+   */
+  object TweetVideoPlayFromTap extends BaseVideoClientEvent(ActionType.ClientTweetVideoPlayFromTap)
+
+  /**
+   * Refer to go/cme-scribing and go/interaction-event-spec for details.
+   * This is fired when 50% of the video has been on-screen and playing for 10 consecutive seconds
+   * or 95% of the video duration, whichever comes first.
+   * For looping playback, this is only fired once and does not reset at loop boundaries.
+   */
+  object TweetVideoQualityView extends BaseVideoClientEvent(ActionType.ClientTweetVideoQualityView)
+
+  object TweetVideoView extends BaseVideoClientEvent(ActionType.ClientTweetVideoView)
+  object TweetVideoMrcView extends BaseVideoClientEvent(ActionType.ClientTweetVideoMrcView)
+  object TweetVideoViewThreshold
+    extends BaseVideoClientEvent(ActionType.ClientTweetVideoViewThreshold)
+  object TweetVideoCtaUrlClick extends BaseVideoClientEvent(ActionType.ClientTweetVideoCtaUrlClick)
+  object TweetVideoCtaWatchClick
+    extends BaseVideoClientEvent(ActionType.ClientTweetVideoCtaWatchClick)
+
+  /**
+   * This is fired when a user clicks on "Undo retweet" after retweeting a tweet
+   */
+  object TweetUnretweet extends BaseClientEvent(ActionType.ClientTweetUnretweet)
+
+  /**
+   * This is fired when a user clicks on a photo attached to a tweet and the photo expands to fit
+   * the screen.
+   */
+  object TweetPhotoExpand extends BaseClientEvent(ActionType.ClientTweetPhotoExpand)
+
+  /**
+   * This is fired when a user clicks on a card; a card could be, for example, a photo or a video
+   */
+  object CardClick extends BaseCardClientEvent(ActionType.ClientCardClick)
+  object CardOpenApp extends BaseCardClientEvent(ActionType.ClientCardOpenApp)
+  object CardAppInstallAttempt extends BaseCardClientEvent(ActionType.ClientCardAppInstallAttempt)
+  object PollCardVote extends BaseCardClientEvent(ActionType.ClientPollCardVote)
+
+  /**
+   * This is fired when a user clicks on a profile mention inside a tweet.
+   */
+  object TweetClickMentionScreenName
+    extends BaseClientEvent(ActionType.ClientTweetClickMentionScreenName) {
+    override def getUuaItem(
+      ceItem: LogEventItem,
+      logEvent: LogEvent
+    ): Option[Item] =
+      (
+        ceItem.id,
+        logEvent.eventDetails.flatMap(
+          _.targets.flatMap(_.find(_.itemType.contains(ItemType.User))))) match {
+        case (Some(tweetId), Some(target)) =>
+          (target.id, target.name) match {
+            case (Some(profileId), Some(profileHandle)) =>
+              Some(
+                Item.TweetInfo(
+                  ClientEventCommonUtils
+                    .getBasicTweetInfo(tweetId, ceItem, logEvent.eventNamespace)
+                    .copy(tweetActionInfo = Some(
+                      TweetActionInfo.ClientTweetClickMentionScreenName(
+                        ClientTweetClickMentionScreenName(
+                          actionProfileId = profileId,
+                          handle = profileHandle
+                        ))))))
+            case _ => None
+          }
+        case _ => None
+      }
+  }
+
+  /**
+   * These are fired when a user follows/unfollows a Topic. 
Please see the comment in the + * ClientEventAdapter namespace matching to see the subtle details. + */ + object TopicFollow extends BaseTopicClientEvent(ActionType.ClientTopicFollow) + object TopicUnfollow extends BaseTopicClientEvent(ActionType.ClientTopicUnfollow) + + /** + * This is fired when the user clicks the "x" icon next to the topic on their timeline, + * and clicks "Not interested in {TOPIC}" in the pop-up prompt + * Alternatively, they can also click "See more" button to visit the topic page, and click "Not interested" there. + */ + object TopicNotInterestedIn extends BaseTopicClientEvent(ActionType.ClientTopicNotInterestedIn) + + /** + * This is fired when the user clicks the "Undo" button after clicking "x" or "Not interested" on a Topic + * which is captured in ClientTopicNotInterestedIn + */ + object TopicUndoNotInterestedIn + extends BaseTopicClientEvent(ActionType.ClientTopicUndoNotInterestedIn) + + /** + * This is fired when a user clicks on "This Tweet's not helpful" flow in the caret menu + * of a Tweet result on the Search Results Page + */ + object TweetNotHelpful extends BaseClientEvent(ActionType.ClientTweetNotHelpful) + + /** + * This is fired when a user clicks Undo after clicking on + * "This Tweet's not helpful" flow in the caret menu of a Tweet result on the Search Results Page + */ + object TweetUndoNotHelpful extends BaseClientEvent(ActionType.ClientTweetUndoNotHelpful) + + object TweetReport extends BaseClientEvent(ActionType.ClientTweetReport) { + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = { + for { + actionTweetId <- ceItem.id + } yield { + Item.TweetInfo( + ClientEventCommonUtils + .getBasicTweetInfo( + actionTweetId = actionTweetId, + ceItem = ceItem, + ceNamespaceOpt = logEvent.eventNamespace) + .copy(tweetActionInfo = Some( + TweetActionInfo.ClientTweetReport( + ClientTweetReport( + isReportTweetDone = + logEvent.eventNamespace.flatMap(_.action).exists(_.contains("done")), + reportFlowId = logEvent.reportDetails.flatMap(_.reportFlowId) + ) + )))) + } + } + } + + /** + * Not Interested In (Do Not like) event + */ + object TweetNotInterestedIn extends BaseClientEvent(ActionType.ClientTweetNotInterestedIn) + object TweetUndoNotInterestedIn extends BaseClientEvent(ActionType.ClientTweetUndoNotInterestedIn) + + /** + * This is fired when a user FIRST clicks the "Not interested in this Tweet" button in the caret menu of a Tweet + * then clicks "This Tweet is not about {TOPIC}" in the subsequent prompt + * Note: this button is hidden unless a user clicks "Not interested in this Tweet" first. + */ + object TweetNotAboutTopic extends BaseClientEvent(ActionType.ClientTweetNotAboutTopic) + + /** + * This is fired when a user clicks "Undo" immediately after clicking "This Tweet is not about {TOPIC}", + * which is captured in TweetNotAboutTopic + */ + object TweetUndoNotAboutTopic extends BaseClientEvent(ActionType.ClientTweetUndoNotAboutTopic) + + /** + * This is fired when a user FIRST clicks the "Not interested in this Tweet" button in the caret menu of a Tweet + * then clicks "This Tweet isn't recent" in the subsequent prompt + * Note: this button is hidden unless a user clicks "Not interested in this Tweet" first. 
+   */
+  object TweetNotRecent extends BaseClientEvent(ActionType.ClientTweetNotRecent)
+
+  /**
+   * This is fired when a user clicks "Undo" immediately after clicking "This Tweet isn't recent",
+   * which is captured in TweetNotRecent
+   */
+  object TweetUndoNotRecent extends BaseClientEvent(ActionType.ClientTweetUndoNotRecent)
+
+  /**
+   * This is fired when a user clicks the "Not interested in this Tweet" button in the caret menu of a Tweet
+   * then clicks "Show fewer tweets from" in the subsequent prompt
+   * Note: this button is hidden unless a user clicks "Not interested in this Tweet" first.
+   */
+  object TweetSeeFewer extends BaseClientEvent(ActionType.ClientTweetSeeFewer)
+
+  /**
+   * This is fired when a user clicks "Undo" immediately after clicking "Show fewer tweets from",
+   * which is captured in TweetSeeFewer
+   */
+  object TweetUndoSeeFewer extends BaseClientEvent(ActionType.ClientTweetUndoSeeFewer)
+
+  /**
+   * This is fired when a user clicks "Submit" at the end of a "Report Tweet" flow.
+   * ClientTweetReport = 1041 is scribed by the HealthClient team on the client side;
+   * this one is scribed by spamacaw on the server side.
+   * They can be joined on reportFlowId.
+   * See https://confluence.twitter.biz/pages/viewpage.action?spaceKey=HEALTH&title=Understanding+ReportDetails
+   */
+  object TweetReportServer extends BaseClientEvent(ActionType.ServerTweetReport) {
+    override def getUuaItem(
+      ceItem: LogEventItem,
+      logEvent: LogEvent
+    ): Option[Item] =
+      for {
+        actionTweetId <- ceItem.id
+      } yield Item.TweetInfo(
+        ClientEventCommonUtils
+          .getBasicTweetInfo(
+            actionTweetId = actionTweetId,
+            ceItem = ceItem,
+            ceNamespaceOpt = logEvent.eventNamespace)
+          .copy(tweetActionInfo = Some(
+            TweetActionInfo.ServerTweetReport(
+              ServerTweetReport(
+                reportFlowId = logEvent.reportDetails.flatMap(_.reportFlowId),
+                reportType = logEvent.reportDetails.flatMap(_.reportType)
+              )
+            ))))
+  }
+
+  /**
+   * This is fired when a user clicks Block in a Profile page
+   * A Profile can also be blocked when a user clicks Block in the menu of a Tweet, which
+   * is captured in ClientTweetBlockAuthor
+   */
+  object ProfileBlock extends BaseProfileClientEvent(ActionType.ClientProfileBlock)
+
+  /**
+   * This is fired when a user clicks unblock in a pop-up prompt right after blocking a profile
+   * on the profile page, or clicks unblock in a drop-down menu on the profile page.
+   */
+  object ProfileUnblock extends BaseProfileClientEvent(ActionType.ClientProfileUnblock)
+
+  /**
+   * This is fired when a user clicks Mute in a Profile page
+   * A Profile can also be muted when a user clicks Mute in the menu of a Tweet, which
+   * is captured in ClientTweetMuteAuthor
+   */
+  object ProfileMute extends BaseProfileClientEvent(ActionType.ClientProfileMute)
+
+  /*
+   * This is fired when a user clicks the "Report User" action from a user profile page
+   * */
+  object ProfileReport extends BaseProfileClientEvent(ActionType.ClientProfileReport)
+
+  // This is fired when a user profile is opened in a Profile page
+  object ProfileShow extends BaseProfileClientEvent(ActionType.ClientProfileShow)
+
+  object ProfileClick extends BaseProfileClientEvent(ActionType.ClientProfileClick) {
+
+    /**
+     * ClientTweetClickProfile emits 2 events, one with item type Tweet and one with item type User.
+     * Both events are routed to both actions (the actual classes). For ClientTweetClickProfile,
+     * the Tweet item type filters out the event with item type User. But for ClientProfileClick,
+     * which needs to accept the User item type, the TweetClickProfile event would also be included
+     * if we did nothing here. This override ensures we don't include tweet author click events in ProfileClick
+     */
+    override def getUuaItem(
+      ceItem: LogEventItem,
+      logEvent: LogEvent
+    ): Option[Item] =
+      if (logEvent.eventDetails
+          .flatMap(_.items).exists(items => items.exists(_.itemType.contains(ItemType.Tweet)))) {
+        None
+      } else {
+        super.getUuaItem(ceItem, logEvent)
+      }
+  }
+
+  /**
+   * This is fired when a user follows a profile from the
+   * profile page / people module and people tab on the Search Results Page / sidebar on the Home page
+   * A Profile can also be followed when a user clicks follow in the
+   * caret menu of a Tweet / follow button on hovering on profile avatar,
+   * which is captured in ClientTweetFollowAuthor
+   */
+  object ProfileFollow extends BaseProfileClientEvent(ActionType.ClientProfileFollow) {
+
+    /**
+     * ClientTweetFollowAuthor emits 2 events, one with item type Tweet and one with item type User.
+     * Both events are routed to both actions (the actual classes). For ClientTweetFollowAuthor,
+     * the Tweet item type filters out the event with item type User. But for ClientProfileFollow,
+     * which needs to accept the User item type, the TweetFollowAuthor event would also be included
+     * if we did nothing here. This override ensures we don't include tweet author follow events in ProfileFollow
+     */
+    override def getUuaItem(
+      ceItem: LogEventItem,
+      logEvent: LogEvent
+    ): Option[Item] =
+      if (logEvent.eventDetails
+          .flatMap(_.items).exists(items => items.exists(_.itemType.contains(ItemType.Tweet)))) {
+        None
+      } else {
+        super.getUuaItem(ceItem, logEvent)
+      }
+  }
+
+  /**
+   * This is fired when a user clicks Follow in the caret menu of a Tweet or hovers on the avatar of the tweet author
+   * and clicks on the Follow button. A profile can also be followed by clicking the Follow button on the Profile
+   * page and confirming, which is captured in ClientProfileFollow.
+   * The event emits two items, one of user type and another of tweet type; since the default implementation of
+   * BaseClientEvent only looks for the Tweet type, the other item is dropped, which is the expected behaviour
+   */
+  object TweetFollowAuthor extends BaseClientEvent(ActionType.ClientTweetFollowAuthor) {
+    override def getUuaItem(
+      ceItem: LogEventItem,
+      logEvent: LogEvent
+    ): Option[Item] = {
+      for {
+        actionTweetId <- ceItem.id
+      } yield {
+        Item.TweetInfo(
+          ClientEventCommonUtils
+            .getBasicTweetInfo(
+              actionTweetId = actionTweetId,
+              ceItem = ceItem,
+              ceNamespaceOpt = logEvent.eventNamespace)
+            .copy(tweetActionInfo = Some(
+              TweetActionInfo.ClientTweetFollowAuthor(
+                ClientTweetFollowAuthor(
+                  ClientEventCommonUtils.getTweetAuthorFollowSource(logEvent.eventNamespace))
+              ))))
+      }
+    }
+  }
+
+  /**
+   * This is fired when a user clicks Unfollow in the caret menu of a Tweet or hovers on the avatar of the tweet author
+   * and clicks on the Unfollow button. A profile can also be unfollowed by clicking the Unfollow button on the Profile
+   * page and confirming, which will be captured in ClientProfileUnfollow. 
+ * The event emits two items, one of user type and another of tweet type, since the default implementation of + * BaseClientEvent only looks for Tweet type, the other item is dropped which is the expected behaviour + */ + object TweetUnfollowAuthor extends BaseClientEvent(ActionType.ClientTweetUnfollowAuthor) { + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = { + for { + actionTweetId <- ceItem.id + } yield { + Item.TweetInfo( + ClientEventCommonUtils + .getBasicTweetInfo( + actionTweetId = actionTweetId, + ceItem = ceItem, + ceNamespaceOpt = logEvent.eventNamespace) + .copy(tweetActionInfo = Some( + TweetActionInfo.ClientTweetUnfollowAuthor( + ClientTweetUnfollowAuthor( + ClientEventCommonUtils.getTweetAuthorUnfollowSource(logEvent.eventNamespace)) + )))) + } + } + } + + /** + * This is fired when a user clicks Block in the caret menu of a Tweet to block the profile + * that authors this Tweet. A profile can also be blocked in the Profile page, which is captured + * in ClientProfileBlock + */ + object TweetBlockAuthor extends BaseClientEvent(ActionType.ClientTweetBlockAuthor) + + /** + * This is fired when a user clicks unblock in a pop-up prompt right after blocking an author + * in the drop-down menu of a tweet + */ + object TweetUnblockAuthor extends BaseClientEvent(ActionType.ClientTweetUnblockAuthor) + + /** + * This is fired when a user clicks Mute in the caret menu of a Tweet to mute the profile + * that authors this Tweet. A profile can also be muted in the Profile page, which is captured + * in ClientProfileMute + */ + object TweetMuteAuthor extends BaseClientEvent(ActionType.ClientTweetMuteAuthor) + + /** + * This is fired when a user clicks on a Tweet to open the Tweet details page. Note that for + * Tweets in the Notification Tab product surface, a click can be registered differently + * depending on whether the Tweet is a rendered Tweet (a click results in ClientTweetClick) + * or a wrapper Notification (a click results in ClientNotificationClick). + */ + object TweetClick extends BaseClientEvent(ActionType.ClientTweetClick) + + /** + * This is fired when a user clicks to view the profile page of another user from a Tweet + */ + object TweetClickProfile extends BaseClientEvent(ActionType.ClientTweetClickProfile) + + /** + * This is fired when a user clicks on the "share" icon on a Tweet to open the share menu. + * The user may or may not proceed and finish sharing the Tweet. + */ + object TweetClickShare extends BaseClientEvent(ActionType.ClientTweetClickShare) + + /** + * This is fired when a user clicks "Copy link to Tweet" in a menu appeared after hitting + * the "share" icon on a Tweet OR when a user selects share_via -> copy_link after long-click + * a link inside a tweet on a mobile device + */ + object TweetShareViaCopyLink extends BaseClientEvent(ActionType.ClientTweetShareViaCopyLink) + + /** + * This is fired when a user clicks "Send via Direct Message" after + * clicking on the "share" icon on a Tweet to open the share menu. + * The user may or may not proceed and finish Sending the DM. + */ + object TweetClickSendViaDirectMessage + extends BaseClientEvent(ActionType.ClientTweetClickSendViaDirectMessage) + + /** + * This is fired when a user clicks "Bookmark" after + * clicking on the "share" icon on a Tweet to open the share menu. 
+ */ + object TweetShareViaBookmark extends BaseClientEvent(ActionType.ClientTweetShareViaBookmark) + + /** + * This is fired when a user clicks "Remove Tweet from Bookmarks" after + * clicking on the "share" icon on a Tweet to open the share menu. + */ + object TweetUnbookmark extends BaseClientEvent(ActionType.ClientTweetUnbookmark) + + /** + * This event is fired when the user clicks on a hashtag in a Tweet. + */ + object TweetClickHashtag extends BaseClientEvent(ActionType.ClientTweetClickHashtag) { + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = for { + actionTweetId <- ceItem.id + } yield Item.TweetInfo( + ClientEventCommonUtils + .getBasicTweetInfo( + actionTweetId = actionTweetId, + ceItem = ceItem, + ceNamespaceOpt = logEvent.eventNamespace) + .copy(tweetActionInfo = logEvent.eventDetails + .map( + _.targets.flatMap(_.headOption.flatMap(_.name)) + ) // fetch the first item in the details and then the name will have the hashtag value with the '#' sign + .map { hashtagOpt => + TweetActionInfo.ClientTweetClickHashtag( + ClientTweetClickHashtag(hashtag = hashtagOpt) + ) + })) + } + + /** + * This is fired when a user clicks "Bookmark" after clicking on the "share" icon on a Tweet to + * open the share menu, or when a user clicks on the 'bookmark' icon on a Tweet (bookmark icon + * is available to ios only as of March 2023). + * TweetBookmark and TweetShareByBookmark log the same events but serve for individual use cases. + */ + object TweetBookmark extends BaseClientEvent(ActionType.ClientTweetBookmark) + + /** + * This is fired when a user clicks on a link in a tweet. + * The link could be displayed as a URL or embedded + * in a component such as an image or a card in a tweet. + */ + object TweetOpenLink extends BaseClientEvent(ActionType.ClientTweetOpenLink) { + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = + for { + actionTweetId <- ceItem.id + } yield Item.TweetInfo( + ClientEventCommonUtils + .getBasicTweetInfo( + actionTweetId = actionTweetId, + ceItem = ceItem, + ceNamespaceOpt = logEvent.eventNamespace) + .copy(tweetActionInfo = Some( + TweetActionInfo.ClientTweetOpenLink( + ClientTweetOpenLink(url = logEvent.eventDetails.flatMap(_.url)) + )))) + } + + /** + * This is fired when a user takes a screenshot. + * This is available for only mobile clients. + */ + object TweetTakeScreenshot extends BaseClientEvent(ActionType.ClientTweetTakeScreenshot) { + override def getUuaItem( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[Item] = + for { + actionTweetId <- ceItem.id + } yield Item.TweetInfo( + ClientEventCommonUtils + .getBasicTweetInfo( + actionTweetId = actionTweetId, + ceItem = ceItem, + ceNamespaceOpt = logEvent.eventNamespace) + .copy(tweetActionInfo = Some( + TweetActionInfo.ClientTweetTakeScreenshot( + ClientTweetTakeScreenshot(percentVisibleHeight100k = ceItem.percentVisibleHeight100k) + )))) + } + + /** + * This is fired when a user clicks the "This Tweet isn't relevant" button in a prompt displayed + * after clicking "This Tweet's not helpful" in search result page or "Not Interested in this Tweet" + * in the home timeline page. 
+ * Note: this button is hidden unless a user clicks "This Tweet isn't relevant" or + * "This Tweet's not helpful" first + */ + object TweetNotRelevant extends BaseClientEvent(ActionType.ClientTweetNotRelevant) + + /** + * This is fired when a user clicks "Undo" immediately after clicking "this Tweet isn't relevant", + * which is captured in TweetNotRelevant + */ + object TweetUndoNotRelevant extends BaseClientEvent(ActionType.ClientTweetUndoNotRelevant) + + /** + * This is fired when a user is logged out and follows a profile from the + * profile page / people module from web. + * One can only try to follow from web, iOS and Android do not support logged out browsing + */ + object ProfileFollowAttempt extends BaseProfileClientEvent(ActionType.ClientProfileFollowAttempt) + + /** + * This is fired when a user is logged out and favourite a tweet from web. + * One can only try to favourite from web, iOS and Android do not support logged out browsing + */ + object TweetFavoriteAttempt extends BaseClientEvent(ActionType.ClientTweetFavoriteAttempt) + + /** + * This is fired when a user is logged out and Retweet a tweet from web. + * One can only try to favourite from web, iOS and Android do not support logged out browsing + */ + object TweetRetweetAttempt extends BaseClientEvent(ActionType.ClientTweetRetweetAttempt) + + /** + * This is fired when a user is logged out and reply on tweet from web. + * One can only try to favourite from web, iOS and Android do not support logged out browsing + */ + object TweetReplyAttempt extends BaseClientEvent(ActionType.ClientTweetReplyAttempt) + + /** + * This is fired when a user is logged out and clicks on login button. + * Currently seem to be generated only on [m5, LiteNativeWrapper] as of Jan 2023. + */ + object CTALoginClick extends BaseCTAClientEvent(ActionType.ClientCTALoginClick) + + /** + * This is fired when a user is logged out and login window is shown. + */ + object CTALoginStart extends BaseCTAClientEvent(ActionType.ClientCTALoginStart) + + /** + * This is fired when a user is logged out and login is successful. + */ + object CTALoginSuccess extends BaseCTAClientEvent(ActionType.ClientCTALoginSuccess) + + /** + * This is fired when a user is logged out and clicks on signup button. + */ + object CTASignupClick extends BaseCTAClientEvent(ActionType.ClientCTASignupClick) + + /** + * This is fired when a user is logged out and signup is successful. + */ + object CTASignupSuccess extends BaseCTAClientEvent(ActionType.ClientCTASignupSuccess) + + /** + * This is fired when a user opens a Push Notification. + * Refer to https://confluence.twitter.biz/pages/viewpage.action?pageId=161811800 + * for Push Notification scribe details + */ + object NotificationOpen extends BasePushNotificationClientEvent(ActionType.ClientNotificationOpen) + + /** + * This is fired when a user clicks on a notification in the Notification Tab. + * Refer to go/ntab-urt-scribe for Notification Tab scribe details. + */ + object NotificationClick + extends BaseNotificationTabClientEvent(ActionType.ClientNotificationClick) + + /** + * This is fired when a user taps the "See Less Often" caret menu item of a notification in + * the Notification Tab. + * Refer to go/ntab-urt-scribe for Notification Tab scribe details. + */ + object NotificationSeeLessOften + extends BaseNotificationTabClientEvent(ActionType.ClientNotificationSeeLessOften) + + /** + * This is fired when a user closes or swipes away a Push Notification. 
+   * Refer to https://confluence.twitter.biz/pages/viewpage.action?pageId=161811800
+   * for Push Notification scribe details.
+   */
+  object NotificationDismiss
+      extends BasePushNotificationClientEvent(ActionType.ClientNotificationDismiss)
+
+  /**
+   * This is fired when a user clicks on a typeahead suggestion (queries, events, topics, users)
+   * in a drop-down menu of a search box or a Tweet compose box.
+   */
+  object TypeaheadClick extends BaseSearchTypeaheadEvent(ActionType.ClientTypeaheadClick)
+
+  /**
+   * This is a generic event fired when the user submits feedback on a prompt.
+   * Some examples include the Did You Find It prompt and Tweet Relevance on the Search Results Page.
+   */
+  object FeedbackPromptSubmit
+      extends BaseFeedbackSubmitClientEvent(ActionType.ClientFeedbackPromptSubmit)
+
+  object AppExit extends BaseUASClientEvent(ActionType.ClientAppExit)
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventImpression.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventImpression.scala
new file mode 100644
index 000000000..e0315015f
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ClientEventImpression.scala
@@ -0,0 +1,207 @@
+package com.twitter.unified_user_actions.adapter.client_event
+
+import com.twitter.clientapp.thriftscala.EventNamespace
+import com.twitter.clientapp.thriftscala.LogEvent
+import com.twitter.clientapp.thriftscala.{Item => LogEventItem}
+import com.twitter.logbase.thriftscala.LogBase
+import com.twitter.unified_user_actions.thriftscala._
+import com.twitter.unified_user_actions.thriftscala.Item.TweetInfo
+
+object ClientEventImpression {
+  object TweetLingerImpression extends BaseClientEvent(ActionType.ClientTweetLingerImpression) {
+    override def getUuaItem(
+      ceItem: LogEventItem,
+      logEvent: LogEvent
+    ): Option[Item] = {
+      for {
+        actionTweetId <- ceItem.id
+        impressionDetails <- ceItem.impressionDetails
+        lingerStartTimestampMs <- impressionDetails.visibilityStart
+        lingerEndTimestampMs <- impressionDetails.visibilityEnd
+      } yield {
+        Item.TweetInfo(
+          ClientEventCommonUtils
+            .getBasicTweetInfo(actionTweetId, ceItem, logEvent.eventNamespace)
+            .copy(tweetActionInfo = Some(
+              TweetActionInfo.ClientTweetLingerImpression(
+                ClientTweetLingerImpression(
+                  lingerStartTimestampMs = lingerStartTimestampMs,
+                  lingerEndTimestampMs = lingerEndTimestampMs
+                )
+              ))))
+      }
+    }
+  }
+
+  /**
+   * To maintain parity with iesource's definition, a render impression on a quoted Tweet emits
+   * 2 events: one for the quoting Tweet and one for the original Tweet!
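+   *
+   * e.g. (hypothetical IDs) if Tweet 200 quotes Tweet 100, a render impression logged against
+   * Tweet 200 yields two UnifiedUserActions: one with actionTweetId = 200 and
+   * quotedTweetId = Some(100), plus one with actionTweetId = 100 and quotingTweetId = Some(200).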
+   */
+  object TweetRenderImpression extends BaseClientEvent(ActionType.ClientTweetRenderImpression) {
+    override def toUnifiedUserAction(logEvent: LogEvent): Seq[UnifiedUserAction] = {
+
+      val logBase: Option[LogBase] = logEvent.logBase
+
+      val raw = for {
+        ed <- logEvent.eventDetails.toSeq
+        items <- ed.items.toSeq
+        ceItem <- items
+        eventTimestamp <- logBase.flatMap(getSourceTimestamp)
+        uuaItem <- getUuaItem(ceItem, logEvent)
+        if isItemTypeValid(ceItem.itemType)
+      } yield {
+        val userIdentifier: UserIdentifier = UserIdentifier(
+          userId = logBase.flatMap(_.userId),
+          guestIdMarketing = logBase.flatMap(_.guestIdMarketing))
+
+        val productSurface: Option[ProductSurface] = ProductSurfaceUtils
+          .getProductSurface(logEvent.eventNamespace)
+
+        val eventMetaData: EventMetadata = ClientEventCommonUtils
+          .getEventMetadata(
+            eventTimestamp = eventTimestamp,
+            logEvent = logEvent,
+            ceItem = ceItem,
+            productSurface = productSurface
+          )
+
+        UnifiedUserAction(
+          userIdentifier = userIdentifier,
+          item = uuaItem,
+          actionType = ActionType.ClientTweetRenderImpression,
+          eventMetadata = eventMetaData,
+          productSurface = productSurface,
+          productSurfaceInfo =
+            ProductSurfaceUtils.getProductSurfaceInfo(productSurface, ceItem, logEvent)
+        )
+      }
+
+      raw.flatMap { e =>
+        e.item match {
+          case TweetInfo(t) =>
+            // If the impression is on a quoted Tweet, emit 2 impressions: one for the quoting
+            // Tweet and one for the original Tweet.
+            if (t.quotedTweetId.isDefined) {
+              val originalItem = t.copy(
+                actionTweetId = t.quotedTweetId.get,
+                actionTweetAuthorInfo = t.quotedAuthorId.map(id => AuthorInfo(authorId = Some(id))),
+                quotingTweetId = Some(t.actionTweetId),
+                quotedTweetId = None,
+                inReplyToTweetId = None,
+                replyingTweetId = None,
+                retweetingTweetId = None,
+                retweetedTweetId = None,
+                quotedAuthorId = None,
+                retweetingAuthorId = None,
+                inReplyToAuthorId = None
+              )
+              val original = e.copy(item = TweetInfo(originalItem))
+              Seq(original, e)
+            } else Seq(e)
+          case _ => Nil
+        }
+      }
+    }
+  }
+
+  object TweetGalleryImpression extends BaseClientEvent(ActionType.ClientTweetGalleryImpression)
+
+  object TweetDetailsImpression extends BaseClientEvent(ActionType.ClientTweetDetailsImpression) {
+
+    case class EventNamespaceInternal(
+      client: String,
+      page: String,
+      section: String,
+      component: String,
+      element: String,
+      action: String)
+
+    def isTweetDetailsImpression(eventNamespaceOpt: Option[EventNamespace]): Boolean =
+      eventNamespaceOpt.exists { eventNamespace =>
+        val eventNamespaceInternal = EventNamespaceInternal(
+          client = eventNamespace.client.getOrElse(""),
+          page = eventNamespace.page.getOrElse(""),
+          section = eventNamespace.section.getOrElse(""),
+          component = eventNamespace.component.getOrElse(""),
+          element = eventNamespace.element.getOrElse(""),
+          action = eventNamespace.action.getOrElse(""),
+        )
+
+        isIphoneAppOrMacAppOrIpadAppClientTweetDetailsImpression(
+          eventNamespaceInternal) || isAndroidAppClientTweetDetailsImpression(
+          eventNamespaceInternal) || isWebClientTweetDetailImpression(
+          eventNamespaceInternal) || isTweetDeckAppClientTweetDetailsImpression(
+          eventNamespaceInternal) || isOtherAppClientTweetDetailsImpression(eventNamespaceInternal)
+      }
+
+    private def isWebClientTweetDetailImpression(
+      eventNamespace: EventNamespaceInternal
+    ): Boolean = {
+      // Compare the assembled "client:page:section:component:element:action" string against the
+      // known web client namespace variants (m5, m2, LiteNativeWrapper) listed below.
+      val eventNameSpaceStr =
+        eventNamespace.client + ":" + eventNamespace.page + ":" + eventNamespace.section + ":" + eventNamespace.component + ":" + eventNamespace.element + ":" + eventNamespace.action
+
eventNameSpaceStr.equalsIgnoreCase("m5:tweet::::show") || eventNameSpaceStr.equalsIgnoreCase( + "m5:tweet:landing:::show") || eventNameSpaceStr + .equalsIgnoreCase("m2:tweet::::impression") || eventNameSpaceStr.equalsIgnoreCase( + "m2:tweet::tweet::impression") || eventNameSpaceStr + .equalsIgnoreCase("LiteNativeWrapper:tweet::::show") || eventNameSpaceStr.equalsIgnoreCase( + "LiteNativeWrapper:tweet:landing:::show") + } + + private def isOtherAppClientTweetDetailsImpression( + eventNamespace: EventNamespaceInternal + ): Boolean = { + val excludedClients = Set( + "web", + "m5", + "m2", + "LiteNativeWrapper", + "iphone", + "ipad", + "mac", + "android", + "android_tablet", + "deck") + (!excludedClients.contains(eventNamespace.client)) && eventNamespace.page + .equalsIgnoreCase("tweet") && eventNamespace.section + .equalsIgnoreCase("") && eventNamespace.component + .equalsIgnoreCase("tweet") && eventNamespace.element + .equalsIgnoreCase("") && eventNamespace.action.equalsIgnoreCase("impression") + } + + private def isTweetDeckAppClientTweetDetailsImpression( + eventNamespace: EventNamespaceInternal + ): Boolean = + eventNamespace.client + .equalsIgnoreCase("deck") && eventNamespace.page + .equalsIgnoreCase("tweet") && eventNamespace.section + .equalsIgnoreCase("") && eventNamespace.component + .equalsIgnoreCase("tweet") && eventNamespace.element + .equalsIgnoreCase("") && eventNamespace.action.equalsIgnoreCase("impression") + + private def isAndroidAppClientTweetDetailsImpression( + eventNamespace: EventNamespaceInternal + ): Boolean = + (eventNamespace.client + .equalsIgnoreCase("android") || eventNamespace.client + .equalsIgnoreCase("android_tablet")) && eventNamespace.page + .equalsIgnoreCase("tweet") && eventNamespace.section.equalsIgnoreCase( + "") && (eventNamespace.component + .equalsIgnoreCase("tweet") || eventNamespace.component + .matches("^suggest.*_tweet.*$") || eventNamespace.component + .equalsIgnoreCase("")) && eventNamespace.element + .equalsIgnoreCase("") && eventNamespace.action.equalsIgnoreCase("impression") + + private def isIphoneAppOrMacAppOrIpadAppClientTweetDetailsImpression( + eventNamespace: EventNamespaceInternal + ): Boolean = + (eventNamespace.client + .equalsIgnoreCase("iphone") || eventNamespace.client + .equalsIgnoreCase("ipad") || eventNamespace.client + .equalsIgnoreCase("mac")) && eventNamespace.page.equalsIgnoreCase( + "tweet") && eventNamespace.section + .equalsIgnoreCase("") && (eventNamespace.component + .equalsIgnoreCase("tweet") || eventNamespace.component + .matches("^suggest.*_tweet.*$")) && eventNamespace.element + .equalsIgnoreCase("") && eventNamespace.action.equalsIgnoreCase("impression") + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/HomeInfoUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/HomeInfoUtils.scala new file mode 100644 index 000000000..276908f02 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/HomeInfoUtils.scala @@ -0,0 +1,32 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.suggests.controller_data.home_tweets.thriftscala.HomeTweetsControllerData +import com.twitter.suggests.controller_data.home_tweets.thriftscala.HomeTweetsControllerDataAliases.V1Alias +import com.twitter.suggests.controller_data.thriftscala.ControllerData +import 
com.twitter.suggests.controller_data.v2.thriftscala.{ControllerData => ControllerDataV2} + +object HomeInfoUtils { + + def getHomeTweetControllerDataV1(ceItem: LogEventItem): Option[V1Alias] = { + ceItem.suggestionDetails + .flatMap(_.decodedControllerData) + .flatMap(_ match { + case ControllerData.V2( + ControllerDataV2.HomeTweets( + HomeTweetsControllerData.V1(homeTweetsControllerDataV1) + )) => + Some(homeTweetsControllerDataV1) + case _ => None + }) + } + + def getTraceId(ceItem: LogEventItem): Option[Long] = + getHomeTweetControllerDataV1(ceItem).flatMap(_.traceId) + + def getSuggestType(ceItem: LogEventItem): Option[String] = + ceItem.suggestionDetails.flatMap(_.suggestionType) + + def getRequestJoinId(ceItem: LogEventItem): Option[Long] = + getHomeTweetControllerDataV1(ceItem).flatMap(_.requestJoinId) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ItemTypeFilterPredicates.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ItemTypeFilterPredicates.scala new file mode 100644 index 000000000..6fb43b09c --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ItemTypeFilterPredicates.scala @@ -0,0 +1,40 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.ItemType + +object ItemTypeFilterPredicates { + private val TweetItemTypes = Set[ItemType](ItemType.Tweet, ItemType.QuotedTweet) + private val TopicItemTypes = Set[ItemType](ItemType.Tweet, ItemType.QuotedTweet, ItemType.Topic) + private val ProfileItemTypes = Set[ItemType](ItemType.User) + private val TypeaheadResultItemTypes = Set[ItemType](ItemType.Search, ItemType.User) + private val SearchResultsPageFeedbackSubmitItemTypes = + Set[ItemType](ItemType.Tweet, ItemType.RelevancePrompt) + + /** + * DDG lambda metrics count Tweets based on the `itemType` + * Reference code - https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/src/scala/com/twitter/experiments/lambda/shared/Timelines.scala?L156 + * Since enums `PROMOTED_TWEET` and `POPULAR_TWEET` are deprecated in the following thrift + * https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/src/thrift/com/twitter/clientapp/gen/client_app.thrift?L131 + * UUA filters two types of Tweets only: `TWEET` and `QUOTED_TWEET` + */ + def isItemTypeTweet(itemTypeOpt: Option[ItemType]): Boolean = + itemTypeOpt.exists(itemType => TweetItemTypes.contains(itemType)) + + def isItemTypeTopic(itemTypeOpt: Option[ItemType]): Boolean = + itemTypeOpt.exists(itemType => TopicItemTypes.contains(itemType)) + + def isItemTypeProfile(itemTypeOpt: Option[ItemType]): Boolean = + itemTypeOpt.exists(itemType => ProfileItemTypes.contains(itemType)) + + def isItemTypeTypeaheadResult(itemTypeOpt: Option[ItemType]): Boolean = + itemTypeOpt.exists(itemType => TypeaheadResultItemTypes.contains(itemType)) + + def isItemTypeForSearchResultsPageFeedbackSubmit(itemTypeOpt: Option[ItemType]): Boolean = + itemTypeOpt.exists(itemType => SearchResultsPageFeedbackSubmitItemTypes.contains(itemType)) + + /** + * Always return true. Use this when there is no need to filter based on `item_type` and all + * values of `item_type` are acceptable. 
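+   * e.g. pass this predicate for events that are meaningful for every item type, so that no
+   * client-event items are dropped by the item-type filter.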
+ */ + def ignoreItemType(itemTypeOpt: Option[ItemType]): Boolean = true +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/NotificationClientEventUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/NotificationClientEventUtils.scala new file mode 100644 index 000000000..4a49a155f --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/NotificationClientEventUtils.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} + +object NotificationClientEventUtils { + + // Notification id for notification in the Notification Tab + def getNotificationIdForNotificationTab( + ceItem: LogEventItem + ): Option[String] = { + for { + notificationTabDetails <- ceItem.notificationTabDetails + clientEventMetaData <- notificationTabDetails.clientEventMetadata + notificationId <- clientEventMetaData.upstreamId + } yield { + notificationId + } + } + + // Notification id for Push Notification + def getNotificationIdForPushNotification(logEvent: LogEvent): Option[String] = for { + pushNotificationDetails <- logEvent.notificationDetails + notificationId <- pushNotificationDetails.impressionId + } yield notificationId +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ProductSurfaceUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ProductSurfaceUtils.scala new file mode 100644 index 000000000..d0d0e5825 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/ProductSurfaceUtils.scala @@ -0,0 +1,109 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.EventNamespace +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.suggests.controller_data.home_tweets.thriftscala.HomeTweetsControllerDataAliases.V1Alias +import com.twitter.unified_user_actions.thriftscala._ + +object ProductSurfaceUtils { + + def getProductSurface(eventNamespace: Option[EventNamespace]): Option[ProductSurface] = { + ( + eventNamespace.flatMap(_.page), + eventNamespace.flatMap(_.section), + eventNamespace.flatMap(_.element)) match { + case (Some("home") | Some("home_latest"), _, _) => Some(ProductSurface.HomeTimeline) + case (Some("ntab"), _, _) => Some(ProductSurface.NotificationTab) + case (Some(page), Some(section), _) if isPushNotification(page, section) => + Some(ProductSurface.PushNotification) + case (Some("search"), _, _) => Some(ProductSurface.SearchResultsPage) + case (_, _, Some("typeahead")) => Some(ProductSurface.SearchTypeahead) + case _ => None + } + } + + private def isPushNotification(page: String, section: String): Boolean = { + Seq[String]("notification", "toasts").contains(page) || + (page == "app" && section == "push") + } + + def getProductSurfaceInfo( + productSurface: Option[ProductSurface], + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[ProductSurfaceInfo] = { + productSurface match { + case Some(ProductSurface.HomeTimeline) => createHomeTimelineInfo(ceItem) + case Some(ProductSurface.NotificationTab) => createNotificationTabInfo(ceItem) + case 
Some(ProductSurface.PushNotification) => createPushNotificationInfo(logEvent) + case Some(ProductSurface.SearchResultsPage) => createSearchResultPageInfo(ceItem, logEvent) + case Some(ProductSurface.SearchTypeahead) => createSearchTypeaheadInfo(ceItem, logEvent) + case _ => None + } + } + + private def createPushNotificationInfo(logEvent: LogEvent): Option[ProductSurfaceInfo] = + NotificationClientEventUtils.getNotificationIdForPushNotification(logEvent) match { + case Some(notificationId) => + Some( + ProductSurfaceInfo.PushNotificationInfo( + PushNotificationInfo(notificationId = notificationId))) + case _ => None + } + + private def createNotificationTabInfo(ceItem: LogEventItem): Option[ProductSurfaceInfo] = + NotificationClientEventUtils.getNotificationIdForNotificationTab(ceItem) match { + case Some(notificationId) => + Some( + ProductSurfaceInfo.NotificationTabInfo( + NotificationTabInfo(notificationId = notificationId))) + case _ => None + } + + private def createHomeTimelineInfo(ceItem: LogEventItem): Option[ProductSurfaceInfo] = { + def suggestType: Option[String] = HomeInfoUtils.getSuggestType(ceItem) + def controllerData: Option[V1Alias] = HomeInfoUtils.getHomeTweetControllerDataV1(ceItem) + + if (suggestType.isDefined || controllerData.isDefined) { + Some( + ProductSurfaceInfo.HomeTimelineInfo( + HomeTimelineInfo( + suggestionType = suggestType, + injectedPosition = controllerData.flatMap(_.injectedPosition) + ))) + } else None + } + + private def createSearchResultPageInfo( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[ProductSurfaceInfo] = { + val searchInfoUtil = new SearchInfoUtils(ceItem) + searchInfoUtil.getQueryOptFromItem(logEvent).map { query => + ProductSurfaceInfo.SearchResultsPageInfo( + SearchResultsPageInfo( + query = query, + querySource = searchInfoUtil.getQuerySourceOptFromControllerDataFromItem, + itemPosition = ceItem.position, + tweetResultSources = searchInfoUtil.getTweetResultSources, + userResultSources = searchInfoUtil.getUserResultSources, + queryFilterType = searchInfoUtil.getQueryFilterType(logEvent) + )) + } + } + + private def createSearchTypeaheadInfo( + ceItem: LogEventItem, + logEvent: LogEvent + ): Option[ProductSurfaceInfo] = { + logEvent.searchDetails.flatMap(_.query).map { query => + ProductSurfaceInfo.SearchTypeaheadInfo( + SearchTypeaheadInfo( + query = query, + itemPosition = ceItem.position + ) + ) + } + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/SearchInfoUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/SearchInfoUtils.scala new file mode 100644 index 000000000..4ebbbbeee --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/SearchInfoUtils.scala @@ -0,0 +1,129 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.search.common.constants.thriftscala.ThriftQuerySource +import com.twitter.search.common.constants.thriftscala.TweetResultSource +import com.twitter.search.common.constants.thriftscala.UserResultSource +import com.twitter.suggests.controller_data.search_response.item_types.thriftscala.ItemTypesControllerData +import com.twitter.suggests.controller_data.search_response.item_types.thriftscala.ItemTypesControllerData.TweetTypesControllerData +import 
com.twitter.suggests.controller_data.search_response.item_types.thriftscala.ItemTypesControllerData.UserTypesControllerData
+import com.twitter.suggests.controller_data.search_response.request.thriftscala.RequestControllerData
+import com.twitter.suggests.controller_data.search_response.thriftscala.SearchResponseControllerData.V1
+import com.twitter.suggests.controller_data.search_response.thriftscala.SearchResponseControllerDataAliases.V1Alias
+import com.twitter.suggests.controller_data.thriftscala.ControllerData.V2
+import com.twitter.suggests.controller_data.v2.thriftscala.ControllerData.SearchResponse
+import com.twitter.unified_user_actions.thriftscala.SearchQueryFilterType
+import com.twitter.unified_user_actions.thriftscala.SearchQueryFilterType._
+
+class SearchInfoUtils(item: LogEventItem) {
+  private val searchControllerDataOpt: Option[V1Alias] = item.suggestionDetails.flatMap { sd =>
+    sd.decodedControllerData.flatMap { decodedControllerData =>
+      decodedControllerData match {
+        case V2(v2ControllerData) =>
+          v2ControllerData match {
+            case SearchResponse(searchResponseControllerData) =>
+              searchResponseControllerData match {
+                case V1(searchResponseControllerDataV1) =>
+                  Some(searchResponseControllerDataV1)
+                case _ => None
+              }
+            case _ =>
+              None
+          }
+        case _ => None
+      }
+    }
+  }
+
+  private val requestControllerDataOptFromItem: Option[RequestControllerData] =
+    searchControllerDataOpt.flatMap { searchControllerData =>
+      searchControllerData.requestControllerData
+    }
+  private val itemTypesControllerDataOptFromItem: Option[ItemTypesControllerData] =
+    searchControllerDataOpt.flatMap { searchControllerData =>
+      searchControllerData.itemTypesControllerData
+    }
+
+  def checkBit(bitmap: Long, idx: Int): Boolean = {
+    // Test whether bit `idx` (0-based from the least significant bit) is set.
+    ((bitmap >> idx) & 1L) == 1L
+  }
+
+  def getQueryOptFromSearchDetails(logEvent: LogEvent): Option[String] = {
+    logEvent.searchDetails.flatMap { sd => sd.query }
+  }
+
+  def getQueryOptFromControllerDataFromItem: Option[String] = {
+    requestControllerDataOptFromItem.flatMap { rd => rd.rawQuery }
+  }
+
+  def getQueryOptFromItem(logEvent: LogEvent): Option[String] = {
+    // First we try to get the query from controller data, and if that's not available, we fall
+    // back to the query in search details. If both are None, queryOpt is None.
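+    // e.g. if the controller data carries rawQuery = Some("world cup") while searchDetails.query
+    // is Some("world cup final"), the controller-data value "world cup" is returned.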
+ getQueryOptFromControllerDataFromItem.orElse(getQueryOptFromSearchDetails(logEvent)) + } + + def getTweetTypesOptFromControllerDataFromItem: Option[TweetTypesControllerData] = { + itemTypesControllerDataOptFromItem.flatMap { itemTypes => + itemTypes match { + case TweetTypesControllerData(tweetTypesControllerData) => + Some(TweetTypesControllerData(tweetTypesControllerData)) + case _ => None + } + } + } + + def getUserTypesOptFromControllerDataFromItem: Option[UserTypesControllerData] = { + itemTypesControllerDataOptFromItem.flatMap { itemTypes => + itemTypes match { + case UserTypesControllerData(userTypesControllerData) => + Some(UserTypesControllerData(userTypesControllerData)) + case _ => None + } + } + } + + def getQuerySourceOptFromControllerDataFromItem: Option[ThriftQuerySource] = { + requestControllerDataOptFromItem + .flatMap { rd => rd.querySource } + .flatMap { querySourceVal => ThriftQuerySource.get(querySourceVal) } + } + + def getTweetResultSources: Option[Set[TweetResultSource]] = { + getTweetTypesOptFromControllerDataFromItem + .flatMap { cd => cd.tweetTypesControllerData.tweetTypesBitmap } + .map { tweetTypesBitmap => + TweetResultSource.list.filter { t => checkBit(tweetTypesBitmap, t.value) }.toSet + } + } + + def getUserResultSources: Option[Set[UserResultSource]] = { + getUserTypesOptFromControllerDataFromItem + .flatMap { cd => cd.userTypesControllerData.userTypesBitmap } + .map { userTypesBitmap => + UserResultSource.list.filter { t => checkBit(userTypesBitmap, t.value) }.toSet + } + } + + def getQueryFilterType(logEvent: LogEvent): Option[SearchQueryFilterType] = { + val searchTab = logEvent.eventNamespace.map(_.client).flatMap { + case Some("m5") | Some("android") => logEvent.eventNamespace.flatMap(_.element) + case _ => logEvent.eventNamespace.flatMap(_.section) + } + searchTab.flatMap { + case "search_filter_top" => Some(Top) + case "search_filter_live" => Some(Latest) + // android uses search_filter_tweets instead of search_filter_live + case "search_filter_tweets" => Some(Latest) + case "search_filter_user" => Some(People) + case "search_filter_image" => Some(Photos) + case "search_filter_video" => Some(Videos) + case _ => None + } + } + + def getRequestJoinId: Option[Long] = requestControllerDataOptFromItem.flatMap(_.requestJoinId) + + def getTraceId: Option[Long] = requestControllerDataOptFromItem.flatMap(_.traceId) + +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/TopicIdUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/TopicIdUtils.scala new file mode 100644 index 000000000..16f8c9b35 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/TopicIdUtils.scala @@ -0,0 +1,157 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.EventNamespace +import com.twitter.clientapp.thriftscala.Item +import com.twitter.clientapp.thriftscala.ItemType.Topic +import com.twitter.guide.scribing.thriftscala.TopicModuleMetadata +import com.twitter.guide.scribing.thriftscala.TransparentGuideDetails +import com.twitter.suggests.controller_data.home_hitl_topic_annotation_prompt.thriftscala.HomeHitlTopicAnnotationPromptControllerData +import com.twitter.suggests.controller_data.home_hitl_topic_annotation_prompt.v1.thriftscala.{ + HomeHitlTopicAnnotationPromptControllerData => HomeHitlTopicAnnotationPromptControllerDataV1 +} +import 
com.twitter.suggests.controller_data.home_topic_annotation_prompt.thriftscala.HomeTopicAnnotationPromptControllerData +import com.twitter.suggests.controller_data.home_topic_annotation_prompt.v1.thriftscala.{ + HomeTopicAnnotationPromptControllerData => HomeTopicAnnotationPromptControllerDataV1 +} +import com.twitter.suggests.controller_data.home_topic_follow_prompt.thriftscala.HomeTopicFollowPromptControllerData +import com.twitter.suggests.controller_data.home_topic_follow_prompt.v1.thriftscala.{ + HomeTopicFollowPromptControllerData => HomeTopicFollowPromptControllerDataV1 +} +import com.twitter.suggests.controller_data.home_tweets.thriftscala.HomeTweetsControllerData +import com.twitter.suggests.controller_data.home_tweets.v1.thriftscala.{ + HomeTweetsControllerData => HomeTweetsControllerDataV1 +} +import com.twitter.suggests.controller_data.search_response.item_types.thriftscala.ItemTypesControllerData +import com.twitter.suggests.controller_data.search_response.thriftscala.SearchResponseControllerData +import com.twitter.suggests.controller_data.search_response.topic_follow_prompt.thriftscala.SearchTopicFollowPromptControllerData +import com.twitter.suggests.controller_data.search_response.tweet_types.thriftscala.TweetTypesControllerData +import com.twitter.suggests.controller_data.search_response.v1.thriftscala.{ + SearchResponseControllerData => SearchResponseControllerDataV1 +} +import com.twitter.suggests.controller_data.thriftscala.ControllerData +import com.twitter.suggests.controller_data.timelines_topic.thriftscala.TimelinesTopicControllerData +import com.twitter.suggests.controller_data.timelines_topic.v1.thriftscala.{ + TimelinesTopicControllerData => TimelinesTopicControllerDataV1 +} +import com.twitter.suggests.controller_data.v2.thriftscala.{ControllerData => ControllerDataV2} +import com.twitter.util.Try + +object TopicIdUtils { + val DomainId: Long = 131 // Topical Domain + + def getTopicId( + item: Item, + namespace: EventNamespace + ): Option[Long] = + getTopicIdFromHomeSearch(item) + .orElse(getTopicFromGuide(item)) + .orElse(getTopicFromOnboarding(item, namespace)) + .orElse(getTopicIdFromItem(item)) + + def getTopicIdFromItem(item: Item): Option[Long] = + if (item.itemType.contains(Topic)) + item.id + else None + + def getTopicIdFromHomeSearch( + item: Item + ): Option[Long] = { + val decodedControllerData = item.suggestionDetails.flatMap(_.decodedControllerData) + decodedControllerData match { + case Some( + ControllerData.V2( + ControllerDataV2.HomeTweets( + HomeTweetsControllerData.V1(homeTweets: HomeTweetsControllerDataV1))) + ) => + homeTweets.topicId + case Some( + ControllerData.V2( + ControllerDataV2.HomeTopicFollowPrompt( + HomeTopicFollowPromptControllerData.V1( + homeTopicFollowPrompt: HomeTopicFollowPromptControllerDataV1))) + ) => + homeTopicFollowPrompt.topicId + case Some( + ControllerData.V2( + ControllerDataV2.TimelinesTopic( + TimelinesTopicControllerData.V1( + timelinesTopic: TimelinesTopicControllerDataV1 + ))) + ) => + Some(timelinesTopic.topicId) + case Some( + ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1(s: SearchResponseControllerDataV1))) + ) => + s.itemTypesControllerData match { + case Some( + ItemTypesControllerData.TopicFollowControllerData( + topicFollowControllerData: SearchTopicFollowPromptControllerData)) => + topicFollowControllerData.topicId + case Some( + ItemTypesControllerData.TweetTypesControllerData( + tweetTypesControllerData: TweetTypesControllerData)) => + 
tweetTypesControllerData.topicId + case _ => None + } + case Some( + ControllerData.V2( + ControllerDataV2.HomeTopicAnnotationPrompt( + HomeTopicAnnotationPromptControllerData.V1( + homeTopicAnnotationPrompt: HomeTopicAnnotationPromptControllerDataV1 + ))) + ) => + Some(homeTopicAnnotationPrompt.topicId) + case Some( + ControllerData.V2( + ControllerDataV2.HomeHitlTopicAnnotationPrompt( + HomeHitlTopicAnnotationPromptControllerData.V1( + homeHitlTopicAnnotationPrompt: HomeHitlTopicAnnotationPromptControllerDataV1 + ))) + ) => + Some(homeHitlTopicAnnotationPrompt.topicId) + + case _ => None + } + } + + def getTopicFromOnboarding( + item: Item, + namespace: EventNamespace + ): Option[Long] = + if (namespace.page.contains("onboarding") && + (namespace.section.exists(_.contains("topic")) || + namespace.component.exists(_.contains("topic")) || + namespace.element.exists(_.contains("topic")))) { + item.description.flatMap { description => + // description: "id=123,main=xyz,row=1" + val tokens = description.split(",").headOption.map(_.split("=")) + tokens match { + case Some(Array("id", token, _*)) => Try(token.toLong).toOption + case _ => None + } + } + } else None + + def getTopicFromGuide( + item: Item + ): Option[Long] = + item.guideItemDetails.flatMap { + _.transparentGuideDetails match { + case Some(TransparentGuideDetails.TopicMetadata(topicMetadata)) => + topicMetadata match { + case TopicModuleMetadata.TttInterest(_) => + None + case TopicModuleMetadata.SemanticCoreInterest(semanticCoreInterest) => + if (semanticCoreInterest.domainId == DomainId.toString) + Try(semanticCoreInterest.entityId.toLong).toOption + else None + case TopicModuleMetadata.SimClusterInterest(_) => + None + case TopicModuleMetadata.UnknownUnionField(_) => None + } + case _ => None + } + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/VideoClientEventUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/VideoClientEventUtils.scala new file mode 100644 index 000000000..842c501be --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event/VideoClientEventUtils.scala @@ -0,0 +1,42 @@ +package com.twitter.unified_user_actions.adapter.client_event + +import com.twitter.clientapp.thriftscala.AmplifyDetails +import com.twitter.clientapp.thriftscala.MediaDetails +import com.twitter.unified_user_actions.thriftscala.TweetVideoWatch +import com.twitter.unified_user_actions.thriftscala.TweetActionInfo +import com.twitter.video.analytics.thriftscala.MediaIdentifier + +object VideoClientEventUtils { + + /** + * For Tweets with multiple videos, find the id of the video that generated the client-event + */ + def videoIdFromMediaIdentifier(mediaIdentifier: MediaIdentifier): Option[String] = + mediaIdentifier match { + case MediaIdentifier.MediaPlatformIdentifier(mediaPlatformIdentifier) => + mediaPlatformIdentifier.mediaId.map(_.toString) + case _ => None + } + + /** + * Given: + * 1. the id of the video (`mediaId`) + * 2. details about all the media items in the Tweet (`mediaItems`), + * iterate over the `mediaItems` to lookup the metadata about the video with id `mediaId`. 
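+   *
+   * e.g. (hypothetical values) mediaId = "123" with a mediaItems entry whose contentId is
+   * Some("123") yields a TweetActionInfo.TweetVideoWatch carrying that item's mediaType, its
+   * dynamicAds flag as isMonetizable, and the optional amplify videoType.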
+   */
+  def getVideoMetadata(
+    mediaId: String,
+    mediaItems: Seq[MediaDetails],
+    amplifyDetails: Option[AmplifyDetails]
+  ): Option[TweetActionInfo] = {
+    mediaItems.collectFirst {
+      case media if media.contentId.contains(mediaId) =>
+        TweetActionInfo.TweetVideoWatch(
+          TweetVideoWatch(
+            mediaType = media.mediaType,
+            isMonetizable = media.dynamicAds,
+            videoType = amplifyDetails.flatMap(_.videoType)
+          ))
+    }
+  }
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common/AdapterUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common/AdapterUtils.scala
new file mode 100644
index 000000000..3d5b85002
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common/AdapterUtils.scala
@@ -0,0 +1,15 @@
+package com.twitter.unified_user_actions.adapter.common
+
+import com.twitter.snowflake.id.SnowflakeId
+import com.twitter.util.Time
+
+object AdapterUtils {
+  def currentTimestampMs: Long = Time.now.inMilliseconds
+  def getTimestampMsFromTweetId(tweetId: Long): Long = SnowflakeId.unixTimeMillisFromId(tweetId)
+
+  // For now, just normalize both codes to upper case for consistency:
+  // language codes arrive in mixed lower and upper case,
+  // and so do country codes.
+  def normalizeLanguageCode(inputLanguageCode: String): String = inputLanguageCode.toUpperCase
+  def normalizeCountryCode(inputCountryCode: String): String = inputCountryCode.toUpperCase
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common/BUILD
new file mode 100644
index 000000000..f5d2c526c
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common/BUILD
@@ -0,0 +1,10 @@
+scala_library(
+    sources = [
+        "*.scala",
+    ],
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "snowflake/src/main/scala/com/twitter/snowflake/id",
+        "util/util-core:util-core-util",
+    ],
+)
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/BUILD
new file mode 100644
index 000000000..612b89436
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/BUILD
@@ -0,0 +1,14 @@
+scala_library(
+    sources = [
+        "*.scala",
+    ],
+    compiler_option_sets = ["fatal_warnings"],
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "kafka/finagle-kafka/finatra-kafka/src/main/scala",
+        "src/thrift/com/twitter/ibis:logging-scala",
+        "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base",
+        "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common",
+        "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala",
+    ],
+)
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/EmailNotificationEventAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/EmailNotificationEventAdapter.scala
new file mode 100644
index 000000000..c994f5c81
--- /dev/null
+++ 
b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/EmailNotificationEventAdapter.scala
@@ -0,0 +1,55 @@
+package com.twitter.unified_user_actions.adapter.email_notification_event
+
+import com.twitter.finagle.stats.NullStatsReceiver
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.finatra.kafka.serde.UnKeyed
+import com.twitter.ibis.thriftscala.NotificationScribe
+import com.twitter.ibis.thriftscala.NotificationScribeType
+import com.twitter.unified_user_actions.adapter.AbstractAdapter
+import com.twitter.unified_user_actions.thriftscala.ActionType
+import com.twitter.unified_user_actions.thriftscala.EmailNotificationInfo
+import com.twitter.unified_user_actions.thriftscala.Item
+import com.twitter.unified_user_actions.thriftscala.ProductSurface
+import com.twitter.unified_user_actions.thriftscala.ProductSurfaceInfo
+import com.twitter.unified_user_actions.thriftscala.TweetInfo
+import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
+import com.twitter.unified_user_actions.thriftscala.UserIdentifier
+
+class EmailNotificationEventAdapter
+    extends AbstractAdapter[NotificationScribe, UnKeyed, UnifiedUserAction] {
+  import EmailNotificationEventAdapter._
+  override def adaptOneToKeyedMany(
+    input: NotificationScribe,
+    statsReceiver: StatsReceiver = NullStatsReceiver
+  ): Seq[(UnKeyed, UnifiedUserAction)] =
+    adaptEvent(input).map { e => (UnKeyed, e) }
+}
+
+object EmailNotificationEventAdapter {
+
+  def adaptEvent(scribe: NotificationScribe): Seq[UnifiedUserAction] = {
+    Option(scribe).flatMap { e =>
+      e.`type` match {
+        case NotificationScribeType.Click =>
+          val tweetIdOpt = e.logBase.flatMap(EmailNotificationEventUtils.extractTweetId)
+          (tweetIdOpt, e.impressionId) match {
+            case (Some(tweetId), Some(impressionId)) =>
+              Some(
+                UnifiedUserAction(
+                  userIdentifier = UserIdentifier(userId = e.userId),
+                  item = Item.TweetInfo(TweetInfo(actionTweetId = tweetId)),
+                  actionType = ActionType.ClientTweetEmailClick,
+                  eventMetadata = EmailNotificationEventUtils.extractEventMetaData(e),
+                  productSurface = Some(ProductSurface.EmailNotification),
+                  productSurfaceInfo = Some(
+                    ProductSurfaceInfo.EmailNotificationInfo(
+                      EmailNotificationInfo(notificationId = impressionId)))
+                )
+              )
+            case _ => None
+          }
+        case _ => None
+      }
+    }.toSeq
+  }
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/EmailNotificationEventUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/EmailNotificationEventUtils.scala
new file mode 100644
index 000000000..85bd1999f
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event/EmailNotificationEventUtils.scala
@@ -0,0 +1,39 @@
+package com.twitter.unified_user_actions.adapter.email_notification_event
+
+import com.twitter.ibis.thriftscala.NotificationScribe
+import com.twitter.logbase.thriftscala.LogBase
+import com.twitter.unified_user_actions.adapter.common.AdapterUtils
+import com.twitter.unified_user_actions.thriftscala.EventMetadata
+import com.twitter.unified_user_actions.thriftscala.SourceLineage
+
+object EmailNotificationEventUtils {
+
+  /*
+   * Extract the Tweet id from LogBase.page; a sample page is shown below:
+   * https://twitter.com/i/events/1580827044245544962?cn=ZmxleGlibGVfcmVjcw%3D%3D&refsrc=email
+   */
+  def extractTweetId(path: String): Option[Long] = {
+    val 
ptn = raw".*/([0-9]+)\??.*".r // capture the digits after the last "/", before an optional "?" query string
+    path match {
+      case ptn(tweetId) =>
+        Some(tweetId.toLong)
+      case _ =>
+        None
+    }
+  }
+
+  def extractTweetId(logBase: LogBase): Option[Long] = logBase.page match {
+    case Some(path) => extractTweetId(path)
+    case None => None
+  }
+
+  def extractEventMetaData(scribe: NotificationScribe): EventMetadata =
+    EventMetadata(
+      sourceTimestampMs = scribe.timestamp,
+      receivedTimestampMs = AdapterUtils.currentTimestampMs,
+      sourceLineage = SourceLineage.EmailNotificationEvents,
+      language = scribe.logBase.flatMap(_.language),
+      countryCode = scribe.logBase.flatMap(_.country),
+      clientAppId = scribe.logBase.flatMap(_.clientAppId),
+    )
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/favorite_archival_events/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/favorite_archival_events/BUILD
new file mode 100644
index 000000000..6baf312d6
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/favorite_archival_events/BUILD
@@ -0,0 +1,14 @@
+scala_library(
+    sources = [
+        "*.scala",
+    ],
+    compiler_option_sets = ["fatal_warnings"],
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "fanoutservice/thrift/src/main/thrift:thrift-scala",
+        "kafka/finagle-kafka/finatra-kafka/src/main/scala",
+        "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base",
+        "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common",
+        "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala",
+    ],
+)
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/favorite_archival_events/FavoriteArchivalEventsAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/favorite_archival_events/FavoriteArchivalEventsAdapter.scala
new file mode 100644
index 000000000..1121dcfe5
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/favorite_archival_events/FavoriteArchivalEventsAdapter.scala
@@ -0,0 +1,52 @@
+package com.twitter.unified_user_actions.adapter.favorite_archival_events
+
+import com.twitter.finagle.stats.NullStatsReceiver
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.finatra.kafka.serde.UnKeyed
+import com.twitter.timelineservice.fanout.thriftscala.FavoriteArchivalEvent
+import com.twitter.unified_user_actions.adapter.AbstractAdapter
+import com.twitter.unified_user_actions.adapter.common.AdapterUtils
+import com.twitter.unified_user_actions.thriftscala._
+
+class FavoriteArchivalEventsAdapter
+    extends AbstractAdapter[FavoriteArchivalEvent, UnKeyed, UnifiedUserAction] {
+
+  import FavoriteArchivalEventsAdapter._
+  override def adaptOneToKeyedMany(
+    input: FavoriteArchivalEvent,
+    statsReceiver: StatsReceiver = NullStatsReceiver
+  ): Seq[(UnKeyed, UnifiedUserAction)] =
+    adaptEvent(input).map { e => (UnKeyed, e) }
+}
+
+object FavoriteArchivalEventsAdapter {
+
+  def adaptEvent(e: FavoriteArchivalEvent): Seq[UnifiedUserAction] =
+    Option(e).map { e =>
+      UnifiedUserAction(
+        userIdentifier = UserIdentifier(userId = Some(e.favoriterId)),
+        item = getItem(e),
+        actionType =
+          if (e.isArchivingAction.getOrElse(true)) ActionType.ServerTweetArchiveFavorite
+          else ActionType.ServerTweetUnarchiveFavorite,
+        eventMetadata = getEventMetadata(e)
+      )
+    }.toSeq
+
+  def getItem(e: 
FavoriteArchivalEvent): Item = + Item.TweetInfo( + TweetInfo( + // Please note that here we always use TweetId (not sourceTweetId)!!! + actionTweetId = e.tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = e.tweetUserId)), + retweetedTweetId = e.sourceTweetId + ) + ) + + def getEventMetadata(e: FavoriteArchivalEvent): EventMetadata = + EventMetadata( + sourceTimestampMs = e.timestampMs, + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerFavoriteArchivalEvents, + ) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/retweet_archival_events/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/retweet_archival_events/BUILD new file mode 100644 index 000000000..6baf312d6 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/retweet_archival_events/BUILD @@ -0,0 +1,14 @@ +scala_library( + sources = [ + "*.scala", + ], + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "fanoutservice/thrift/src/main/thrift:thrift-scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/retweet_archival_events/RetweetArchivalEventsAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/retweet_archival_events/RetweetArchivalEventsAdapter.scala new file mode 100644 index 000000000..7efdd11d5 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/retweet_archival_events/RetweetArchivalEventsAdapter.scala @@ -0,0 +1,51 @@ +package com.twitter.unified_user_actions.adapter.retweet_archival_events + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.tweetypie.thriftscala.RetweetArchivalEvent +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala._ + +class RetweetArchivalEventsAdapter + extends AbstractAdapter[RetweetArchivalEvent, UnKeyed, UnifiedUserAction] { + + import RetweetArchivalEventsAdapter._ + override def adaptOneToKeyedMany( + input: RetweetArchivalEvent, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(UnKeyed, UnifiedUserAction)] = + adaptEvent(input).map { e => (UnKeyed, e) } +} + +object RetweetArchivalEventsAdapter { + + def adaptEvent(e: RetweetArchivalEvent): Seq[UnifiedUserAction] = + Option(e).map { e => + UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(e.retweetUserId)), + item = getItem(e), + actionType = + if (e.isArchivingAction.getOrElse(true)) ActionType.ServerTweetArchiveRetweet + else ActionType.ServerTweetUnarchiveRetweet, + eventMetadata = getEventMetadata(e) + ) + }.toSeq + + def getItem(e: RetweetArchivalEvent): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = e.srcTweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(e.srcTweetUserId))), + retweetingTweetId 
= Some(e.retweetId) + ) + ) + + def getEventMetadata(e: RetweetArchivalEvent): EventMetadata = + EventMetadata( + sourceTimestampMs = e.timestampMs, + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerRetweetArchivalEvents, + ) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BUILD new file mode 100644 index 000000000..c23748f7b --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BUILD @@ -0,0 +1,14 @@ +scala_library( + sources = [ + "*.scala", + ], + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "src/thrift/com/twitter/socialgraph:thrift-scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BaseReportSocialGraphWriteEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BaseReportSocialGraphWriteEvent.scala new file mode 100644 index 000000000..c9626e7d8 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BaseReportSocialGraphWriteEvent.scala @@ -0,0 +1,24 @@ +package com.twitter.unified_user_actions.adapter.social_graph_event + +import com.twitter.socialgraph.thriftscala.Action +import com.twitter.socialgraph.thriftscala.SrcTargetRequest +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.ProfileActionInfo +import com.twitter.unified_user_actions.thriftscala.ProfileInfo +import com.twitter.unified_user_actions.thriftscala.ServerProfileReport + +abstract class BaseReportSocialGraphWriteEvent[T] extends BaseSocialGraphWriteEvent[T] { + def socialGraphAction: Action + + override def getSocialGraphItem(socialGraphSrcTargetRequest: SrcTargetRequest): Item = { + Item.ProfileInfo( + ProfileInfo( + actionProfileId = socialGraphSrcTargetRequest.target, + profileActionInfo = Some( + ProfileActionInfo.ServerProfileReport( + ServerProfileReport(reportType = socialGraphAction) + )) + ) + ) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BaseSocialGraphWriteEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BaseSocialGraphWriteEvent.scala new file mode 100644 index 000000000..91ca9581e --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/BaseSocialGraphWriteEvent.scala @@ -0,0 +1,60 @@ +package com.twitter.unified_user_actions.adapter.social_graph_event + +import com.twitter.socialgraph.thriftscala.LogEventContext +import com.twitter.socialgraph.thriftscala.SrcTargetRequest +import com.twitter.socialgraph.thriftscala.WriteEvent +import com.twitter.socialgraph.thriftscala.WriteRequestResult +import 
com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.ProfileInfo +import com.twitter.unified_user_actions.thriftscala.SourceLineage +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala.UserIdentifier + +trait BaseSocialGraphWriteEvent[T] { + def uuaActionType: ActionType + + def getSrcTargetRequest( + e: WriteEvent + ): Seq[SrcTargetRequest] = getSubType(e) match { + case Some(subType: Seq[T]) => + getWriteRequestResultFromSubType(subType).collect { + case r if r.validationError.isEmpty => r.request + } + case _ => Nil + } + + def getSubType(e: WriteEvent): Option[Seq[T]] + def getWriteRequestResultFromSubType(subType: Seq[T]): Seq[WriteRequestResult] + + def toUnifiedUserAction( + writeEvent: WriteEvent, + uuaAction: BaseSocialGraphWriteEvent[_] + ): Seq[UnifiedUserAction] = + uuaAction.getSrcTargetRequest(writeEvent).map { srcTargetRequest => + UnifiedUserAction( + userIdentifier = UserIdentifier(userId = writeEvent.context.loggedInUserId), + item = getSocialGraphItem(srcTargetRequest), + actionType = uuaAction.uuaActionType, + eventMetadata = getEventMetadata(writeEvent.context) + ) + } + + def getSocialGraphItem(socialGraphSrcTargetRequest: SrcTargetRequest): Item = { + Item.ProfileInfo( + ProfileInfo( + actionProfileId = socialGraphSrcTargetRequest.target + ) + ) + } + + def getEventMetadata(context: LogEventContext): EventMetadata = { + EventMetadata( + sourceTimestampMs = context.timestamp, + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerSocialGraphEvents, + ) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/SocialGraphAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/SocialGraphAdapter.scala new file mode 100644 index 000000000..a4eee6be3 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/SocialGraphAdapter.scala @@ -0,0 +1,48 @@ +package com.twitter.unified_user_actions.adapter.social_graph_event + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.socialgraph.thriftscala.Action._ +import com.twitter.socialgraph.thriftscala.WriteEvent +import com.twitter.socialgraph.thriftscala.{Action => SocialGraphAction} +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.adapter.social_graph_event.SocialGraphEngagement._ +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + +class SocialGraphAdapter extends AbstractAdapter[WriteEvent, UnKeyed, UnifiedUserAction] { + + import SocialGraphAdapter._ + + override def adaptOneToKeyedMany( + input: WriteEvent, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(UnKeyed, UnifiedUserAction)] = + adaptEvent(input).map { e => (UnKeyed, e) } +} + +object SocialGraphAdapter { + + def adaptEvent(writeEvent: WriteEvent): Seq[UnifiedUserAction] = + Option(writeEvent).flatMap { e => + socialGraphWriteEventTypeToUuaEngagementType.get(e.action) + } match { + case Some(uuaAction) => 
uuaAction.toUnifiedUserAction(writeEvent, uuaAction)
+      case None => Nil
+    }
+
+  private val socialGraphWriteEventTypeToUuaEngagementType: Map[
+    SocialGraphAction,
+    BaseSocialGraphWriteEvent[_]
+  ] =
+    Map[SocialGraphAction, BaseSocialGraphWriteEvent[_]](
+      Follow -> ProfileFollow,
+      Unfollow -> ProfileUnfollow,
+      Block -> ProfileBlock,
+      Unblock -> ProfileUnblock,
+      Mute -> ProfileMute,
+      Unmute -> ProfileUnmute,
+      ReportAsSpam -> ProfileReportAsSpam,
+      ReportAsAbuse -> ProfileReportAsAbuse
+    )
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/SocialGraphEngagement.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/SocialGraphEngagement.scala
new file mode 100644
index 000000000..952531c9f
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event/SocialGraphEngagement.scala
@@ -0,0 +1,157 @@
+package com.twitter.unified_user_actions.adapter.social_graph_event
+
+import com.twitter.socialgraph.thriftscala.Action
+import com.twitter.socialgraph.thriftscala.BlockGraphEvent
+import com.twitter.socialgraph.thriftscala.FollowGraphEvent
+import com.twitter.socialgraph.thriftscala.MuteGraphEvent
+import com.twitter.socialgraph.thriftscala.ReportAsAbuseGraphEvent
+import com.twitter.socialgraph.thriftscala.ReportAsSpamGraphEvent
+import com.twitter.socialgraph.thriftscala.WriteEvent
+import com.twitter.socialgraph.thriftscala.WriteRequestResult
+import com.twitter.unified_user_actions.thriftscala.{ActionType => UuaActionType}
+
+object SocialGraphEngagement {
+
+  /**
+   * This is the "Follow" event, indicating that user1 follows user2, captured as ServerProfileFollow.
+   */
+  object ProfileFollow extends BaseSocialGraphWriteEvent[FollowGraphEvent] {
+    override def uuaActionType: UuaActionType = UuaActionType.ServerProfileFollow
+
+    override def getSubType(
+      e: WriteEvent
+    ): Option[Seq[FollowGraphEvent]] =
+      e.follow
+
+    override def getWriteRequestResultFromSubType(
+      e: Seq[FollowGraphEvent]
+    ): Seq[WriteRequestResult] = {
+      // Remove all redundant operations (FollowGraphEvent.redundantOperation == Some(true))
+      e.collect {
+        case fe if !fe.redundantOperation.getOrElse(false) => fe.result
+      }
+    }
+  }
+
+  /**
+   * This is the "Unfollow" event, indicating that user1 unfollows user2, captured as ServerProfileUnfollow.
+   *
+   * Both Unfollow and Follow use the struct FollowGraphEvent, but each is handled by its own
+   * object.
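+   * e.g. a WriteEvent with action = Unfollow carries its per-edge data in the same `follow`
+   * field as a Follow event; only the top-level WriteEvent.action distinguishes the two.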
+   */
+  object ProfileUnfollow extends BaseSocialGraphWriteEvent[FollowGraphEvent] {
+    override def uuaActionType: UuaActionType = UuaActionType.ServerProfileUnfollow
+
+    override def getSubType(
+      e: WriteEvent
+    ): Option[Seq[FollowGraphEvent]] =
+      e.follow
+
+    override def getWriteRequestResultFromSubType(
+      e: Seq[FollowGraphEvent]
+    ): Seq[WriteRequestResult] =
+      e.collect {
+        case fe if !fe.redundantOperation.getOrElse(false) => fe.result
+      }
+  }
+
+  /**
+   * This is the "Block" event, indicating that user1 blocks user2, captured as ServerProfileBlock.
+   */
+  object ProfileBlock extends BaseSocialGraphWriteEvent[BlockGraphEvent] {
+    override def uuaActionType: UuaActionType = UuaActionType.ServerProfileBlock
+
+    override def getSubType(
+      e: WriteEvent
+    ): Option[Seq[BlockGraphEvent]] =
+      e.block
+
+    override def getWriteRequestResultFromSubType(
+      e: Seq[BlockGraphEvent]
+    ): Seq[WriteRequestResult] =
+      e.map(_.result)
+  }
+
+  /**
+   * This is the "Unblock" event, indicating that user1 unblocks user2, captured as ServerProfileUnblock.
+   *
+   * Both Unblock and Block use the struct BlockGraphEvent, but each is handled by its own
+   * object.
+   */
+  object ProfileUnblock extends BaseSocialGraphWriteEvent[BlockGraphEvent] {
+    override def uuaActionType: UuaActionType = UuaActionType.ServerProfileUnblock
+
+    override def getSubType(
+      e: WriteEvent
+    ): Option[Seq[BlockGraphEvent]] =
+      e.block
+
+    override def getWriteRequestResultFromSubType(
+      e: Seq[BlockGraphEvent]
+    ): Seq[WriteRequestResult] =
+      e.map(_.result)
+  }
+
+  /**
+   * This is the "Mute" event, indicating that user1 mutes user2, captured as ServerProfileMute.
+   */
+  object ProfileMute extends BaseSocialGraphWriteEvent[MuteGraphEvent] {
+    override def uuaActionType: UuaActionType = UuaActionType.ServerProfileMute
+
+    override def getSubType(
+      e: WriteEvent
+    ): Option[Seq[MuteGraphEvent]] =
+      e.mute
+
+    override def getWriteRequestResultFromSubType(e: Seq[MuteGraphEvent]): Seq[WriteRequestResult] =
+      e.map(_.result)
+  }
+
+  /**
+   * This is the "Unmute" event, indicating that user1 unmutes user2, captured as ServerProfileUnmute.
+   *
+   * Both Unmute and Mute use the struct MuteGraphEvent, but each is handled by its own
+   * object.
+ */ + object ProfileUnmute extends BaseSocialGraphWriteEvent[MuteGraphEvent] { + override def uuaActionType: UuaActionType = UuaActionType.ServerProfileUnmute + + override def getSubType( + e: WriteEvent + ): Option[Seq[MuteGraphEvent]] = + e.mute + + override def getWriteRequestResultFromSubType(e: Seq[MuteGraphEvent]): Seq[WriteRequestResult] = + e.map(_.result) + } + + object ProfileReportAsSpam extends BaseReportSocialGraphWriteEvent[ReportAsSpamGraphEvent] { + override def uuaActionType: UuaActionType = UuaActionType.ServerProfileReport + override def socialGraphAction: Action = Action.ReportAsSpam + + override def getSubType( + e: WriteEvent + ): Option[Seq[ReportAsSpamGraphEvent]] = + e.reportAsSpam + + override def getWriteRequestResultFromSubType( + e: Seq[ReportAsSpamGraphEvent] + ): Seq[WriteRequestResult] = + e.map(_.result) + } + + object ProfileReportAsAbuse extends BaseReportSocialGraphWriteEvent[ReportAsAbuseGraphEvent] { + override def uuaActionType: UuaActionType = UuaActionType.ServerProfileReport + override def socialGraphAction: Action = Action.ReportAsAbuse + + override def getSubType( + e: WriteEvent + ): Option[Seq[ReportAsAbuseGraphEvent]] = + e.reportAsAbuse + + override def getWriteRequestResultFromSubType( + e: Seq[ReportAsAbuseGraphEvent] + ): Seq[WriteRequestResult] = + e.map(_.result) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tls_favs_event/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tls_favs_event/BUILD new file mode 100644 index 000000000..0281de0ef --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tls_favs_event/BUILD @@ -0,0 +1,14 @@ +scala_library( + sources = [ + "*.scala", + ], + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "src/thrift/com/twitter/timelineservice/server/internal:thrift-scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tls_favs_event/TlsFavsAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tls_favs_event/TlsFavsAdapter.scala new file mode 100644 index 000000000..d76157949 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tls_favs_event/TlsFavsAdapter.scala @@ -0,0 +1,109 @@ +package com.twitter.unified_user_actions.adapter.tls_favs_event + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.timelineservice.thriftscala._ +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala._ + +class TlsFavsAdapter + extends AbstractAdapter[ContextualizedFavoriteEvent, UnKeyed, UnifiedUserAction] { + + import TlsFavsAdapter._ + + override def adaptOneToKeyedMany( + input: ContextualizedFavoriteEvent, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(UnKeyed, 
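Every adapter in this diff implements the same AbstractAdapter contract: one inbound event yields zero or more keyed UUA records. A stripped-down sketch of that shape, with the StatsReceiver parameter omitted and all names hypothetical:

```scala
// Simplified sketch of the one-to-keyed-many adapter contract.
trait MiniAdapter[In, K, Out] {
  def adaptOneToKeyedMany(input: In): Seq[(K, Out)]
}

// Toy adapter: keeps even numbers, keyed by the number itself; odd input
// yields Nil, just as unrecognized events yield no UUA records above.
object EvensAdapter extends MiniAdapter[Int, Int, String] {
  def adaptOneToKeyedMany(input: Int): Seq[(Int, String)] =
    if (input % 2 == 0) Seq(input -> s"even:$input") else Nil
}
```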
UnifiedUserAction)] = + adaptEvent(input).map { e => (UnKeyed, e) } +} + +object TlsFavsAdapter { + + def adaptEvent(e: ContextualizedFavoriteEvent): Seq[UnifiedUserAction] = + Option(e).flatMap { e => + e.event match { + case FavoriteEventUnion.Favorite(favoriteEvent) => + Some( + UnifiedUserAction( + userIdentifier = getUserIdentifier(Left(favoriteEvent)), + item = getFavItem(favoriteEvent), + actionType = ActionType.ServerTweetFav, + eventMetadata = getEventMetadata(Left(favoriteEvent), e.context), + productSurface = None, + productSurfaceInfo = None + )) + + case FavoriteEventUnion.Unfavorite(unfavoriteEvent) => + Some( + UnifiedUserAction( + userIdentifier = getUserIdentifier(Right(unfavoriteEvent)), + item = getUnfavItem(unfavoriteEvent), + actionType = ActionType.ServerTweetUnfav, + eventMetadata = getEventMetadata(Right(unfavoriteEvent), e.context), + productSurface = None, + productSurfaceInfo = None + )) + + case _ => None + } + }.toSeq + + def getFavItem(favoriteEvent: FavoriteEvent): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = favoriteEvent.tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(favoriteEvent.tweetUserId))), + retweetingTweetId = favoriteEvent.retweetId + ) + ) + + def getUnfavItem(unfavoriteEvent: UnfavoriteEvent): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = unfavoriteEvent.tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(unfavoriteEvent.tweetUserId))), + retweetingTweetId = unfavoriteEvent.retweetId + ) + ) + + def getEventMetadata( + event: Either[FavoriteEvent, UnfavoriteEvent], + context: LogEventContext + ): EventMetadata = { + val sourceTimestampMs = event match { + case Left(favoriteEvent) => favoriteEvent.eventTimeMs + case Right(unfavoriteEvent) => unfavoriteEvent.eventTimeMs + } + // Client UI language, see more at http://go/languagepriority. The format should be ISO 639-1. + val language = event match { + case Left(favoriteEvent) => favoriteEvent.viewerContext.flatMap(_.requestLanguageCode) + case Right(unfavoriteEvent) => unfavoriteEvent.viewerContext.flatMap(_.requestLanguageCode) + } + // From the request (user’s current location), + // see https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/src/thrift/com/twitter/context/viewer.thrift?L54 + // The format should be ISO_3166-1_alpha-2. 
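The normalization step referenced in the comments above is grounded later in this diff: AdapterUtilsSpec asserts that both helpers upper-case their input. A sketch under that assumption:

```scala
// Sketch of the normalization assumed by getEventMetadata; AdapterUtilsSpec
// later in this diff asserts both helpers upper-case their input.
object NormalizeSketch {
  def normalizeLanguageCode(code: String): String = code.toUpperCase // "en" -> "EN"
  def normalizeCountryCode(code: String): String = code.toUpperCase  // "us" -> "US"

  // As in getEventMetadata: normalize only when the optional code is present.
  def normalizedLanguage(language: Option[String]): Option[String] =
    language.map(normalizeLanguageCode)
}
```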
+ val countryCode = event match { + case Left(favoriteEvent) => favoriteEvent.viewerContext.flatMap(_.requestCountryCode) + case Right(unfavoriteEvent) => unfavoriteEvent.viewerContext.flatMap(_.requestCountryCode) + } + EventMetadata( + sourceTimestampMs = sourceTimestampMs, + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerTlsFavs, + language = language.map(AdapterUtils.normalizeLanguageCode), + countryCode = countryCode.map(AdapterUtils.normalizeCountryCode), + traceId = Some(context.traceId), + clientAppId = context.clientApplicationId, + ) + } + + // Get id of the user that took the action + def getUserIdentifier(event: Either[FavoriteEvent, UnfavoriteEvent]): UserIdentifier = + event match { + case Left(favoriteEvent) => UserIdentifier(userId = Some(favoriteEvent.userId)) + case Right(unfavoriteEvent) => UserIdentifier(userId = Some(unfavoriteEvent.userId)) + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BUILD new file mode 100644 index 000000000..9a526255f --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BUILD @@ -0,0 +1,16 @@ +scala_library( + sources = [ + "*.scala", + ], + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "src/thrift/com/twitter/gizmoduck:user-thrift-scala", + "src/thrift/com/twitter/tweetypie:events-scala", + "src/thrift/com/twitter/tweetypie:tweet-scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEvent.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEvent.scala new file mode 100644 index 000000000..2e33d2970 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEvent.scala @@ -0,0 +1,51 @@ +package com.twitter.unified_user_actions.adapter.tweetypie_event + +import com.twitter.tweetypie.thriftscala.TweetEventFlags +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala.UserIdentifier + +/** + * Base class for Tweetypie Tweet Event. + * Extends this class if you need to implement the parser for a new Tweetypie Tweet Event Type. + * @see https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/src/thrift/com/twitter/tweetypie/tweet_events.thrift?L225 + */ +trait BaseTweetypieTweetEvent[T] { + + /** + * Returns an Optional UnifiedUserAction from the event. + */ + def getUnifiedUserAction(event: T, flags: TweetEventFlags): Option[UnifiedUserAction] + + /** + * Returns UnifiedUserAction.ActionType for each type of event. 
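The trait above is the template half of a template-method pattern: concrete bases such as BaseTweetypieTweetEventCreate below compose the abstract pieces into getUnifiedUserAction. A generic sketch of the pattern, with hypothetical names:

```scala
// Stripped-down illustration of the template pattern used by the event traits.
trait EventTemplate[E, Action] {
  type Extracted
  // Subclasses decide whether the event is relevant and what to pull out of it.
  protected def extract(event: E): Option[Extracted]
  protected def build(extracted: Extracted, event: E): Action

  // Shared pipeline: skip irrelevant events, otherwise assemble the action.
  final def toAction(event: E): Option[Action] = extract(event).map(build(_, event))
}
```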
+ */
+ protected def actionType: ActionType
+
+ /**
+ * Output type of the predicate. Could be an input of getItem.
+ */
+ type ExtractedEvent
+
+ /**
+ * Returns Some(ExtractedEvent) if the event is valid and None otherwise.
+ */
+ protected def extract(event: T): Option[ExtractedEvent]
+
+ /**
+ * Get the UnifiedUserAction.Item from the event.
+ */
+ protected def getItem(extractedEvent: ExtractedEvent, event: T): Item
+
+ /**
+ * Get the UnifiedUserAction.UserIdentifier from the event.
+ */
+ protected def getUserIdentifier(event: T): UserIdentifier
+
+ /**
+ * Get UnifiedUserAction.EventMetadata from the event.
+ */
+ protected def getEventMetadata(event: T, flags: TweetEventFlags): EventMetadata
+}
diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEventCreate.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEventCreate.scala
new file mode 100644
index 000000000..5ede2f388
--- /dev/null
+++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEventCreate.scala
@@ -0,0 +1,200 @@
+package com.twitter.unified_user_actions.adapter.tweetypie_event
+
+import com.twitter.tweetypie.thriftscala.QuotedTweet
+import com.twitter.tweetypie.thriftscala.Share
+import com.twitter.tweetypie.thriftscala.TweetCreateEvent
+import com.twitter.tweetypie.thriftscala.TweetEventFlags
+import com.twitter.unified_user_actions.adapter.common.AdapterUtils
+import com.twitter.unified_user_actions.thriftscala.ActionType
+import com.twitter.unified_user_actions.thriftscala.AuthorInfo
+import com.twitter.unified_user_actions.thriftscala.EventMetadata
+import com.twitter.unified_user_actions.thriftscala.Item
+import com.twitter.unified_user_actions.thriftscala.SourceLineage
+import com.twitter.unified_user_actions.thriftscala.TweetInfo
+import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
+import com.twitter.unified_user_actions.thriftscala.UserIdentifier
+
+/**
+ * Base class for Tweetypie TweetCreateEvent including Quote, Reply, Retweet, and Create.
+ */
+trait BaseTweetypieTweetEventCreate extends BaseTweetypieTweetEvent[TweetCreateEvent] {
+ type ExtractedEvent
+ protected def actionType: ActionType
+
+ /**
+ * This is the country code where actionTweetId is sent from. For the definitions,
+ * check https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/src/thrift/com/twitter/tweetypie/tweet.thrift?L1001.
+ *
+ * UUA sets this to be consistent with IESource to meet existing use requirements.
+ *
+ * For ServerTweetReply/Retweet/Quote, the geo-tagging country code is not available in TweetCreateEvent.
+ * Thus, the user's signup country is used instead to meet a customer use case.
+ *
+ * The definition here conflicts with the intention of UUA to log the request country code
+ * rather than the signup / geo-tagging country.
+ * + */ + protected def getCountryCode(tce: TweetCreateEvent): Option[String] = { + tce.tweet.place match { + case Some(p) => p.countryCode + case _ => tce.user.safety.flatMap(_.signupCountryCode) + } + } + + protected def getItem( + extractedEvent: ExtractedEvent, + tweetCreateEvent: TweetCreateEvent + ): Item + protected def extract(tweetCreateEvent: TweetCreateEvent): Option[ExtractedEvent] + + def getUnifiedUserAction( + tweetCreateEvent: TweetCreateEvent, + tweetEventFlags: TweetEventFlags + ): Option[UnifiedUserAction] = { + extract(tweetCreateEvent).map { extractedEvent => + UnifiedUserAction( + userIdentifier = getUserIdentifier(tweetCreateEvent), + item = getItem(extractedEvent, tweetCreateEvent), + actionType = actionType, + eventMetadata = getEventMetadata(tweetCreateEvent, tweetEventFlags), + productSurface = None, + productSurfaceInfo = None + ) + } + } + + protected def getUserIdentifier(tweetCreateEvent: TweetCreateEvent): UserIdentifier = + UserIdentifier(userId = Some(tweetCreateEvent.user.id)) + + protected def getEventMetadata( + tweetCreateEvent: TweetCreateEvent, + flags: TweetEventFlags + ): EventMetadata = + EventMetadata( + sourceTimestampMs = flags.timestampMs, + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerTweetypieEvents, + traceId = None, // Currently traceId is not stored in TweetCreateEvent + // UUA sets this to None since there is no request level language info. + language = None, + countryCode = getCountryCode(tweetCreateEvent), + clientAppId = tweetCreateEvent.tweet.deviceSource.flatMap(_.clientAppId), + clientVersion = None // Currently clientVersion is not stored in TweetCreateEvent + ) +} + +/** + * Get UnifiedUserAction from a tweet Create. + * Note the Create is generated when the tweet is not a Quote/Retweet/Reply. + */ +object TweetypieCreateEvent extends BaseTweetypieTweetEventCreate { + type ExtractedEvent = Long + override protected val actionType: ActionType = ActionType.ServerTweetCreate + override protected def extract(tweetCreateEvent: TweetCreateEvent): Option[Long] = + Option(tweetCreateEvent.tweet.id) + + protected def getItem( + tweetId: Long, + tweetCreateEvent: TweetCreateEvent + ): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(tweetCreateEvent.user.id))) + )) +} + +/** + * Get UnifiedUserAction from a Reply. + * Note the Reply is generated when someone is replying to a tweet. + */ +object TweetypieReplyEvent extends BaseTweetypieTweetEventCreate { + case class PredicateOutput(tweetId: Long, userId: Long) + override type ExtractedEvent = PredicateOutput + override protected val actionType: ActionType = ActionType.ServerTweetReply + override protected def extract(tweetCreateEvent: TweetCreateEvent): Option[PredicateOutput] = + tweetCreateEvent.tweet.coreData + .flatMap(_.reply).flatMap(r => + r.inReplyToStatusId.map(tweetId => PredicateOutput(tweetId, r.inReplyToUserId))) + + override protected def getItem( + repliedTweet: PredicateOutput, + tweetCreateEvent: TweetCreateEvent + ): Item = { + Item.TweetInfo( + TweetInfo( + actionTweetId = repliedTweet.tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(repliedTweet.userId))), + replyingTweetId = Some(tweetCreateEvent.tweet.id) + ) + ) + } +} + +/** + * Get UnifiedUserAction from a Quote. + * Note the Quote is generated when someone is quoting (retweeting with comment) a tweet. 
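One subtlety in getCountryCode above: a geo-tagged Tweet whose place lacks a country code yields None rather than falling back to the signup country. A sketch with simplified stand-in types:

```scala
// Simplified stand-ins for the thrift types read by getCountryCode.
case class PlaceSketch(countryCode: Option[String])
case class SafetySketch(signupCountryCode: Option[String])

object CountryCodeSketch {
  def countryCode(
    place: Option[PlaceSketch],
    safety: Option[SafetySketch]
  ): Option[String] =
    place match {
      case Some(p) => p.countryCode // geo-tagged: use the place, even if its code is empty
      case _       => safety.flatMap(_.signupCountryCode) // else: signup country
    }

  // A tagged place with no country code does NOT fall back:
  // countryCode(Some(PlaceSketch(None)), Some(SafetySketch(Some("US")))) == None
}
```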
+ */ +object TweetypieQuoteEvent extends BaseTweetypieTweetEventCreate { + override protected val actionType: ActionType = ActionType.ServerTweetQuote + type ExtractedEvent = QuotedTweet + override protected def extract(tweetCreateEvent: TweetCreateEvent): Option[QuotedTweet] = + tweetCreateEvent.tweet.quotedTweet + + override protected def getItem( + quotedTweet: QuotedTweet, + tweetCreateEvent: TweetCreateEvent + ): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = quotedTweet.tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(quotedTweet.userId))), + quotingTweetId = Some(tweetCreateEvent.tweet.id) + ) + ) +} + +/** + * Get UnifiedUserAction from a Retweet. + * Note the Retweet is generated when someone is retweeting (without comment) a tweet. + */ +object TweetypieRetweetEvent extends BaseTweetypieTweetEventCreate { + override type ExtractedEvent = Share + override protected val actionType: ActionType = ActionType.ServerTweetRetweet + override protected def extract(tweetCreateEvent: TweetCreateEvent): Option[Share] = + tweetCreateEvent.tweet.coreData.flatMap(_.share) + + override protected def getItem(share: Share, tweetCreateEvent: TweetCreateEvent): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = share.sourceStatusId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(share.sourceUserId))), + retweetingTweetId = Some(tweetCreateEvent.tweet.id) + ) + ) +} + +/** + * Get UnifiedUserAction from a TweetEdit. + * Note the Edit is generated when someone is editing their quote or default tweet. The edit will + * generate a new Tweet. + */ +object TweetypieEditEvent extends BaseTweetypieTweetEventCreate { + override type ExtractedEvent = Long + override protected def actionType: ActionType = ActionType.ServerTweetEdit + override protected def extract(tweetCreateEvent: TweetCreateEvent): Option[Long] = + TweetypieEventUtils.editedTweetIdFromTweet(tweetCreateEvent.tweet) + + override protected def getItem( + editedTweetId: Long, + tweetCreateEvent: TweetCreateEvent + ): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = tweetCreateEvent.tweet.id, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(tweetCreateEvent.user.id))), + editedTweetId = Some(editedTweetId), + quotedTweetId = tweetCreateEvent.tweet.quotedTweet.map(_.tweetId) + ) + ) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEventDelete.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEventDelete.scala new file mode 100644 index 000000000..140c851ee --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/BaseTweetypieTweetEventDelete.scala @@ -0,0 +1,146 @@ +package com.twitter.unified_user_actions.adapter.tweetypie_event + +import com.twitter.tweetypie.thriftscala.QuotedTweet +import com.twitter.tweetypie.thriftscala.Share +import com.twitter.tweetypie.thriftscala.TweetDeleteEvent +import com.twitter.tweetypie.thriftscala.TweetEventFlags +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.AuthorInfo +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.SourceLineage +import 
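Across the create events above, the item fields follow one convention: actionTweetId points at the Tweet being acted on, while replyingTweetId / quotingTweetId / retweetingTweetId carry the id of the Tweet the action created. A simplified illustration:

```scala
// Simplified TweetInfo showing the linking convention used by the create events.
case class TweetInfoSketch(
  actionTweetId: Long,
  replyingTweetId: Option[Long] = None,
  quotingTweetId: Option[Long] = None,
  retweetingTweetId: Option[Long] = None)

object LinkingSketch {
  // User replies to Tweet 100 with new Tweet 200:
  val reply = TweetInfoSketch(actionTweetId = 100L, replyingTweetId = Some(200L))
  // User quotes Tweet 100 with new Tweet 300:
  val quote = TweetInfoSketch(actionTweetId = 100L, quotingTweetId = Some(300L))
}
```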
com.twitter.unified_user_actions.thriftscala.TweetInfo +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala.UserIdentifier + +trait BaseTweetypieTweetEventDelete extends BaseTweetypieTweetEvent[TweetDeleteEvent] { + type ExtractedEvent + protected def actionType: ActionType + + def getUnifiedUserAction( + tweetDeleteEvent: TweetDeleteEvent, + tweetEventFlags: TweetEventFlags + ): Option[UnifiedUserAction] = + extract(tweetDeleteEvent).map { extractedEvent => + UnifiedUserAction( + userIdentifier = getUserIdentifier(tweetDeleteEvent), + item = getItem(extractedEvent, tweetDeleteEvent), + actionType = actionType, + eventMetadata = getEventMetadata(tweetDeleteEvent, tweetEventFlags) + ) + } + + protected def extract(tweetDeleteEvent: TweetDeleteEvent): Option[ExtractedEvent] + + protected def getItem(extractedEvent: ExtractedEvent, tweetDeleteEvent: TweetDeleteEvent): Item + + protected def getUserIdentifier(tweetDeleteEvent: TweetDeleteEvent): UserIdentifier = + UserIdentifier(userId = tweetDeleteEvent.user.map(_.id)) + + protected def getEventMetadata( + tweetDeleteEvent: TweetDeleteEvent, + flags: TweetEventFlags + ): EventMetadata = + EventMetadata( + sourceTimestampMs = flags.timestampMs, + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerTweetypieEvents, + traceId = None, // Currently traceId is not stored in TweetDeleteEvent. + // UUA sets this to None since there is no request level language info. + language = None, + // UUA sets this to be consistent with IESource. For the definition, + // see https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/src/thrift/com/twitter/tweetypie/tweet.thrift?L1001. + // The definition here conflicts with the intention of UUA to log the request country code + // rather than the signup / geo-tagging country. + countryCode = tweetDeleteEvent.tweet.place.flatMap(_.countryCode), + /* clientApplicationId is user's app id if the delete is initiated by a user, + * or auditor's app id if the delete is initiated by an auditor */ + clientAppId = tweetDeleteEvent.audit.flatMap(_.clientApplicationId), + clientVersion = None // Currently clientVersion is not stored in TweetDeleteEvent. 
+ ) +} + +object TweetypieDeleteEvent extends BaseTweetypieTweetEventDelete { + type ExtractedEvent = Long + override protected val actionType: ActionType = ActionType.ServerTweetDelete + + override protected def extract(tweetDeleteEvent: TweetDeleteEvent): Option[Long] = Some( + tweetDeleteEvent.tweet.id) + + protected def getItem( + tweetId: Long, + tweetDeleteEvent: TweetDeleteEvent + ): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = tweetId, + actionTweetAuthorInfo = + Some(AuthorInfo(authorId = tweetDeleteEvent.tweet.coreData.map(_.userId))) + )) +} + +object TweetypieUnretweetEvent extends BaseTweetypieTweetEventDelete { + override protected val actionType: ActionType = ActionType.ServerTweetUnretweet + + override type ExtractedEvent = Share + + override protected def extract(tweetDeleteEvent: TweetDeleteEvent): Option[Share] = + tweetDeleteEvent.tweet.coreData.flatMap(_.share) + + override protected def getItem(share: Share, tweetDeleteEvent: TweetDeleteEvent): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = share.sourceStatusId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(share.sourceUserId))), + retweetingTweetId = Some(tweetDeleteEvent.tweet.id) + ) + ) +} + +object TweetypieUnreplyEvent extends BaseTweetypieTweetEventDelete { + case class PredicateOutput(tweetId: Long, userId: Long) + + override type ExtractedEvent = PredicateOutput + + override protected val actionType: ActionType = ActionType.ServerTweetUnreply + + override protected def extract(tweetDeleteEvent: TweetDeleteEvent): Option[PredicateOutput] = + tweetDeleteEvent.tweet.coreData + .flatMap(_.reply).flatMap(r => + r.inReplyToStatusId.map(tweetId => PredicateOutput(tweetId, r.inReplyToUserId))) + + override protected def getItem( + repliedTweet: PredicateOutput, + tweetDeleteEvent: TweetDeleteEvent + ): Item = { + Item.TweetInfo( + TweetInfo( + actionTweetId = repliedTweet.tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(repliedTweet.userId))), + replyingTweetId = Some(tweetDeleteEvent.tweet.id) + ) + ) + } +} + +object TweetypieUnquoteEvent extends BaseTweetypieTweetEventDelete { + override protected val actionType: ActionType = ActionType.ServerTweetUnquote + + type ExtractedEvent = QuotedTweet + + override protected def extract(tweetDeleteEvent: TweetDeleteEvent): Option[QuotedTweet] = + tweetDeleteEvent.tweet.quotedTweet + + override protected def getItem( + quotedTweet: QuotedTweet, + tweetDeleteEvent: TweetDeleteEvent + ): Item = + Item.TweetInfo( + TweetInfo( + actionTweetId = quotedTweet.tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(quotedTweet.userId))), + quotingTweetId = Some(tweetDeleteEvent.tweet.id) + ) + ) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/TweetypieEventAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/TweetypieEventAdapter.scala new file mode 100644 index 000000000..472a87ee2 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/TweetypieEventAdapter.scala @@ -0,0 +1,78 @@ +package com.twitter.unified_user_actions.adapter.tweetypie_event + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.tweetypie.thriftscala.TweetEvent +import 
com.twitter.tweetypie.thriftscala.TweetEventData +import com.twitter.tweetypie.thriftscala.TweetCreateEvent +import com.twitter.tweetypie.thriftscala.TweetDeleteEvent +import com.twitter.tweetypie.thriftscala.TweetEventFlags +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + +class TweetypieEventAdapter extends AbstractAdapter[TweetEvent, UnKeyed, UnifiedUserAction] { + import TweetypieEventAdapter._ + override def adaptOneToKeyedMany( + tweetEvent: TweetEvent, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(UnKeyed, UnifiedUserAction)] = + adaptEvent(tweetEvent).map(e => (UnKeyed, e)) +} + +object TweetypieEventAdapter { + def adaptEvent(tweetEvent: TweetEvent): Seq[UnifiedUserAction] = { + Option(tweetEvent).flatMap { e => + e.data match { + case TweetEventData.TweetCreateEvent(tweetCreateEvent: TweetCreateEvent) => + getUUAFromTweetCreateEvent(tweetCreateEvent, e.flags) + case TweetEventData.TweetDeleteEvent(tweetDeleteEvent: TweetDeleteEvent) => + getUUAFromTweetDeleteEvent(tweetDeleteEvent, e.flags) + case _ => None + } + }.toSeq + } + + def getUUAFromTweetCreateEvent( + tweetCreateEvent: TweetCreateEvent, + tweetEventFlags: TweetEventFlags + ): Option[UnifiedUserAction] = { + val tweetTypeOpt = TweetypieEventUtils.tweetTypeFromTweet(tweetCreateEvent.tweet) + + tweetTypeOpt.flatMap { tweetType => + tweetType match { + case TweetTypeReply => + TweetypieReplyEvent.getUnifiedUserAction(tweetCreateEvent, tweetEventFlags) + case TweetTypeRetweet => + TweetypieRetweetEvent.getUnifiedUserAction(tweetCreateEvent, tweetEventFlags) + case TweetTypeQuote => + TweetypieQuoteEvent.getUnifiedUserAction(tweetCreateEvent, tweetEventFlags) + case TweetTypeDefault => + TweetypieCreateEvent.getUnifiedUserAction(tweetCreateEvent, tweetEventFlags) + case TweetTypeEdit => + TweetypieEditEvent.getUnifiedUserAction(tweetCreateEvent, tweetEventFlags) + } + } + } + + def getUUAFromTweetDeleteEvent( + tweetDeleteEvent: TweetDeleteEvent, + tweetEventFlags: TweetEventFlags + ): Option[UnifiedUserAction] = { + val tweetTypeOpt = TweetypieEventUtils.tweetTypeFromTweet(tweetDeleteEvent.tweet) + + tweetTypeOpt.flatMap { tweetType => + tweetType match { + case TweetTypeRetweet => + TweetypieUnretweetEvent.getUnifiedUserAction(tweetDeleteEvent, tweetEventFlags) + case TweetTypeReply => + TweetypieUnreplyEvent.getUnifiedUserAction(tweetDeleteEvent, tweetEventFlags) + case TweetTypeQuote => + TweetypieUnquoteEvent.getUnifiedUserAction(tweetDeleteEvent, tweetEventFlags) + case TweetTypeDefault | TweetTypeEdit => + TweetypieDeleteEvent.getUnifiedUserAction(tweetDeleteEvent, tweetEventFlags) + } + } + } + +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/TweetypieEventUtils.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/TweetypieEventUtils.scala new file mode 100644 index 000000000..e3798f383 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event/TweetypieEventUtils.scala @@ -0,0 +1,54 @@ +package com.twitter.unified_user_actions.adapter.tweetypie_event + +import com.twitter.tweetypie.thriftscala.EditControl +import com.twitter.tweetypie.thriftscala.EditControlEdit +import com.twitter.tweetypie.thriftscala.Tweet + +sealed trait TweetypieTweetType +object TweetTypeDefault extends TweetypieTweetType +object TweetTypeReply extends TweetypieTweetType +object TweetTypeRetweet extends TweetypieTweetType 
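The adapter above dispatches by pattern-matching over the sealed TweetypieTweetType family defined here. Since these targets compile with fatal_warnings, a sealed trait presumably turns a forgotten case into a build failure rather than a runtime surprise; a minimal sketch of that property:

```scala
// Why a sealed trait: the compiler can prove matches are exhaustive, and
// under fatal warnings (-Xfatal-warnings) a missed case fails compilation.
sealed trait KindSketch
case object KindA extends KindSketch
case object KindB extends KindSketch

object DispatchSketch {
  def describe(k: KindSketch): String = k match {
    case KindA => "a"
    case KindB => "b"
    // Omitting either case would raise a non-exhaustive-match warning,
    // which fatal warnings promotes to a compile error.
  }
}
```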
+object TweetTypeQuote extends TweetypieTweetType +object TweetTypeEdit extends TweetypieTweetType + +object TweetypieEventUtils { + def editedTweetIdFromTweet(tweet: Tweet): Option[Long] = tweet.editControl.flatMap { + case EditControl.Edit(EditControlEdit(initialTweetId, _)) => Some(initialTweetId) + case _ => None + } + + def tweetTypeFromTweet(tweet: Tweet): Option[TweetypieTweetType] = { + val data = tweet.coreData + val inReplyingToStatusIdOpt = data.flatMap(_.reply).flatMap(_.inReplyToStatusId) + val shareOpt = data.flatMap(_.share) + val quotedTweetOpt = tweet.quotedTweet + val editedTweetIdOpt = editedTweetIdFromTweet(tweet) + + (inReplyingToStatusIdOpt, shareOpt, quotedTweetOpt, editedTweetIdOpt) match { + // Reply + case (Some(_), None, _, None) => + Some(TweetTypeReply) + // For any kind of retweet (be it retweet of quote tweet or retweet of a regular tweet) + // we only need to look at the `share` field + // https://confluence.twitter.biz/pages/viewpage.action?spaceKey=CSVC&title=TweetyPie+FAQ#TweetypieFAQ-HowdoItellifaTweetisaRetweet + case (None, Some(_), _, None) => + Some(TweetTypeRetweet) + // quote + case (None, None, Some(_), None) => + Some(TweetTypeQuote) + // create + case (None, None, None, None) => + Some(TweetTypeDefault) + // edit + case (None, None, _, Some(_)) => + Some(TweetTypeEdit) + // reply and retweet shouldn't be present at the same time + case (Some(_), Some(_), _, _) => + None + // reply and edit / retweet and edit shouldn't be present at the same time + case (Some(_), None, _, Some(_)) | (None, Some(_), _, Some(_)) => + None + } + } + +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/BUILD.bazel b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/BUILD.bazel new file mode 100644 index 000000000..24a0aab09 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/BUILD.bazel @@ -0,0 +1,18 @@ +scala_library( + sources = [ + "*.scala", + ], + compiler_option_sets = ["fatal_warnings"], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "src/thrift/com/twitter/gizmoduck:thrift-scala", + "src/thrift/com/twitter/gizmoduck:user-thrift-scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/UserModificationAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/UserModificationAdapter.scala new file mode 100644 index 000000000..24e111b96 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/UserModificationAdapter.scala @@ -0,0 +1,41 @@ +package com.twitter.unified_user_actions.adapter.user_modification + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.gizmoduck.thriftscala.UserModification +import 
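tweetTypeFromTweet above is effectively a decision table over four optional signals. A condensed boolean restatement (hypothetical names; the quote position is a wildcard for retweets and edits because a retweet or edit of a quote Tweet still carries quotedTweet):

```scala
object TweetTypeTableSketch {
  // reply / share / quote / edit correspond to the four Options in the match.
  def classify(reply: Boolean, share: Boolean, quote: Boolean, edit: Boolean): Option[String] =
    (reply, share, quote, edit) match {
      case (true, false, _, false)      => Some("Reply")
      case (false, true, _, false)      => Some("Retweet")
      case (false, false, true, false)  => Some("Quote")
      case (false, false, false, false) => Some("Default")
      case (false, false, _, true)      => Some("Edit")
      case _                            => None // conflicting signals are dropped
    }
}
```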
com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.adapter.user_modification_event.UserCreate +import com.twitter.unified_user_actions.adapter.user_modification_event.UserUpdate +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + +class UserModificationAdapter + extends AbstractAdapter[UserModification, UnKeyed, UnifiedUserAction] { + + import UserModificationAdapter._ + + override def adaptOneToKeyedMany( + input: UserModification, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(UnKeyed, UnifiedUserAction)] = + adaptEvent(input).map { e => (UnKeyed, e) } +} + +object UserModificationAdapter { + + def adaptEvent(input: UserModification): Seq[UnifiedUserAction] = + Option(input).toSeq.flatMap { e => + if (e.create.isDefined) { // User create + Some(UserCreate.getUUA(input)) + } else if (e.update.isDefined) { // User updates + Some(UserUpdate.getUUA(input)) + } else if (e.destroy.isDefined) { + None + } else if (e.erase.isDefined) { + None + } else { + throw new IllegalArgumentException( + "None of the possible events is defined, there must be something with the source") + } + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/UserModifications.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/UserModifications.scala new file mode 100644 index 000000000..50b8a822d --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event/UserModifications.scala @@ -0,0 +1,97 @@ +package com.twitter.unified_user_actions.adapter.user_modification_event + +import com.twitter.gizmoduck.thriftscala.UserModification +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.ProfileActionInfo +import com.twitter.unified_user_actions.thriftscala.ServerUserUpdate +import com.twitter.unified_user_actions.thriftscala.ProfileInfo +import com.twitter.unified_user_actions.thriftscala.SourceLineage +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala.UserIdentifier + +abstract class BaseUserModificationEvent(actionType: ActionType) { + + def getUUA(input: UserModification): UnifiedUserAction = { + val userIdentifier: UserIdentifier = UserIdentifier(userId = input.userId) + + UnifiedUserAction( + userIdentifier = userIdentifier, + item = getItem(input), + actionType = actionType, + eventMetadata = getEventMetadata(input), + ) + } + + protected def getItem(input: UserModification): Item = + Item.ProfileInfo( + ProfileInfo( + actionProfileId = input.userId + .getOrElse(throw new IllegalArgumentException("target user_id is missing")) + ) + ) + + protected def getEventMetadata(input: UserModification): EventMetadata = + EventMetadata( + sourceTimestampMs = input.updatedAtMsec + .getOrElse(throw new IllegalArgumentException("timestamp is required")), + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerGizmoduckUserModificationEvents, + ) +} + +/** + * When there is a new user creation event in Gizmoduck + */ +object UserCreate extends 
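A compact restatement of the routing in adaptEvent above, with simplified types: creates and updates are adapted, destroys and erases are deliberately dropped, and an event with none of the mutation fields set is treated as malformed:

```scala
// Hypothetical, simplified UserModification; payloads reduced to Strings.
case class UserModificationSketch(
  create: Option[String] = None,
  update: Option[String] = None,
  destroy: Option[String] = None,
  erase: Option[String] = None)

object RoutingSketch {
  def route(e: UserModificationSketch): Option[String] =
    if (e.create.isDefined) Some(s"ServerUserCreate(${e.create.get})")
    else if (e.update.isDefined) Some(s"ServerUserUpdate(${e.update.get})")
    else if (e.destroy.isDefined || e.erase.isDefined) None // intentionally unhandled
    else throw new IllegalArgumentException("no modification field set")
}
```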
BaseUserModificationEvent(ActionType.ServerUserCreate) { + override protected def getItem(input: UserModification): Item = + Item.ProfileInfo( + ProfileInfo( + actionProfileId = input.create + .map { user => + user.id + }.getOrElse(throw new IllegalArgumentException("target user_id is missing")), + name = input.create.flatMap { user => + user.profile.map(_.name) + }, + handle = input.create.flatMap { user => + user.profile.map(_.screenName) + }, + description = input.create.flatMap { user => + user.profile.map(_.description) + } + ) + ) + + override protected def getEventMetadata(input: UserModification): EventMetadata = + EventMetadata( + sourceTimestampMs = input.create + .map { user => + user.updatedAtMsec + }.getOrElse(throw new IllegalArgumentException("timestamp is required")), + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerGizmoduckUserModificationEvents, + ) +} + +object UserUpdate extends BaseUserModificationEvent(ActionType.ServerUserUpdate) { + override protected def getItem(input: UserModification): Item = + Item.ProfileInfo( + ProfileInfo( + actionProfileId = + input.userId.getOrElse(throw new IllegalArgumentException("userId is required")), + profileActionInfo = Some( + ProfileActionInfo.ServerUserUpdate( + ServerUserUpdate(updates = input.update.getOrElse(Nil), success = input.success))) + ) + ) + + override protected def getEventMetadata(input: UserModification): EventMetadata = + EventMetadata( + sourceTimestampMs = input.updatedAtMsec.getOrElse(AdapterUtils.currentTimestampMs), + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = SourceLineage.ServerGizmoduckUserModificationEvents, + ) +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/BUILD b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/BUILD new file mode 100644 index 000000000..fac4cd426 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/BUILD @@ -0,0 +1,14 @@ +scala_library( + sources = [ + "*.scala", + ], + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "iesource/thrift/src/main/thrift:thrift-scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/README b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/README new file mode 100644 index 000000000..b90dfd50d --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/README @@ -0,0 +1,11 @@ +Currently this dir contains multiple adapters. +The goal is similar: to generate Rekeyed (key by TweetId) `KeyedUuaTweet` events that can be +used for View Counts (aggregation). + +The 2 adapters: +1. Reads from UUA-all topic +2. Reads from InteractionEvents +We have 2 adapters mainly because currently InteractionEvents have 10% more TweetRenderImpressions +than what UUA has. 
Details can be found at https://docs.google.com/document/d/1UcEzAZ7rFrsU_6kl20R3YZ6u_Jt8PH_4-mVHWe216eM/edit# + +It is still unclear which source should be used, but at a time there should be only one service running. diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/RekeyUuaAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/RekeyUuaAdapter.scala new file mode 100644 index 000000000..08cc46a21 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/RekeyUuaAdapter.scala @@ -0,0 +1,33 @@ +package com.twitter.unified_user_actions.adapter.uua_aggregates + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.thriftscala._ + +/** + * The main purpose of the rekey adapter and the rekey service is to not break the existing + * customers with the existing Unkeyed and also making the value as a super light-weight schema. + * After we rekey from Unkeyed to Long (tweetId), downstream KafkaStreams can directly consume + * without repartitioning. + */ +class RekeyUuaAdapter extends AbstractAdapter[UnifiedUserAction, Long, KeyedUuaTweet] { + + import RekeyUuaAdapter._ + override def adaptOneToKeyedMany( + input: UnifiedUserAction, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(Long, KeyedUuaTweet)] = + adaptEvent(input).map { e => (e.tweetId, e) } +} + +object RekeyUuaAdapter { + def adaptEvent(e: UnifiedUserAction): Seq[KeyedUuaTweet] = + Option(e).flatMap { e => + e.actionType match { + case ActionType.ClientTweetRenderImpression => + ClientTweetRenderImpressionUua.getRekeyedUUA(e) + case _ => None + } + }.toSeq +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/RekeyUuaFromInteractionEventsAdapter.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/RekeyUuaFromInteractionEventsAdapter.scala new file mode 100644 index 000000000..a513d7298 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/RekeyUuaFromInteractionEventsAdapter.scala @@ -0,0 +1,86 @@ +package com.twitter.unified_user_actions.adapter.uua_aggregates + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.iesource.thriftscala.ClientEventContext +import com.twitter.iesource.thriftscala.EngagingContext +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.iesource.thriftscala.InteractionType +import com.twitter.iesource.thriftscala.InteractionEvent +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.KeyedUuaTweet +import com.twitter.unified_user_actions.thriftscala.SourceLineage +import com.twitter.unified_user_actions.thriftscala.UserIdentifier + +/** + * This is to read directly from InteractionEvents + */ +class RekeyUuaFromInteractionEventsAdapter + extends AbstractAdapter[InteractionEvent, Long, KeyedUuaTweet] { + + import RekeyUuaFromInteractionEventsAdapter._ + override def adaptOneToKeyedMany( + input: 
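The re-keying rationale in the README and RekeyUuaAdapter above comes down to Kafka partitioning: once records are keyed by tweetId, all events for a Tweet land in the same partition, so per-Tweet aggregation needs no repartition step downstream. An illustrative sketch (the hash here is not Kafka's actual partitioner):

```scala
object RekeySketch {
  // Records with equal keys map to the same partition, so one consumer sees
  // every impression for a given Tweet.
  def partitionFor(tweetId: Long, numPartitions: Int): Int =
    math.floorMod(tweetId.hashCode, numPartitions)

  // Per-Tweet view counting then becomes a local group-and-count:
  val impressions: Seq[Long] = Seq(100L, 100L, 200L) // tweetIds from KeyedUuaTweet
  val viewCounts: Map[Long, Int] =
    impressions.groupBy(identity).map { case (id, xs) => id -> xs.size }
  // viewCounts == Map(100L -> 2, 200L -> 1)
}
```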
InteractionEvent, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[(Long, KeyedUuaTweet)] = + adaptEvent(input, statsReceiver).map { e => (e.tweetId, e) } +} + +object RekeyUuaFromInteractionEventsAdapter { + + def adaptEvent( + e: InteractionEvent, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Seq[KeyedUuaTweet] = + Option(e).flatMap { e => + e.interactionType.flatMap { + case InteractionType.TweetRenderImpression if !isDetailImpression(e.engagingContext) => + getRekeyedUUA( + input = e, + actionType = ActionType.ClientTweetRenderImpression, + sourceLineage = SourceLineage.ClientEvents, + statsReceiver = statsReceiver) + case _ => None + } + }.toSeq + + def getRekeyedUUA( + input: InteractionEvent, + actionType: ActionType, + sourceLineage: SourceLineage, + statsReceiver: StatsReceiver = NullStatsReceiver + ): Option[KeyedUuaTweet] = + input.engagingUserId match { + // please see https://docs.google.com/document/d/1-fy2S-8-YMRQgEN0Sco0OLTmeOIUdqgiZ5G1KwTHt2g/edit# + // in order to withstand of potential attacks, we filter out the logged-out users. + // Checking user id is 0 is the reverse engineering of + // https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/iesource/thrift/src/main/thrift/com/twitter/iesource/interaction_event.thrift?L220 + // https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/iesource/common/src/main/scala/com/twitter/iesource/common/converters/client/LogEventConverter.scala?L198 + case 0L => + statsReceiver.counter("loggedOutEvents").incr() + None + case _ => + Some( + KeyedUuaTweet( + tweetId = input.targetId, + actionType = actionType, + userIdentifier = UserIdentifier(userId = Some(input.engagingUserId)), + eventMetadata = EventMetadata( + sourceTimestampMs = input.triggeredTimestampMillis.getOrElse(input.timestampMillis), + receivedTimestampMs = AdapterUtils.currentTimestampMs, + sourceLineage = sourceLineage + ) + )) + } + + def isDetailImpression(engagingContext: EngagingContext): Boolean = + engagingContext match { + case EngagingContext.ClientEventContext( + ClientEventContext(_, _, _, _, _, _, _, Some(isDetailsImpression), _) + ) if isDetailsImpression => + true + case _ => false + } +} diff --git a/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/UuaActions.scala b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/UuaActions.scala new file mode 100644 index 000000000..eaf307ec8 --- /dev/null +++ b/unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates/UuaActions.scala @@ -0,0 +1,36 @@ +package com.twitter.unified_user_actions.adapter.uua_aggregates + +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.KeyedUuaTweet +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + +abstract class BaseUuaAction(actionType: ActionType) { + def getRekeyedUUA(input: UnifiedUserAction): Option[KeyedUuaTweet] = + getTweetIdFromItem(input.item).map { tweetId => + KeyedUuaTweet( + tweetId = tweetId, + actionType = input.actionType, + userIdentifier = input.userIdentifier, + eventMetadata = EventMetadata( + sourceTimestampMs = input.eventMetadata.sourceTimestampMs, + receivedTimestampMs = AdapterUtils.currentTimestampMs, + 
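The engagingUserId == 0 guard above encodes logged-out viewers, which are counted and dropped so anonymous traffic cannot inflate per-Tweet aggregates. A minimal sketch of the guard:

```scala
object LoggedOutFilterSketch {
  // Mirrors getRekeyedUUA's filter: id 0 marks a logged-out viewer, which is
  // counted for monitoring and excluded from the output stream.
  def keepEngagement(engagingUserId: Long, onDropped: () => Unit): Boolean =
    engagingUserId match {
      case 0L => onDropped(); false // logged out: count and drop
      case _  => true
    }
}
```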
sourceLineage = input.eventMetadata.sourceLineage + ) + ) + } + + protected def getTweetIdFromItem(item: Item): Option[Long] = { + item match { + case Item.TweetInfo(tweetInfo) => Some(tweetInfo.actionTweetId) + case _ => None + } + } +} + +/** + * When there is a new user creation event in Gizmoduck + */ +object ClientTweetRenderImpressionUua extends BaseUuaAction(ActionType.ClientTweetRenderImpression) diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/AdapterUtilsSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/AdapterUtilsSpec.scala new file mode 100644 index 000000000..c28ab3653 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/AdapterUtilsSpec.scala @@ -0,0 +1,29 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.inject.Test +import com.twitter.unified_user_actions.adapter.common.AdapterUtils +import com.twitter.util.Time + +class AdapterUtilsSpec extends Test { + trait Fixture { + + val frozenTime: Time = Time.fromMilliseconds(1658949273000L) + val languageCode = "en" + val countryCode = "us" + } + + test("tests") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val actual = Time.fromMilliseconds(AdapterUtils.currentTimestampMs) + assert(frozenTime === actual) + } + + val actionedTweetId = 1554576940756246272L + assert(AdapterUtils.getTimestampMsFromTweetId(actionedTweetId) === 1659474999976L) + + assert(languageCode.toUpperCase === AdapterUtils.normalizeLanguageCode(languageCode)) + assert(countryCode.toUpperCase === AdapterUtils.normalizeCountryCode(countryCode)) + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/AdsCallbackEngagementsAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/AdsCallbackEngagementsAdapterSpec.scala new file mode 100644 index 000000000..48309085c --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/AdsCallbackEngagementsAdapterSpec.scala @@ -0,0 +1,282 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.ads.spendserver.thriftscala.SpendServerEvent +import com.twitter.adserver.thriftscala.EngagementType +import com.twitter.clientapp.thriftscala.AmplifyDetails +import com.twitter.inject.Test +import com.twitter.unified_user_actions.adapter.TestFixtures.AdsCallbackEngagementsFixture +import com.twitter.unified_user_actions.adapter.ads_callback_engagements.AdsCallbackEngagementsAdapter +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.TweetActionInfo +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.util.Time +import org.scalatest.prop.TableDrivenPropertyChecks + +class AdsCallbackEngagementsAdapterSpec extends Test with TableDrivenPropertyChecks { + + test("Test basic conversion for ads callback engagement type fav") { + + new AdsCallbackEngagementsFixture { + Time.withTimeAt(frozenTime) { _ => + val events = Table( + ("inputEvent", "expectedUuaOutput"), + ( // Test with authorId + createSpendServerEvent(EngagementType.Fav), + Seq( + createExpectedUua( + ActionType.ServerPromotedTweetFav, + createTweetInfoItem(authorInfo = Some(authorInfo))))) + ) + forEvery(events) { (event: SpendServerEvent, expected: Seq[UnifiedUserAction]) => + val actual = 
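These specs pin the clock with Time.withTimeAt so fields like receivedTimestampMs are deterministic. The pattern in isolation, using the same com.twitter.util.Time API the specs use:

```scala
import com.twitter.util.Time

object FrozenClockSketch extends App {
  val frozen = Time.fromMilliseconds(1658949273000L)
  // Inside the block, Time.now returns the frozen instant, so any code that
  // stamps "now" (e.g. AdapterUtils.currentTimestampMs) is stable in tests.
  Time.withTimeAt(frozen) { _ =>
    assert(Time.now == frozen)
  }
}
```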
AdsCallbackEngagementsAdapter.adaptEvent(event) + assert(expected === actual) + } + } + } + } + + test("Test basic conversion for different engagement types") { + new AdsCallbackEngagementsFixture { + Time.withTimeAt(frozenTime) { _ => + val mappings = Table( + ("engagementType", "actionType"), + (EngagementType.Unfav, ActionType.ServerPromotedTweetUnfav), + (EngagementType.Reply, ActionType.ServerPromotedTweetReply), + (EngagementType.Retweet, ActionType.ServerPromotedTweetRetweet), + (EngagementType.Block, ActionType.ServerPromotedTweetBlockAuthor), + (EngagementType.Unblock, ActionType.ServerPromotedTweetUnblockAuthor), + (EngagementType.Send, ActionType.ServerPromotedTweetComposeTweet), + (EngagementType.Detail, ActionType.ServerPromotedTweetClick), + (EngagementType.Report, ActionType.ServerPromotedTweetReport), + (EngagementType.Mute, ActionType.ServerPromotedTweetMuteAuthor), + (EngagementType.ProfilePic, ActionType.ServerPromotedTweetClickProfile), + (EngagementType.ScreenName, ActionType.ServerPromotedTweetClickProfile), + (EngagementType.UserName, ActionType.ServerPromotedTweetClickProfile), + (EngagementType.Hashtag, ActionType.ServerPromotedTweetClickHashtag), + (EngagementType.CarouselSwipeNext, ActionType.ServerPromotedTweetCarouselSwipeNext), + ( + EngagementType.CarouselSwipePrevious, + ActionType.ServerPromotedTweetCarouselSwipePrevious), + (EngagementType.DwellShort, ActionType.ServerPromotedTweetLingerImpressionShort), + (EngagementType.DwellMedium, ActionType.ServerPromotedTweetLingerImpressionMedium), + (EngagementType.DwellLong, ActionType.ServerPromotedTweetLingerImpressionLong), + (EngagementType.DismissSpam, ActionType.ServerPromotedTweetDismissSpam), + (EngagementType.DismissWithoutReason, ActionType.ServerPromotedTweetDismissWithoutReason), + (EngagementType.DismissUninteresting, ActionType.ServerPromotedTweetDismissUninteresting), + (EngagementType.DismissRepetitive, ActionType.ServerPromotedTweetDismissRepetitive), + ) + + forEvery(mappings) { (engagementType: EngagementType, actionType: ActionType) => + val event = createSpendServerEvent(engagementType) + val actual = AdsCallbackEngagementsAdapter.adaptEvent(event) + val expected = + Seq(createExpectedUua(actionType, createTweetInfoItem(authorInfo = Some(authorInfo)))) + assert(expected === actual) + } + } + } + } + + test("Test conversion for ads callback engagement type spotlight view and click") { + new AdsCallbackEngagementsFixture { + Time.withTimeAt(frozenTime) { _ => + val input = Table( + ("adsEngagement", "uuaAction"), + (EngagementType.SpotlightClick, ActionType.ServerPromotedTweetClickSpotlight), + (EngagementType.SpotlightView, ActionType.ServerPromotedTweetViewSpotlight), + (EngagementType.TrendView, ActionType.ServerPromotedTrendView), + (EngagementType.TrendClick, ActionType.ServerPromotedTrendClick), + ) + forEvery(input) { (engagementType: EngagementType, actionType: ActionType) => + val adsEvent = createSpendServerEvent(engagementType) + val expected = Seq(createExpectedUua(actionType, trendInfoItem)) + val actual = AdsCallbackEngagementsAdapter.adaptEvent(adsEvent) + assert(expected === actual) + } + } + } + } + + test("Test basic conversion for ads callback engagement open link with or without url") { + new AdsCallbackEngagementsFixture { + Time.withTimeAt(frozenTime) { _ => + val input = Table( + ("url", "tweetActionInfo"), + (Some("go/url"), openLinkWithUrl), + (None, openLinkWithoutUrl) + ) + + forEvery(input) { (url: Option[String], tweetActionInfo: TweetActionInfo) => + val event = 
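The specs lean on TableDrivenPropertyChecks: each row is one (input, expected) case, and forEvery reports every failing row instead of stopping at the first. A minimal standalone version, using plain ScalaTest's AnyFunSuite rather than com.twitter.inject.Test:

```scala
import org.scalatest.funsuite.AnyFunSuite
import org.scalatest.prop.TableDrivenPropertyChecks

// Minimal table-driven test in the style of the specs in this diff; the
// mapping under test is a hypothetical stand-in, not the real adapter.
class MappingSpecSketch extends AnyFunSuite with TableDrivenPropertyChecks {
  test("engagement-to-action mapping") {
    val rows = Table(
      ("input", "expected"),
      ("fav", "ServerPromotedTweetFav"),
      ("unfav", "ServerPromotedTweetUnfav")
    )
    forEvery(rows) { (input: String, expected: String) =>
      val actual = s"ServerPromotedTweet${input.capitalize}"
      assert(actual === expected)
    }
  }
}
```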
createSpendServerEvent(engagementType = EngagementType.Url, url = url) + val actual = AdsCallbackEngagementsAdapter.adaptEvent(event) + val expected = Seq(createExpectedUua( + ActionType.ServerPromotedTweetOpenLink, + createTweetInfoItem(authorInfo = Some(authorInfo), actionInfo = Some(tweetActionInfo)))) + assert(expected === actual) + } + } + } + } + + test("Test basic conversion for different engagement types with profile info") { + new AdsCallbackEngagementsFixture { + Time.withTimeAt(frozenTime) { _ => + val mappings = Table( + ("engagementType", "actionType"), + (EngagementType.Follow, ActionType.ServerPromotedProfileFollow), + (EngagementType.Unfollow, ActionType.ServerPromotedProfileUnfollow) + ) + forEvery(mappings) { (engagementType: EngagementType, actionType: ActionType) => + val event = createSpendServerEvent(engagementType) + val actual = AdsCallbackEngagementsAdapter.adaptEvent(event) + val expected = Seq(createExpectedUuaWithProfileInfo(actionType)) + assert(expected === actual) + } + } + } + } + + test("Test basic conversion for ads callback engagement type video_content_*") { + new AdsCallbackEngagementsFixture { + Time.withTimeAt(frozenTime) { _ => + val events = Table( + ("engagementType", "amplifyDetails", "actionType", "tweetActionInfo"), + //For video_content_* events on promoted tweets when there is no preroll ad played + ( + EngagementType.VideoContentPlayback25, + amplifyDetailsPromotedTweetWithoutAd, + ActionType.ServerPromotedTweetVideoPlayback25, + tweetActionInfoPromotedTweetWithoutAd), + ( + EngagementType.VideoContentPlayback50, + amplifyDetailsPromotedTweetWithoutAd, + ActionType.ServerPromotedTweetVideoPlayback50, + tweetActionInfoPromotedTweetWithoutAd), + ( + EngagementType.VideoContentPlayback75, + amplifyDetailsPromotedTweetWithoutAd, + ActionType.ServerPromotedTweetVideoPlayback75, + tweetActionInfoPromotedTweetWithoutAd), + //For video_content_* events on promoted tweets when there is a preroll ad + ( + EngagementType.VideoContentPlayback25, + amplifyDetailsPromotedTweetWithAd, + ActionType.ServerPromotedTweetVideoPlayback25, + tweetActionInfoPromotedTweetWithAd), + ( + EngagementType.VideoContentPlayback50, + amplifyDetailsPromotedTweetWithAd, + ActionType.ServerPromotedTweetVideoPlayback50, + tweetActionInfoPromotedTweetWithAd), + ( + EngagementType.VideoContentPlayback75, + amplifyDetailsPromotedTweetWithAd, + ActionType.ServerPromotedTweetVideoPlayback75, + tweetActionInfoPromotedTweetWithAd), + ) + forEvery(events) { + ( + engagementType: EngagementType, + amplifyDetails: Option[AmplifyDetails], + actionType: ActionType, + actionInfo: Option[TweetActionInfo] + ) => + val spendEvent = + createVideoSpendServerEvent(engagementType, amplifyDetails, promotedTweetId, None) + val expected = Seq(createExpectedVideoUua(actionType, actionInfo, promotedTweetId)) + + val actual = AdsCallbackEngagementsAdapter.adaptEvent(spendEvent) + assert(expected === actual) + } + } + } + } + + test("Test basic conversion for ads callback engagement type video_ad_*") { + + new AdsCallbackEngagementsFixture { + Time.withTimeAt(frozenTime) { _ => + val events = Table( + ( + "engagementType", + "amplifyDetails", + "actionType", + "tweetActionInfo", + "promotedTweetId", + "organicTweetId"), + //For video_ad_* events when the preroll ad is on a promoted tweet. 
+ ( + EngagementType.VideoAdPlayback25, + amplifyDetailsPrerollAd, + ActionType.ServerPromotedTweetVideoAdPlayback25, + tweetActionInfoPrerollAd, + promotedTweetId, + None + ), + ( + EngagementType.VideoAdPlayback50, + amplifyDetailsPrerollAd, + ActionType.ServerPromotedTweetVideoAdPlayback50, + tweetActionInfoPrerollAd, + promotedTweetId, + None + ), + ( + EngagementType.VideoAdPlayback75, + amplifyDetailsPrerollAd, + ActionType.ServerPromotedTweetVideoAdPlayback75, + tweetActionInfoPrerollAd, + promotedTweetId, + None + ), + // For video_ad_* events when the preroll ad is on an organic tweet. + ( + EngagementType.VideoAdPlayback25, + amplifyDetailsPrerollAd, + ActionType.ServerTweetVideoAdPlayback25, + tweetActionInfoPrerollAd, + None, + organicTweetId + ), + ( + EngagementType.VideoAdPlayback50, + amplifyDetailsPrerollAd, + ActionType.ServerTweetVideoAdPlayback50, + tweetActionInfoPrerollAd, + None, + organicTweetId + ), + ( + EngagementType.VideoAdPlayback75, + amplifyDetailsPrerollAd, + ActionType.ServerTweetVideoAdPlayback75, + tweetActionInfoPrerollAd, + None, + organicTweetId + ), + ) + forEvery(events) { + ( + engagementType: EngagementType, + amplifyDetails: Option[AmplifyDetails], + actionType: ActionType, + actionInfo: Option[TweetActionInfo], + promotedTweetId: Option[Long], + organicTweetId: Option[Long], + ) => + val spendEvent = + createVideoSpendServerEvent( + engagementType, + amplifyDetails, + promotedTweetId, + organicTweetId) + val actionTweetId = if (organicTweetId.isDefined) organicTweetId else promotedTweetId + val expected = Seq(createExpectedVideoUua(actionType, actionInfo, actionTweetId)) + + val actual = AdsCallbackEngagementsAdapter.adaptEvent(spendEvent) + assert(expected === actual) + } + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/BUILD.bazel b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/BUILD.bazel new file mode 100644 index 000000000..4c6d8e27a --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/BUILD.bazel @@ -0,0 +1,23 @@ +junit_tests( + sources = ["**/*.scala"], + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/junit", + "3rdparty/jvm/org/scalatest", + "3rdparty/jvm/org/scalatestplus:junit", + "finatra/inject/inject-core/src/test/scala:test-deps", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/favorite_archival_events", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/retweet_archival_events", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tls_favs_event", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event", + 
"unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates", + "util/util-mock/src/main/scala/com/twitter/util/mock", + ], +) diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/ClientEventAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/ClientEventAdapterSpec.scala new file mode 100644 index 000000000..dde8a2f02 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/ClientEventAdapterSpec.scala @@ -0,0 +1,2157 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.clientapp.thriftscala.EventNamespace +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.NotificationTabDetails +import com.twitter.clientapp.thriftscala.ReportDetails +import com.twitter.clientapp.thriftscala.SearchDetails +import com.twitter.clientapp.thriftscala.SuggestionDetails +import com.twitter.inject.Test +import com.twitter.logbase.thriftscala.ClientEventReceiver +import com.twitter.reportflow.thriftscala.ReportType +import com.twitter.suggests.controller_data.thriftscala.ControllerData +import com.twitter.unified_user_actions.adapter.client_event.ClientEventAdapter +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.Time +import org.scalatest.prop.TableDrivenPropertyChecks +import org.scalatest.prop.TableFor1 +import org.scalatest.prop.TableFor2 +import scala.language.implicitConversions + +class ClientEventAdapterSpec extends Test with TableDrivenPropertyChecks { + // Tests for invalid client-events + test("should ignore events") { + new TestFixtures.ClientEventFixture { + val eventsToBeIgnored: TableFor2[String, LogEvent] = Table( + ("namespace", "event"), + ("ddg", ddgEvent), + ("qig_ranker", qigRankerEvent), + ("timelnemixer", timelineMixerEvent), + ("timelineservice", timelineServiceEvent), + ("tweetconvosvc", tweetConcServiceEvent), + ("item-type is non-tweet", renderNonTweetItemTypeEvent) + ) + + forEvery(eventsToBeIgnored) { (_: String, event: LogEvent) => + val actual = ClientEventAdapter.adaptEvent(event) + assert(actual.isEmpty) + } + } + } + + test("Tests for ItemType filter") { + /// Tweet events + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val events = Table( + ("itemType", "expectedUUA"), + (Some(ItemType.Tweet), Seq(expectedTweetRenderDefaultTweetUUA)), + (Some(ItemType.QuotedTweet), Seq(expectedTweetRenderDefaultTweetUUA)), + (Some(ItemType.Topic), Nil), + (None, Nil) + ) + + forEvery(events) { (itemTypeOpt: Option[ItemType], expected: Seq[UnifiedUserAction]) => + val actual = ClientEventAdapter.adaptEvent( + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceRenderEventNamespace), + itemTypeOpt = itemTypeOpt + )) + assert(expected === actual) + } + } + } + + /// Topic events + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val expected: UnifiedUserAction = mkExpectedUUAForActionTowardTopicEvent( + topicId = topicId, + clientEventNamespace = Some(uuaTopicFollowClientEventNamespace1), + actionType = ActionType.ClientTopicFollow + ) + val events = Table( + ("itemType", "expectedUUA"), + (Some(ItemType.Tweet), Seq(expected)), + (Some(ItemType.QuotedTweet), Seq(expected)), + (Some(ItemType.Topic), Seq(expected)), + (None, Nil) + ) + + 
forEvery(events) { (itemTypeOpt: Option[ItemType], expected: Seq[UnifiedUserAction]) => + val actual = ClientEventAdapter.adaptEvent( + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceTopicFollow1), + itemId = None, + suggestionDetails = + Some(SuggestionDetails(decodedControllerData = Some(homeTweetControllerData()))), + itemTypeOpt = itemTypeOpt + )) + assert(expected === actual) + } + } + } + } + + // Tests for ClientTweetRenderImpression + test("ClientTweetRenderImpression") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("actionTweetType", "clientEvent", "expectedUUAEvent"), + ( + "Default", + actionTowardDefaultTweetEvent(eventNamespace = Some(ceRenderEventNamespace)), + Seq(expectedTweetRenderDefaultTweetUUA)), + ( + "Reply", + actionTowardReplyEvent(eventNamespace = Some(ceRenderEventNamespace)), + Seq(expectedTweetRenderReplyUUA)), + ( + "Retweet", + actionTowardRetweetEvent(eventNamespace = Some(ceRenderEventNamespace)), + Seq(expectedTweetRenderRetweetUUA)), + ( + "Quote", + actionTowardQuoteEvent( + eventNamespace = Some(ceRenderEventNamespace), + quotedAuthorId = Some(456L)), + Seq(expectedTweetRenderQuoteUUA1, expectedTweetRenderQuoteUUA2)), + ( + "Retweet of a reply that quoted another Tweet", + actionTowardRetweetEventWithReplyAndQuote(eventNamespace = + Some(ceRenderEventNamespace)), + Seq( + expectedTweetRenderRetweetWithReplyAndQuoteUUA1, + expectedTweetRenderRetweetWithReplyAndQuoteUUA2)) + ) + forEvery(clientEvents) { + (_: String, event: LogEvent, expectedUUA: Seq[UnifiedUserAction]) => + val actual = ClientEventAdapter.adaptEvent(event) + actual should contain theSameElementsAs expectedUUA + } + } + } + } + + test("ClientTweetGallery/DetailImpression") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("actionTweetType", "clientEvent", "expectedUUAEvent"), + ( + "DetailImpression: tweet::tweet::impression", + actionTowardDefaultTweetEvent(eventNamespace = Some(ceTweetDetailsEventNamespace1)), + expectedTweetDetailImpressionUUA1), + ( + "GalleryImpression: gallery:photo:impression", + actionTowardDefaultTweetEvent(eventNamespace = Some(ceGalleryEventNamespace)), + expectedTweetGalleryImpressionUUA), + ) + forEvery(clientEvents) { (_: String, event: LogEvent, expectedUUA: UnifiedUserAction) => + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetLingerImpression + test("ClientTweetLingerImpression") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("actionTweetType", "clientEvent", "expectedUUAEvent"), + ("Default", lingerDefaultTweetEvent, expectedTweetLingerDefaultTweetUUA), + ("Reply", lingerReplyEvent, expectedTweetLingerReplyUUA), + ("Retweet", lingerRetweetEvent, expectedTweetLingerRetweetUUA), + ("Quote", lingerQuoteEvent, expectedTweetLingerQuoteUUA), + ( + "Retweet of a reply that quoted another Tweet", + lingerRetweetWithReplyAndQuoteEvent, + expectedTweetLingerRetweetWithReplyAndQuoteUUA), + ) + forEvery(clientEvents) { (_: String, event: LogEvent, expectedUUA: UnifiedUserAction) => + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetClickQuote + test( + "ClickQuote, which is the click on the quote button, results in setting retweeting, inReplyTo, quoted tweet ids") { + new TestFixtures.ClientEventFixture { + 
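// Clicking the quote button should carry the retweeting, inReplyTo, and quoted Tweet ids of the target Tweet into the UUA. +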
Time.withTimeAt(frozenTime) { _ => + val actual = ClientEventAdapter.adaptEvent( + // there shouldn't be any quotingTweetId in CE when it is "quote" + actionTowardRetweetEventWithReplyAndQuote(eventNamespace = Some( + EventNamespace( + action = Some("quote") + )))) + assert(Seq(expectedTweetClickQuoteUUA) === actual) + } + } + } + + // Tests for ClientTweetQuote + test( + "Quote, which is sending the quote, results in setting retweeting, inReplyTo, quoted tweet ids") { + new TestFixtures.ClientEventFixture { + val actions: TableFor1[String] = Table( + "action", + "send_quote_tweet", + "retweet_with_comment" + ) + + Time.withTimeAt(frozenTime) { _ => + forEvery(actions) { action => + val actual = ClientEventAdapter.adaptEvent( + // there shouldn't be any quotingTweetId in CE when it is "quote" + actionTowardRetweetEventWithReplyAndQuote(eventNamespace = Some( + EventNamespace( + action = Some(action) + )))) + assert(Seq(expectedTweetQuoteUUA(action)) === actual) + } + } + } + } + + // Tests for ClientTweetFav and ClientTweetUnfav + test("ClientTweetFav and ClientTweetUnfav") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("actionTweetType", "clientEvent", "expectedUUAEvent"), + ( + "Default Tweet favorite", + actionTowardDefaultTweetEvent(eventNamespace = Some(ceFavoriteEventNamespace)), + expectedTweetFavoriteDefaultTweetUUA), + ( + "Reply Tweet favorite", + actionTowardReplyEvent(eventNamespace = Some(ceFavoriteEventNamespace)), + expectedTweetFavoriteReplyUUA), + ( + "Retweet Tweet favorite", + actionTowardRetweetEvent(eventNamespace = Some(ceFavoriteEventNamespace)), + expectedTweetFavoriteRetweetUUA), + ( + "Quote Tweet favorite", + actionTowardQuoteEvent(eventNamespace = Some(ceFavoriteEventNamespace)), + expectedTweetFavoriteQuoteUUA), + ( + "Retweet of a reply that quoted another Tweet favorite", + actionTowardRetweetEventWithReplyAndQuote(eventNamespace = + Some(ceFavoriteEventNamespace)), + expectedTweetFavoriteRetweetWithReplyAndQuoteUUA), + ( + "Default Tweet unfavorite", + actionTowardDefaultTweetEvent( + eventNamespace = Some(EventNamespace(action = Some("unfavorite"))), + ), + mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(ClientEventNamespace(action = Some("unfavorite"))), + actionType = ActionType.ClientTweetUnfav + )) + ) + forEvery(clientEvents) { (_: String, event: LogEvent, expectedUUA: UnifiedUserAction) => + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetClickReply + test("ClientTweetClickReply") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("actionTweetType", "clientEvent", "expectedUUAEvent"), + ( + "Default", + actionTowardDefaultTweetEvent(eventNamespace = Some(ceClickReplyEventNamespace)), + expectedTweetClickReplyDefaultTweetUUA), + ( + "Reply", + actionTowardReplyEvent(eventNamespace = Some(ceClickReplyEventNamespace)), + expectedTweetClickReplyReplyUUA), + ( + "Retweet", + actionTowardRetweetEvent(eventNamespace = Some(ceClickReplyEventNamespace)), + expectedTweetClickReplyRetweetUUA), + ( + "Quote", + actionTowardQuoteEvent(eventNamespace = Some(ceClickReplyEventNamespace)), + expectedTweetClickReplyQuoteUUA), + ( + "Retweet of a reply that quoted another Tweet", + actionTowardRetweetEventWithReplyAndQuote(eventNamespace = + Some(ceClickReplyEventNamespace)), + expectedTweetClickReplyRetweetWithReplyAndQuoteUUA) + ) + 
forEvery(clientEvents) { (_: String, event: LogEvent, expectedUUA: UnifiedUserAction) => + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetReply + test("ClientTweetReply") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("actionTweetType", "clientEvent", "expectedUUAEvent"), + ("DefaultOrReply", replyToDefaultTweetOrReplyEvent, expectedTweetReplyDefaultTweetUUA), + ("Retweet", replyToRetweetEvent, expectedTweetReplyRetweetUUA), + ("Quote", replyToQuoteEvent, expectedTweetReplyQuoteUUA), + ( + "Retweet of a reply that quoted another Tweet", + replyToRetweetWithReplyAndQuoteEvent, + expectedTweetReplyRetweetWithReplyAndQuoteUUA) + ) + forEvery(clientEvents) { (_: String, event: LogEvent, expectedUUA: UnifiedUserAction) => + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetRetweet and ClientTweetUnretweet + test("ClientTweetRetweet and ClientTweetUnretweet") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("actionTweetType", "clientEvent", "expectedUUAEvent"), + ( + "Default Tweet retweet", + actionTowardDefaultTweetEvent(eventNamespace = Some(ceRetweetEventNamespace)), + expectedTweetRetweetDefaultTweetUUA), + ( + "Reply Tweet retweet", + actionTowardReplyEvent(eventNamespace = Some(ceRetweetEventNamespace)), + expectedTweetRetweetReplyUUA), + ( + "Retweet Tweet retweet", + actionTowardRetweetEvent(eventNamespace = Some(ceRetweetEventNamespace)), + expectedTweetRetweetRetweetUUA), + ( + "Quote Tweet retweet", + actionTowardQuoteEvent(eventNamespace = Some(ceRetweetEventNamespace)), + expectedTweetRetweetQuoteUUA), + ( + "Retweet of a reply that quoted another Tweet retweet", + actionTowardRetweetEventWithReplyAndQuote(eventNamespace = + Some(ceRetweetEventNamespace)), + expectedTweetRetweetRetweetWithReplyAndQuoteUUA), + ( + "Default Tweet unretweet", + actionTowardDefaultTweetEvent( + eventNamespace = Some(EventNamespace(action = Some("unretweet"))), + ), + mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(ClientEventNamespace(action = Some("unretweet"))), + actionType = ActionType.ClientTweetUnretweet + )) + ) + forEvery(clientEvents) { (_: String, event: LogEvent, expectedUUA: UnifiedUserAction) => + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + test("include Topic Id") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actual = ClientEventAdapter.adaptEvent(renderDefaultTweetWithTopicIdEvent) + assert(Seq(expectedTweetRenderDefaultTweetWithTopicIdUUA) === actual) + } + } + } + + // Tests for ClientTweetVideoPlayback0, 25, 50, 75, 95, 100 PlayFromTap, QualityView, + // VideoView, MrcView, ViewThreshold + test("ClientTweetVideoPlayback*") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("ceNamespace", "uuaNamespace", "uuaActionType"), + ( + ceVideoPlayback25, + uuaVideoPlayback25ClientEventNamespace, + ActionType.ClientTweetVideoPlayback25), + ( + ceVideoPlayback50, + uuaVideoPlayback50ClientEventNamespace, + ActionType.ClientTweetVideoPlayback50), + ( + ceVideoPlayback75, + uuaVideoPlayback75ClientEventNamespace, + ActionType.ClientTweetVideoPlayback75), + ( + ceVideoPlayback95, + uuaVideoPlayback95ClientEventNamespace, + 
ActionType.ClientTweetVideoPlayback95), + ( + ceVideoPlayFromTap, + uuaVideoPlayFromTapClientEventNamespace, + ActionType.ClientTweetVideoPlayFromTap), + ( + ceVideoQualityView, + uuaVideoQualityViewClientEventNamespace, + ActionType.ClientTweetVideoQualityView), + (ceVideoView, uuaVideoViewClientEventNamespace, ActionType.ClientTweetVideoView), + (ceVideoMrcView, uuaVideoMrcViewClientEventNamespace, ActionType.ClientTweetVideoMrcView), + ( + ceVideoViewThreshold, + uuaVideoViewThresholdClientEventNamespace, + ActionType.ClientTweetVideoViewThreshold), + ( + ceVideoCtaUrlClick, + uuaVideoCtaUrlClickClientEventNamespace, + ActionType.ClientTweetVideoCtaUrlClick), + ( + ceVideoCtaWatchClick, + uuaVideoCtaWatchClickClientEventNamespace, + ActionType.ClientTweetVideoCtaWatchClick), + ) + + for (element <- videoEventElementValues) { + forEvery(clientEvents) { + ( + ceNamespace: EventNamespace, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType + ) => + val event = actionTowardDefaultTweetEvent( + eventNamespace = Some(ceNamespace.copy(element = Some(element))), + mediaDetailsV2 = Some(mediaDetailsV2), + clientMediaEvent = Some(clientMediaEvent), + cardDetails = Some(cardDetails) + ) + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaNamespace.copy(element = Some(element))), + actionType = uuaActionType, + tweetActionInfo = Some(videoMetadata) + ) + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + } + + // Tests for ClientTweetPhotoExpand + test("Client Tweet Photo Expand") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvent = actionTowardDefaultTweetEvent(eventNamespace = Some(cePhotoExpand)) + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaPhotoExpandClientEventNamespace), + actionType = ActionType.ClientTweetPhotoExpand + ) + assert(Seq(expectedUUA) === ClientEventAdapter.adaptEvent(clientEvent)) + } + } + } + + // Tests for ClientCardClick + test("Client Card Related") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("ceNamespace", "ceItemType", "uuaNamespace", "uuaActionType"), + ( + ceCardClick, + ItemType.Tweet, + uuaCardClickClientEventNamespace, + ActionType.ClientCardClick), + ( + ceCardClick, + ItemType.User, + uuaCardClickClientEventNamespace, + ActionType.ClientCardClick), + ( + ceCardOpenApp, + ItemType.Tweet, + uuaCardOpenAppClientEventNamespace, + ActionType.ClientCardOpenApp), + ( + ceCardAppInstallAttempt, + ItemType.Tweet, + uuaCardAppInstallAttemptClientEventNamespace, + ActionType.ClientCardAppInstallAttempt), + ( + cePollCardVote1, + ItemType.Tweet, + uuaPollCardVote1ClientEventNamespace, + ActionType.ClientPollCardVote), + ( + cePollCardVote2, + ItemType.Tweet, + uuaPollCardVote2ClientEventNamespace, + ActionType.ClientPollCardVote), + ) + forEvery(clientEvents) { + ( + ceNamespace: EventNamespace, + ceItemType: ItemType, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType + ) => + val event = actionTowardDefaultTweetEvent( + eventNamespace = Some(ceNamespace), + itemTypeOpt = Some(ceItemType), + authorId = Some(authorId) + ) + val expectedUUA = mkExpectedUUAForCardEvent( + id = Some(itemTweetId), + clientEventNamespace = Some(uuaNamespace), + actionType = uuaActionType, + itemType = Some(ceItemType), + authorId = Some(authorId) + ) + val actual = ClientEventAdapter.adaptEvent(event) + 
assert(Seq(expectedUUA) === actual) + } + } + } + + // Tests for ClientTweetClickMentionScreenName + test("ClientTweetClickMentionScreenName") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val userHandle = "someHandle" + val clientEvent = actionTowardDefaultTweetEvent( + eventNamespace = Some(ceMentionClick), + targets = Some( + Seq( + LogEventItem( + itemType = Some(ItemType.User), + id = Some(userId), + name = Some(userHandle))))) + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaMentionClickClientEventNamespace), + actionType = ActionType.ClientTweetClickMentionScreenName, + tweetActionInfo = Some( + TweetActionInfo.ClientTweetClickMentionScreenName( + ClientTweetClickMentionScreenName(actionProfileId = userId, handle = userHandle))) + ) + assert(Seq(expectedUUA) === ClientEventAdapter.adaptEvent(clientEvent)) + } + } + } + + // Tests for Topic Follow/Unfollow actions + test("Topic Follow/Unfollow Actions") { + // The Topic Id is mostly from TimelineTopic controller data or HomeTweets controller data! + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("clientEventNamespace", "expectedUUANamespace", "controllerData", "actionType"), + ( + ceTopicFollow1, + uuaTopicFollowClientEventNamespace1, + timelineTopicControllerData(), + ActionType.ClientTopicFollow + ), + ( + ceTopicFollow1, + uuaTopicFollowClientEventNamespace1, + homeTweetControllerData(), + ActionType.ClientTopicFollow), + ( + ceTopicFollow2, + uuaTopicFollowClientEventNamespace2, + timelineTopicControllerData(), + ActionType.ClientTopicFollow + ), + ( + ceTopicFollow2, + uuaTopicFollowClientEventNamespace2, + homeTweetControllerData(), + ActionType.ClientTopicFollow), + ( + ceTopicFollow3, + uuaTopicFollowClientEventNamespace3, + timelineTopicControllerData(), + ActionType.ClientTopicFollow + ), + ( + ceTopicFollow3, + uuaTopicFollowClientEventNamespace3, + homeTweetControllerData(), + ActionType.ClientTopicFollow), + ( + ceTopicUnfollow1, + uuaTopicUnfollowClientEventNamespace1, + timelineTopicControllerData(), + ActionType.ClientTopicUnfollow + ), + ( + ceTopicUnfollow1, + uuaTopicUnfollowClientEventNamespace1, + homeTweetControllerData(), + ActionType.ClientTopicUnfollow), + ( + ceTopicUnfollow2, + uuaTopicUnfollowClientEventNamespace2, + timelineTopicControllerData(), + ActionType.ClientTopicUnfollow + ), + ( + ceTopicUnfollow2, + uuaTopicUnfollowClientEventNamespace2, + homeTweetControllerData(), + ActionType.ClientTopicUnfollow), + ( + ceTopicUnfollow3, + uuaTopicUnfollowClientEventNamespace3, + timelineTopicControllerData(), + ActionType.ClientTopicUnfollow + ), + ( + ceTopicUnfollow3, + uuaTopicUnfollowClientEventNamespace3, + homeTweetControllerData(), + ActionType.ClientTopicUnfollow), + ) + + forEvery(clientEvents) { + ( + eventNamespace: EventNamespace, + uuaNs: ClientEventNamespace, + controllerData: ControllerData, + actionType: ActionType + ) => + val event = actionTowardDefaultTweetEvent( + eventNamespace = Some(eventNamespace), + itemId = None, + suggestionDetails = + Some(SuggestionDetails(decodedControllerData = Some(controllerData))) + ) + val expectedUUA = mkExpectedUUAForActionTowardTopicEvent( + topicId = topicId, + traceId = None, + clientEventNamespace = Some(uuaNs), + actionType = actionType + ) + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for Topic NotInterestedIn & its Undo actions +
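// As with follow/unfollow, the topic id is resolved from the decoded controller data in SuggestionDetails rather than from an item id. +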
test("Topic NotInterestedIn & its Undo actions") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("clientEventNamesapce", "expectedUUANamespace", "controllerData", "actionType"), + ( + ceTopicNotInterestedIn1, + uuaTopicNotInterestedInClientEventNamespace1, + timelineTopicControllerData(), + ActionType.ClientTopicNotInterestedIn + ), + ( + ceTopicNotInterestedIn1, + uuaTopicNotInterestedInClientEventNamespace1, + homeTweetControllerData(), + ActionType.ClientTopicNotInterestedIn), + ( + ceTopicNotInterestedIn2, + uuaTopicNotInterestedInClientEventNamespace2, + timelineTopicControllerData(), + ActionType.ClientTopicNotInterestedIn + ), + ( + ceTopicNotInterestedIn2, + uuaTopicNotInterestedInClientEventNamespace2, + homeTweetControllerData(), + ActionType.ClientTopicNotInterestedIn), + ( + ceTopicUndoNotInterestedIn1, + uuaTopicUndoNotInterestedInClientEventNamespace1, + timelineTopicControllerData(), + ActionType.ClientTopicUndoNotInterestedIn + ), + ( + ceTopicUndoNotInterestedIn1, + uuaTopicUndoNotInterestedInClientEventNamespace1, + homeTweetControllerData(), + ActionType.ClientTopicUndoNotInterestedIn), + ( + ceTopicUndoNotInterestedIn2, + uuaTopicUndoNotInterestedInClientEventNamespace2, + timelineTopicControllerData(), + ActionType.ClientTopicUndoNotInterestedIn + ), + ( + ceTopicUndoNotInterestedIn2, + uuaTopicUndoNotInterestedInClientEventNamespace2, + homeTweetControllerData(), + ActionType.ClientTopicUndoNotInterestedIn), + ) + + forEvery(clientEvents) { + ( + eventNamespace: EventNamespace, + uuaNs: ClientEventNamespace, + controllerData: ControllerData, + actionType: ActionType + ) => + val event = actionTowardDefaultTweetEvent( + eventNamespace = Some(eventNamespace), + itemId = None, + suggestionDetails = + Some(SuggestionDetails(decodedControllerData = Some(controllerData))) + ) + val expectedUUA = mkExpectedUUAForActionTowardTopicEvent( + topicId = topicId, + traceId = None, + clientEventNamespace = Some(uuaNs), + actionType = actionType + ) + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for authorInfo + test("authorInfo") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("authorIdOpt", "isFollowedByActingUser", "isFollowingActingUser"), + (Some(authorId), true, false), + (Some(authorId), true, true), + (Some(authorId), false, true), + (Some(authorId), false, false), + (None, true, true), + ) + forEvery(clientEvents) { + ( + authorIdOpt: Option[Long], + isFollowedByActingUser: Boolean, + isFollowingActingUser: Boolean + ) => + val actual = ClientEventAdapter.adaptEvent( + renderDefaultTweetUserFollowStatusEvent( + authorId = authorIdOpt, + isFollowedByActingUser = isFollowedByActingUser, + isFollowingActingUser = isFollowingActingUser + )) + val expected = + expectedTweetRenderDefaultTweetWithAuthorInfoUUA(authorInfo = authorIdOpt.map { id => + AuthorInfo( + authorId = Some(id), + isFollowedByActingUser = Some(isFollowedByActingUser), + isFollowingActingUser = Some(isFollowingActingUser) + ) + }) + assert(Seq(expected) === actual) + } + } + } + } + + // Tests for ClientTweetReport + test("ClientTweetReport") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val ceNTabTweetReport: EventNamespace = + ceTweetReport.copy(page = Some("ntab"), section = Some("all"), component = Some("urt")) + + val uuaNTabTweetReport: ClientEventNamespace = + 
uuaTweetReport.copy(page = Some("ntab"), section = Some("all"), component = Some("urt")) + + val params = Table( + ( + "eventType", + "ceNamespace", + "ceNotificationTabDetails", + "ceReportDetails", + "uuaNamespace", + "uuaTweetActionInfo", + "uuaProductSurface", + "uuaProductSurfaceInfo"), + ( + "ntabReportTweetClick", + ceNTabTweetReport.copy(action = Some("click")), + Some(notificationTabTweetEventDetails), + None, + uuaNTabTweetReport.copy(action = Some("click")), + reportTweetClick, + Some(ProductSurface.NotificationTab), + Some(notificationTabProductSurfaceInfo) + ), + ( + "ntabReportTweetDone", + ceNTabTweetReport.copy(action = Some("done")), + Some(notificationTabTweetEventDetails), + None, + uuaNTabTweetReport.copy(action = Some("done")), + reportTweetDone, + Some(ProductSurface.NotificationTab), + Some(notificationTabProductSurfaceInfo) + ), + ( + "defaultReportTweetDone", + ceTweetReport.copy(page = Some("tweet"), action = Some("done")), + None, + None, + uuaTweetReport.copy(page = Some("tweet"), action = Some("done")), + reportTweetDone, + None, + None + ), + ( + "defaultReportTweetWithReportFlowId", + ceTweetReport.copy(page = Some("tweet"), action = Some("done")), + None, + Some(ReportDetails(reportFlowId = Some(reportFlowId))), + uuaTweetReport.copy(page = Some("tweet"), action = Some("done")), + reportTweetWithReportFlowId, + None, + None + ), + ( + "defaultReportTweetWithoutReportFlowId", + ceTweetReport.copy(page = Some("tweet"), action = Some("done")), + None, + None, + uuaTweetReport.copy(page = Some("tweet"), action = Some("done")), + reportTweetWithoutReportFlowId, + None, + None + ), + ) + + forEvery(params) { + ( + _: String, + ceNamespace: EventNamespace, + ceNotificationTabDetails: Option[NotificationTabDetails], + ceReportDetails: Option[ReportDetails], + uuaNamespace: ClientEventNamespace, + uuaTweetActionInfo: TweetActionInfo, + productSurface: Option[ProductSurface], + productSurfaceInfo: Option[ProductSurfaceInfo] + ) => + val actual = ClientEventAdapter.adaptEvent( + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceNamespace), + notificationTabDetails = ceNotificationTabDetails, + reportDetails = ceReportDetails)) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaNamespace), + actionType = ActionType.ClientTweetReport, + tweetActionInfo = Some(uuaTweetActionInfo), + productSurface = productSurface, + productSurfaceInfo = productSurfaceInfo + ) + + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetNotHelpful and ClientTweetUndoNotHelpful + test("ClientTweetNotHelpful & UndoNotHelpful") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actions = Table(("action"), "click", "undo") + val element = "feedback_givefeedback" + forEvery(actions) { action => + val clientEvent = + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceEventNamespace(element, action)), + ) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaClientEventNamespace(element, action)), + actionType = action match { + case "click" => ActionType.ClientTweetNotHelpful + case "undo" => ActionType.ClientTweetUndoNotHelpful + } + ) + + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetNotInterestedIn and ClientTweetUndoNotInterestedIn + test("ClientTweetNotInterestedIn & UndoNotInterestedIn") { + new TestFixtures.ClientEventFixture { + 
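// The element ("feedback_dontlike") selects the feedback family; the namespace action ("click" vs "undo") maps to the action or its undo counterpart. +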
Time.withTimeAt(frozenTime) { _ => + val actions = Table(("action"), "click", "undo") + val element = "feedback_dontlike" + forEvery(actions) { action => + val clientEvent = + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceEventNamespace(element, action)), + ) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaClientEventNamespace(element, action)), + actionType = action match { + case "click" => ActionType.ClientTweetNotInterestedIn + case "undo" => ActionType.ClientTweetUndoNotInterestedIn + } + ) + + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetNotAboutTopic & ClientTweetUndoNotAboutTopic + test("ClientTweetNotAboutTopic & ClientTweetUndoNotAboutTopic") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actions = Table(("action"), "click", "undo") + val element = "feedback_notabouttopic" + forEvery(actions) { action => + val clientEvent = + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceEventNamespace(element, action)), + ) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaClientEventNamespace(element, action)), + actionType = action match { + case "click" => ActionType.ClientTweetNotAboutTopic + case "undo" => ActionType.ClientTweetUndoNotAboutTopic + } + ) + + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetNotRecent and ClientTweetUndoNotRecent + test("ClientTweetNotRecent & UndoNotRecent") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actions = Table(("action"), "click", "undo") + val element = "feedback_notrecent" + forEvery(actions) { action => + val clientEvent = + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceEventNamespace(element, action)), + ) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaClientEventNamespace(element, action)), + actionType = action match { + case "click" => ActionType.ClientTweetNotRecent + case "undo" => ActionType.ClientTweetUndoNotRecent + } + ) + + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetSeeFewer and ClientTweetUndoSeeFewer + test("ClientTweetSeeFewer & ClientTweetUndoSeeFewer") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actions = Table(("action"), "click", "undo") + val element = "feedback_seefewer" + forEvery(actions) { action => + val clientEvent = + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceEventNamespace(element, action)), + ) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaClientEventNamespace(element, action)), + actionType = action match { + case "click" => ActionType.ClientTweetSeeFewer + case "undo" => ActionType.ClientTweetUndoSeeFewer + } + ) + + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for getEventMetadata + test("getEventMetadata") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("clientEventNamespace", "expectedUUANamespace", "controllerData"), + ( + ceRenderEventNamespace, + uuaRenderClientEventNamespace, + homeTweetControllerData() + ), + ) + +
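// Only the client event namespace varies across rows; the remaining event metadata should match the shared fixture defaults. +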
forEvery(clientEvents) { + ( + eventNamespace: EventNamespace, + uuaNs: ClientEventNamespace, + controllerData: ControllerData + ) => + val event = actionTowardDefaultTweetEvent( + eventNamespace = Some(eventNamespace), + suggestionDetails = + Some(SuggestionDetails(decodedControllerData = Some(controllerData))) + ) + val expectedEventMetaData = mkUUAEventMetadata( + clientEventNamespace = Some(uuaNs) + ) + val actual = ClientEventAdapter.adaptEvent(event).head.eventMetadata + assert(expectedEventMetaData === actual) + } + } + } + } + + // Tests for getSourceTimestamp + test("getSourceTimestamp") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val params = Table( + ("testCase", "clientEvent", "expectedUUAEventTimestamp"), + ( + "CES event with DriftAdjustedEventCreatedAtMs", + actionTowardDefaultTweetEvent(eventNamespace = Some(ceRenderEventNamespace)), + logBase.driftAdjustedEventCreatedAtMs), + ( + "CES event without DriftAdjustedEventCreatedAtMs: ignore", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceRenderEventNamespace), + logBase = logBase.unsetDriftAdjustedEventCreatedAtMs), + None), + ( + "Non-CES event without DriftAdjustedEventCreatedAtMs: use logBase.timestamp", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceRenderEventNamespace), + logBase = logBase + .copy( + clientEventReceiver = + Some(ClientEventReceiver.Unknown)).unsetDriftAdjustedEventCreatedAtMs + ), + Some(logBase.timestamp)) + ) + forEvery(params) { (_: String, event: LogEvent, expectedUUAEventTimestamp: Option[Long]) => + val actual = + ClientEventAdapter.adaptEvent(event).map(_.eventMetadata.sourceTimestampMs).headOption + assert(expectedUUAEventTimestamp === actual) + } + } + } + } + + // Tests for ServerTweetReport + test("ServerTweetReport") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val params = Table( + ("eventType", "ceNamespace", "ceReportDetails", "uuaNamespace", "uuaTweetActionInfo"), + ( + "ReportImpressionIsNotAdapted", + ceTweetReportFlow(page = "report_abuse", action = "impression"), + Some(ReportDetails(reportFlowId = Some(reportFlowId))), + None, + None + ), + ( + "ReportSubmitIsAdapted", + ceTweetReportFlow(page = "report_abuse", action = "submit"), + Some( + ReportDetails( + reportFlowId = Some(reportFlowId), + reportType = Some(ReportType.Abuse))), + Some(uuaTweetReportFlow(page = "report_abuse", action = "submit")), + Some(reportTweetSubmit) + ), + ) + + forEvery(params) { + ( + _: String, + ceNamespace: EventNamespace, + ceReportDetails: Option[ReportDetails], + uuaNamespace: Option[ClientEventNamespace], + uuaTweetActionInfo: Option[TweetActionInfo] + ) => + val actual = ClientEventAdapter.adaptEvent( + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceNamespace), + reportDetails = ceReportDetails)) + + val expectedUUA = + if (ceNamespace.action.contains("submit")) + Seq( + mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = uuaNamespace, + actionType = ActionType.ServerTweetReport, + tweetActionInfo = uuaTweetActionInfo + )) + else Nil + + assert(expectedUUA === actual) + } + } + } + } + + // Tests for ClientNotificationOpen + test("ClientNotificationOpen") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvent = + pushNotificationEvent( + eventNamespace = Some(ceNotificationOpen), + notificationDetails = Some(notificationDetails)) + + val expectedUUA = mkExpectedUUAForNotificationEvent( + clientEventNamespace = 
Some(uuaNotificationOpen), + actionType = ActionType.ClientNotificationOpen, + notificationContent = tweetNotificationContent, + productSurface = Some(ProductSurface.PushNotification), + productSurfaceInfo = Some( + ProductSurfaceInfo.PushNotificationInfo( + PushNotificationInfo(notificationId = notificationId))) + ) + + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + } + } + + // Tests for ClientNotificationClick + test("ClientNotificationClick") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val params = Table( + ("notificationType", "ceNotificationTabDetails", "uuaNotificationContent"), + ("tweetNotification", notificationTabTweetEventDetails, tweetNotificationContent), + ( + "multiTweetNotification", + notificationTabMultiTweetEventDetails, + multiTweetNotificationContent), + ( + "unknownNotification", + notificationTabUnknownEventDetails, + unknownNotificationContent + ), + ) + + forEvery(params) { + ( + _: String, + ceNotificationTabDetails: NotificationTabDetails, + uuaNotificationContent: NotificationContent + ) => + val actual = ClientEventAdapter.adaptEvent( + actionTowardNotificationEvent( + eventNamespace = Some(ceNotificationClick), + notificationTabDetails = Some(ceNotificationTabDetails))) + + val expectedUUA = mkExpectedUUAForNotificationEvent( + clientEventNamespace = Some(uuaNotificationClick), + actionType = ActionType.ClientNotificationClick, + notificationContent = uuaNotificationContent, + productSurface = Some(ProductSurface.NotificationTab), + productSurfaceInfo = Some(notificationTabProductSurfaceInfo) + ) + + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientNotificationSeeLessOften + test("ClientNotificationSeeLessOften") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val params = Table( + ("notificationType", "ceNotificationTabDetails", "uuaNotificationContent"), + ("tweetNotification", notificationTabTweetEventDetails, tweetNotificationContent), + ( + "multiTweetNotification", + notificationTabMultiTweetEventDetails, + multiTweetNotificationContent), + ("unknownNotification", notificationTabUnknownEventDetails, unknownNotificationContent), + ) + + forEvery(params) { + ( + _: String, + ceNotificationTabDetails: NotificationTabDetails, + uuaNotificationContent: NotificationContent + ) => + val actual = ClientEventAdapter.adaptEvent( + actionTowardNotificationEvent( + eventNamespace = Some(ceNotificationSeeLessOften), + notificationTabDetails = Some(ceNotificationTabDetails))) + + val expectedUUA = mkExpectedUUAForNotificationEvent( + clientEventNamespace = Some(uuaNotificationSeeLessOften), + actionType = ActionType.ClientNotificationSeeLessOften, + notificationContent = uuaNotificationContent, + productSurface = Some(ProductSurface.NotificationTab), + productSurfaceInfo = Some(notificationTabProductSurfaceInfo) + ) + + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetClick + test("ClientTweetClick") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val params = Table( + ("eventName", "page", "nTabDetails", "uuaProductSurface", "uuaProductSurfaceInfo"), + ("tweetClick", "messages", None, None, None), + ( + "tweetClickInNTab", + "ntab", + Some(notificationTabTweetEventDetails), + Some(ProductSurface.NotificationTab), + Some(notificationTabProductSurfaceInfo)) + ) + + forEvery(params) { + ( + _: String, + page: String, + notificationTabDetails: 
Option[NotificationTabDetails], + uuaProductSurface: Option[ProductSurface], + uuaProductSurfaceInfo: Option[ProductSurfaceInfo] + ) => + val actual = ClientEventAdapter.adaptEvent( + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceTweetClick.copy(page = Some(page))), + notificationTabDetails = notificationTabDetails)) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaTweetClick.copy(page = Some(page))), + actionType = ActionType.ClientTweetClick, + productSurface = uuaProductSurface, + productSurfaceInfo = uuaProductSurfaceInfo + ) + + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetClickProfile + test("ClientTweetClickProfile") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actual = + ClientEventAdapter.adaptEvent( + profileClickEvent(eventNamespace = Some(ceTweetClickProfile))) + + val expectedUUA = mkExpectedUUAForProfileClick( + clientEventNamespace = Some(uuaTweetClickProfile), + actionType = ActionType.ClientTweetClickProfile, + authorInfo = Some( + AuthorInfo( + authorId = Some(authorId) + ))) + assert(Seq(expectedUUA) === actual) + } + } + } + + // Tests for ClientTweetClickShare + test("ClientTweetClickShare") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actual = + ClientEventAdapter.adaptEvent( + actionTowardDefaultTweetEvent( + eventNamespace = Some(EventNamespace(action = Some("share_menu_click"))), + authorId = Some(authorId), + tweetPosition = Some(1), + promotedId = Some("promoted_123") + )) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(ClientEventNamespace(action = Some("share_menu_click"))), + actionType = ActionType.ClientTweetClickShare, + authorInfo = Some( + AuthorInfo( + authorId = Some(authorId) + )), + tweetPosition = Some(1), + promotedId = Some("promoted_123") + ) + assert(Seq(expectedUUA) === actual) + } + } + } + + // Tests for ClientTweetShareVia* and ClientTweetUnbookmark + test("ClientTweetShareVia and Unbookmark") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val input = Table( + ("eventNamespaceAction", "uuaActionTypes"), + ("bookmark", Seq(ActionType.ClientTweetShareViaBookmark, ActionType.ClientTweetBookmark)), + ("copy_link", Seq(ActionType.ClientTweetShareViaCopyLink)), + ("share_via_dm", Seq(ActionType.ClientTweetClickSendViaDirectMessage)), + ("unbookmark", Seq(ActionType.ClientTweetUnbookmark)) + ) + + forEvery(input) { (eventNamespaceAction: String, uuaActionTypes: Seq[ActionType]) => + val actual: Seq[UnifiedUserAction] = + ClientEventAdapter.adaptEvent( + actionTowardDefaultTweetEvent( + eventNamespace = Some(EventNamespace(action = Some(eventNamespaceAction))), + authorId = Some(authorId))) + + implicit def any2iterable[A](a: A): Iterable[A] = Some(a) + val expectedUUA: Seq[UnifiedUserAction] = uuaActionTypes.flatMap { uuaActionType => + mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = + Some(ClientEventNamespace(action = Some(eventNamespaceAction))), + actionType = uuaActionType, + authorInfo = Some( + AuthorInfo( + authorId = Some(authorId) + )) + ) + } + assert(expectedUUA === actual) + } + } + } + } + + // Test for ClientTweetClickHashtag + test("ClientTweetClickHashtag") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val events = Table( + ("targets", "tweetActionInfo"), + ( + Some(Seq(LogEventItem(name = Some("test_hashtag")))), +
Some( + TweetActionInfo.ClientTweetClickHashtag( + ClientTweetClickHashtag(hashtag = Some("test_hashtag"))))), + ( + Some(Seq.empty[LogEventItem]), + Some(TweetActionInfo.ClientTweetClickHashtag(ClientTweetClickHashtag(hashtag = None)))), + ( + Some(Nil), + Some(TweetActionInfo.ClientTweetClickHashtag(ClientTweetClickHashtag(hashtag = None)))), + ( + None, + Some(TweetActionInfo.ClientTweetClickHashtag(ClientTweetClickHashtag(hashtag = None)))) + ) + forEvery(events) { + (targets: Option[Seq[LogEventItem]], tweetActionInfo: Option[TweetActionInfo]) => + val clientEvent = + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceClickHashtag), + targets = targets) + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaClickHashtagClientEventNamespace), + actionType = ActionType.ClientTweetClickHashtag, + tweetActionInfo = tweetActionInfo + ) + assert(Seq(expectedUUA) === ClientEventAdapter.adaptEvent(clientEvent)) + } + + } + } + } + + // Tests for ClientTweetVideoPlaybackStart and ClientTweetVideoPlaybackComplete + test("Client Tweet Video Playback Start and Complete") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val input = Table( + ("ceNamespace", "uuaNamespace", "uuaActionType"), + ( + ceVideoPlaybackStart, + uuaVideoPlaybackStartClientEventNamespace, + ActionType.ClientTweetVideoPlaybackStart), + ( + ceVideoPlaybackComplete, + uuaVideoPlaybackCompleteClientEventNamespace, + ActionType.ClientTweetVideoPlaybackComplete), + ) + for (element <- videoEventElementValues) { + forEvery(input) { + ( + ceNamespace: EventNamespace, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType + ) => + val clientEvent = actionTowardDefaultTweetEvent( + eventNamespace = Some(ceNamespace.copy(element = Some(element))), + mediaDetailsV2 = Some(mediaDetailsV2), + clientMediaEvent = Some(clientMediaEvent), + cardDetails = Some(cardDetails) + ) + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaNamespace.copy(element = Some(element))), + actionType = uuaActionType, + tweetActionInfo = Some(videoMetadata) + ) + assert(ClientEventAdapter.adaptEvent(clientEvent).contains(expectedUUA)) + } + } + + for (element <- invalidVideoEventElementValues) { + forEvery(input) { + ( + ceNamespace: EventNamespace, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType + ) => + val clientEvent = actionTowardDefaultTweetEvent( + eventNamespace = Some(ceNamespace.copy(element = Some(element))), + mediaDetailsV2 = Some(mediaDetailsV2), + clientMediaEvent = Some(clientMediaEvent) + ) + val unexpectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaNamespace.copy(element = Some(element))), + actionType = uuaActionType, + tweetActionInfo = Some(videoMetadata) + ) + assert(!ClientEventAdapter.adaptEvent(clientEvent).contains(unexpectedUUA)) + } + } + } + } + } + + // Tests for ClientTweetNotRelevant and ClientTweetUndoNotRelevant + test("ClientTweetNotRelevant & UndoNotRelevant") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actions = Table(("action"), "click", "undo") + val element = "feedback_notrelevant" + forEvery(actions) { action => + val clientEvent = + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceEventNamespace(element, action)), + ) + + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaClientEventNamespace(element, action)), + actionType = 
action match { + case "click" => ActionType.ClientTweetNotRelevant + case "undo" => ActionType.ClientTweetUndoNotRelevant + } + ) + + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientNotificationDismiss + test("ClientNotificationDismiss") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvent = + pushNotificationEvent( + eventNamespace = Some(ceNotificationDismiss), + notificationDetails = Some(notificationDetails)) + + val expectedUUA = mkExpectedUUAForNotificationEvent( + clientEventNamespace = Some(uuaNotificationDismiss), + actionType = ActionType.ClientNotificationDismiss, + notificationContent = tweetNotificationContent, + productSurface = Some(ProductSurface.PushNotification), + productSurfaceInfo = Some( + ProductSurfaceInfo.PushNotificationInfo( + PushNotificationInfo(notificationId = notificationId))) + ) + + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + } + } + + // Tests for ClientTypeaheadClick + test("ClientTypeaheadClick") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val searchQuery = "searchQuery" + + val input = Table( + ("clientEventTargets", "typeaheadActionInfo"), + ( + Some(Seq(LogEventItem(id = Some(userId), itemType = Some(ItemType.User)))), + TypeaheadActionInfo.UserResult(UserResult(profileId = userId))), + ( + Some(Seq(LogEventItem(name = Some(s"$searchQuery"), itemType = Some(ItemType.Search)))), + TypeaheadActionInfo.TopicQueryResult( + TopicQueryResult(suggestedTopicQuery = s"$searchQuery"))) + ) + forEvery(input) { + ( + clientEventTargets: Option[Seq[LogEventItem]], + typeaheadActionInfo: TypeaheadActionInfo, + ) => + val clientEvent = + actionTowardsTypeaheadEvent( + eventNamespace = Some(ceTypeaheadClick), + targets = clientEventTargets, + searchQuery = searchQuery) + val expectedUUA = mkExpectedUUAForTypeaheadAction( + clientEventNamespace = Some(uuaTypeaheadClick), + actionType = ActionType.ClientTypeaheadClick, + typeaheadActionInfo = typeaheadActionInfo, + searchQuery = searchQuery + ) + val actual = ClientEventAdapter.adaptEvent(clientEvent) + assert(Seq(expectedUUA) === actual) + } + // Testing invalid target item type case + assert( + Seq() === ClientEventAdapter.adaptEvent( + actionTowardsTypeaheadEvent( + eventNamespace = Some(ceTypeaheadClick), + targets = + Some(Seq(LogEventItem(id = Some(itemTweetId), itemType = Some(ItemType.Tweet)))), + searchQuery = searchQuery))) + } + } + } + + // Tests for ClientFeedbackPromptSubmit + test("ClientFeedbackPromptSubmit") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val searchQuery: String = "searchQuery" + val searchDetails = Some(SearchDetails(query = Some(searchQuery))) + val input = Table( + ("logEvent", "uuaNamespace", "uuaActionType", "FeedbackPromptInfo"), + ( + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceTweetRelevantToSearch), + searchDetails = searchDetails + ), + uuaTweetRelevantToSearch, + ActionType.ClientFeedbackPromptSubmit, + FeedbackPromptInfo(feedbackPromptActionInfo = + FeedbackPromptActionInfo.TweetRelevantToSearch( + TweetRelevantToSearch( + searchQuery = searchQuery, + tweetId = itemTweetId, + isRelevant = Some(true))))), + ( + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceTweetNotRelevantToSearch), + searchDetails = searchDetails + ), + uuaTweetNotRelevantToSearch, + ActionType.ClientFeedbackPromptSubmit, + 
FeedbackPromptInfo(feedbackPromptActionInfo = + FeedbackPromptActionInfo.TweetRelevantToSearch( + TweetRelevantToSearch( + searchQuery = searchQuery, + tweetId = itemTweetId, + isRelevant = Some(false))))), + ( + actionTowardSearchResultPageEvent( + eventNamespace = Some(ceSearchResultsRelevant), + searchDetails = searchDetails, + items = Some(Seq(LogEventItem(itemType = Some(ItemType.RelevancePrompt)))) + ), + uuaSearchResultsRelevant, + ActionType.ClientFeedbackPromptSubmit, + FeedbackPromptInfo(feedbackPromptActionInfo = + FeedbackPromptActionInfo.DidYouFindItSearch( + DidYouFindItSearch(searchQuery = searchQuery, isRelevant = Some(true))))), + ( + actionTowardSearchResultPageEvent( + eventNamespace = Some(ceSearchResultsNotRelevant), + searchDetails = searchDetails, + items = Some(Seq(LogEventItem(itemType = Some(ItemType.RelevancePrompt)))) + ), + uuaSearchResultsNotRelevant, + ActionType.ClientFeedbackPromptSubmit, + FeedbackPromptInfo(feedbackPromptActionInfo = + FeedbackPromptActionInfo.DidYouFindItSearch( + DidYouFindItSearch(searchQuery = searchQuery, isRelevant = Some(false))))) + ) + + forEvery(input) { + ( + logEvent: LogEvent, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType, + feedbackPromptInfo: FeedbackPromptInfo + ) => + val actual = + ClientEventAdapter.adaptEvent(logEvent) + val expectedUUA = mkExpectedUUAForFeedbackSubmitAction( + clientEventNamespace = Some(uuaNamespace), + actionType = uuaActionType, + feedbackPromptInfo = feedbackPromptInfo, + searchQuery = searchQuery) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientProfile* + test("ClientProfile*") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val input = Table( + ("eventName", "ceNamespace", "uuaNamespace", "uuaActionType"), + ("profile_block", ceProfileBlock, uuaProfileBlock, ActionType.ClientProfileBlock), + ("profile_unblock", ceProfileUnblock, uuaProfileUnblock, ActionType.ClientProfileUnblock), + ("profile_mute", ceProfileMute, uuaProfileMute, ActionType.ClientProfileMute), + ("profile_report", ceProfileReport, uuaProfileReport, ActionType.ClientProfileReport), + ("profile_follow", ceProfileFollow, uuaProfileFollow, ActionType.ClientProfileFollow), + ("profile_click", ceProfileClick, uuaProfileClick, ActionType.ClientProfileClick), + ( + "profile_follow_attempt", + ceProfileFollowAttempt, + uuaProfileFollowAttempt, + ActionType.ClientProfileFollowAttempt), + ("profile_show", ceProfileShow, uuaProfileShow, ActionType.ClientProfileShow), + ) + forEvery(input) { + ( + eventName: String, + ceNamespace: EventNamespace, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType + ) => + val actual = + ClientEventAdapter.adaptEvent( + actionTowardProfileEvent( + eventName = eventName, + eventNamespace = Some(ceNamespace) + )) + val expectedUUA = mkExpectedUUAForProfileAction( + clientEventNamespace = Some(uuaNamespace), + actionType = uuaActionType, + actionProfileId = itemProfileId) + assert(Seq(expectedUUA) === actual) + } + } + } + } + // Tests for ClientTweetEngagementAttempt + test("ClientTweetEngagementAttempt") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("eventName", "ceNamespace", "uuaNamespace", "uuaActionType"), + ( + "tweet_favourite_attempt", + ceTweetFavoriteAttempt, + uuaTweetFavoriteAttempt, + ActionType.ClientTweetFavoriteAttempt), + ( + "tweet_retweet_attempt", + ceTweetRetweetAttempt, + uuaTweetRetweetAttempt, + 
ActionType.ClientTweetRetweetAttempt), + ( + "tweet_reply_attempt", + ceTweetReplyAttempt, + uuaTweetReplyAttempt, + ActionType.ClientTweetReplyAttempt), + ) + forEvery(clientEvents) { + ( + eventName: String, + ceNamespace: EventNamespace, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType + ) => + val actual = + ClientEventAdapter.adaptEvent(actionTowardDefaultTweetEvent(Some(ceNamespace))) + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaNamespace), + actionType = uuaActionType) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for LoggedOut for ClientLogin* + test("ClientLogin*") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("eventName", "ceNamespace", "uuaNamespace", "uuaActionType"), + ( + "client_click_login", + ceClientCTALoginClick, + uuaClientCTALoginClick, + ActionType.ClientCTALoginClick), + ( + "client_click_show", + ceClientCTALoginStart, + uuaClientCTALoginStart, + ActionType.ClientCTALoginStart), + ( + "client_login_success", + ceClientCTALoginSuccess, + uuaClientCTALoginSuccess, + ActionType.ClientCTALoginSuccess), + ) + + forEvery(clientEvents) { + ( + eventName: String, + ceNamespace: EventNamespace, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType + ) => + val actual = + ClientEventAdapter.adaptEvent( + mkLogEvent( + eventName, + Some(ceNamespace), + logBase = Some(logBase1), + eventDetails = None, + pushNotificationDetails = None, + reportDetails = None, + searchDetails = None)) + val expectedUUA = mkExpectedUUAForActionTowardCTAEvent( + clientEventNamespace = Some(uuaNamespace), + actionType = uuaActionType, + guestIdMarketingOpt = logBase1.guestIdMarketing + ) + + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for LoggedOut for ClientSignup* + test("ClientSignup*") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("eventName", "ceNamespace", "uuaNamespace", "uuaActionType"), + ( + "client_click_signup", + ceClientCTASignupClick, + uuaClientCTASignupClick, + ActionType.ClientCTASignupClick), + ( + "client_signup_success", + ceClientCTASignupSuccess, + uuaClientCTASignupSuccess, + ActionType.ClientCTASignupSuccess), + ) + + forEvery(clientEvents) { + ( + eventName: String, + ceNamespace: EventNamespace, + uuaNamespace: ClientEventNamespace, + uuaActionType: ActionType + ) => + val actual = + ClientEventAdapter.adaptEvent( + mkLogEvent( + eventName, + Some(ceNamespace), + logBase = Some(logBase1), + eventDetails = None, + pushNotificationDetails = None, + reportDetails = None, + searchDetails = None)) + val expectedUUA = mkExpectedUUAForActionTowardCTAEvent( + clientEventNamespace = Some(uuaNamespace), + actionType = uuaActionType, + guestIdMarketingOpt = logBase1.guestIdMarketing + ) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetFollowAuthor + test("ClientTweetFollowAuthor") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val testEventsList = Seq( + (ceTweetFollowAuthor1, uuaTweetFollowAuthor1, TweetAuthorFollowClickSource.CaretMenu), + (ceTweetFollowAuthor2, uuaTweetFollowAuthor2, TweetAuthorFollowClickSource.ProfileImage) + ) + testEventsList.foreach { + case (eventNamespace, clientEventNamespace, followClickSource) => + val actual = + ClientEventAdapter.adaptEvent( + tweetActionTowardAuthorEvent( + eventName = "tweet_follow_author", + eventNamespace = 
Some(eventNamespace) + )) + val expectedUUA = mkExpectedUUAForTweetActionTowardAuthor( + clientEventNamespace = Some(clientEventNamespace), + actionType = ActionType.ClientTweetFollowAuthor, + authorInfo = Some( + AuthorInfo( + authorId = Some(authorId) + )), + tweetActionInfo = Some( + TweetActionInfo.ClientTweetFollowAuthor( + ClientTweetFollowAuthor(followClickSource) + )) + ) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetUnfollowAuthor + test("ClientTweetUnfollowAuthor") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val testEventsList = Seq( + ( + ceTweetUnfollowAuthor1, + uuaTweetUnfollowAuthor1, + TweetAuthorUnfollowClickSource.CaretMenu), + ( + ceTweetUnfollowAuthor2, + uuaTweetUnfollowAuthor2, + TweetAuthorUnfollowClickSource.ProfileImage) + ) + testEventsList.foreach { + case (eventNamespace, clientEventNamespace, unfollowClickSource) => + val actual = + ClientEventAdapter.adaptEvent( + tweetActionTowardAuthorEvent( + eventName = "tweet_unfollow_author", + eventNamespace = Some(eventNamespace) + )) + val expectedUUA = mkExpectedUUAForTweetActionTowardAuthor( + clientEventNamespace = Some(clientEventNamespace), + actionType = ActionType.ClientTweetUnfollowAuthor, + authorInfo = Some( + AuthorInfo( + authorId = Some(authorId) + )), + tweetActionInfo = Some( + TweetActionInfo.ClientTweetUnfollowAuthor( + ClientTweetUnfollowAuthor(unfollowClickSource) + )) + ) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + // Tests for ClientTweetMuteAuthor + test("ClientTweetMuteAuthor") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actual = + ClientEventAdapter.adaptEvent( + tweetActionTowardAuthorEvent( + eventName = "tweet_mute_author", + eventNamespace = Some(ceTweetMuteAuthor) + )) + + val expectedUUA = mkExpectedUUAForTweetActionTowardAuthor( + clientEventNamespace = Some(uuaTweetMuteAuthor), + actionType = ActionType.ClientTweetMuteAuthor, + authorInfo = Some( + AuthorInfo( + authorId = Some(authorId) + ))) + assert(Seq(expectedUUA) === actual) + } + } + } + + // Tests for ClientTweetBlockAuthor + test("ClientTweetBlockAuthor") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actual = + ClientEventAdapter.adaptEvent( + tweetActionTowardAuthorEvent( + eventName = "tweet_block_author", + eventNamespace = Some(ceTweetBlockAuthor) + )) + + val expectedUUA = mkExpectedUUAForTweetActionTowardAuthor( + clientEventNamespace = Some(uuaTweetBlockAuthor), + actionType = ActionType.ClientTweetBlockAuthor, + authorInfo = Some( + AuthorInfo( + authorId = Some(authorId) + ))) + assert(Seq(expectedUUA) === actual) + } + } + } + + // Tests for ClientTweetUnblockAuthor + test("ClientTweetUnblockAuthor") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actual = + ClientEventAdapter.adaptEvent( + tweetActionTowardAuthorEvent( + eventName = "tweet_unblock_author", + eventNamespace = Some(ceTweetUnblockAuthor) + )) + + val expectedUUA = mkExpectedUUAForTweetActionTowardAuthor( + clientEventNamespace = Some(uuaTweetUnblockAuthor), + actionType = ActionType.ClientTweetUnblockAuthor, + authorInfo = Some( + AuthorInfo( + authorId = Some(authorId) + ))) + assert(Seq(expectedUUA) === actual) + } + } + } + + // Test for ClientTweetOpenLink + test("ClientTweetOpenLink") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val input = Table( + ("url", "tweetActionInfo"), + (Some("go/url"), 
clientOpenLinkWithUrl), + (None, clientOpenLinkWithoutUrl) + ) + + forEvery(input) { (url: Option[String], tweetActionInfo: TweetActionInfo) => + val clientEvent = + actionTowardDefaultTweetEvent(eventNamespace = Some(ceOpenLink), url = url) + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaOpenLinkClientEventNamespace), + actionType = ActionType.ClientTweetOpenLink, + tweetActionInfo = Some(tweetActionInfo) + ) + assert(Seq(expectedUUA) === ClientEventAdapter.adaptEvent(clientEvent)) + } + } + } + } + + // Test for ClientTweetTakeScreenshot + test("Client take screenshot") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvent = + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceTakeScreenshot), + percentVisibleHeight100k = Some(100)) + val expectedUUA = mkExpectedUUAForActionTowardDefaultTweetEvent( + clientEventNamespace = Some(uuaTakeScreenshotClientEventNamespace), + actionType = ActionType.ClientTweetTakeScreenshot, + tweetActionInfo = Some(clientTakeScreenshot) + ) + assert(Seq(expectedUUA) === ClientEventAdapter.adaptEvent(clientEvent)) + } + } + } + + test("Home / Search product surface meta data") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val clientEvents = Table( + ("actionTweetType", "clientEvent", "expectedUUAEvent"), + ( + "homeTweetEventWithControllerData", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceHomeFavoriteEventNamespace), + suggestionDetails = Some( + SuggestionDetails(decodedControllerData = Some( + homeTweetControllerDataV2( + injectedPosition = Some(1), + traceId = Some(traceId), + requestJoinId = Some(requestJoinId) + )))) + ), + expectedHomeTweetEventWithControllerData), + ( + "homeTweetEventWithSuggestionType", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceHomeFavoriteEventNamespace), + suggestionDetails = Some( + SuggestionDetails( + suggestionType = Some("Test_type") + ))), + expectedHomeTweetEventWithSuggestType), + ( + "homeTweetEventWithControllerDataSuggestionType", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceHomeFavoriteEventNamespace), + suggestionDetails = Some( + SuggestionDetails( + suggestionType = Some("Test_type"), + decodedControllerData = Some( + homeTweetControllerDataV2( + injectedPosition = Some(1), + traceId = Some(traceId), + requestJoinId = Some(requestJoinId))) + )) + ), + expectedHomeTweetEventWithControllerDataSuggestType), + ( + "homeLatestTweetEventWithControllerData", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceHomeLatestFavoriteEventNamespace), + suggestionDetails = Some( + SuggestionDetails(decodedControllerData = Some( + homeTweetControllerDataV2( + injectedPosition = Some(1), + traceId = Some(traceId), + requestJoinId = Some(requestJoinId) + )))) + ), + expectedHomeLatestTweetEventWithControllerData), + ( + "homeLatestTweetEventWithSuggestionType", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceHomeLatestFavoriteEventNamespace), + suggestionDetails = Some( + SuggestionDetails( + suggestionType = Some("Test_type") + ))), + expectedHomeLatestTweetEventWithSuggestType), + ( + "homeLatestTweetEventWithControllerDataSuggestionType", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceHomeLatestFavoriteEventNamespace), + suggestionDetails = Some( + SuggestionDetails( + suggestionType = Some("Test_type"), + decodedControllerData = Some( + homeTweetControllerDataV2( + injectedPosition = Some(1), + traceId = Some(traceId), + 
requestJoinId = Some(requestJoinId))) + )) + ), + expectedHomeLatestTweetEventWithControllerDataSuggestType), + ( + "searchTweetEventWithControllerData", + actionTowardDefaultTweetEvent( + eventNamespace = Some(ceSearchFavoriteEventNamespace), + suggestionDetails = Some( + SuggestionDetails(decodedControllerData = Some( + mkSearchResultControllerData( + queryOpt = Some("twitter"), + traceId = Some(traceId), + requestJoinId = Some(requestJoinId) + )))) + ), + expectedSearchTweetEventWithControllerData), + ) + forEvery(clientEvents) { (_: String, event: LogEvent, expectedUUA: UnifiedUserAction) => + val actual = ClientEventAdapter.adaptEvent(event) + assert(Seq(expectedUUA) === actual) + } + } + } + } + + test("ClientAppExit") { + new TestFixtures.ClientEventFixture { + Time.withTimeAt(frozenTime) { _ => + val duration: Option[Long] = Some(10000L) + val inputTable = Table( + ("eventType", "clientAppId", "section", "duration", "isValidEvent"), + ("uas-iPhone", Some(129032L), Some("enter_background"), duration, true), + ("uas-iPad", Some(191841L), Some("enter_background"), duration, true), + ("uas-android", Some(258901L), None, duration, true), + ("none-clientId", None, None, duration, false), + ("invalid-clientId", Some(1L), None, duration, false), + ("none-duration", Some(258901L), None, None, false), + ("non-uas-iPhone", Some(129032L), None, duration, false) + ) + + forEvery(inputTable) { + ( + _: String, + clientAppId: Option[Long], + section: Option[String], + duration: Option[Long], + isValidEvent: Boolean + ) => + val actual = ClientEventAdapter.adaptEvent( + actionTowardsUasEvent( + eventNamespace = Some(ceAppExit.copy(section = section)), + clientAppId = clientAppId, + duration = duration + )) + + if (isValidEvent) { + // create UUA UAS event + val expectedUUA = mkExpectedUUAForUasEvent( + clientEventNamespace = Some(uuaAppExit.copy(section = section)), + actionType = ActionType.ClientAppExit, + clientAppId = clientAppId, + duration = duration + ) + assert(Seq(expectedUUA) === actual) + } else { + // ignore the event and do not create UUA UAS event + assert(actual.isEmpty) + } + } + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/EmailNotificationEventAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/EmailNotificationEventAdapterSpec.scala new file mode 100644 index 000000000..5d00f0d8b --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/EmailNotificationEventAdapterSpec.scala @@ -0,0 +1,20 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.inject.Test +import com.twitter.unified_user_actions.adapter.TestFixtures.EmailNotificationEventFixture +import com.twitter.unified_user_actions.adapter.email_notification_event.EmailNotificationEventAdapter +import com.twitter.util.Time + +class EmailNotificationEventAdapterSpec extends Test { + + test("Notifications with click event") { + new EmailNotificationEventFixture { + Time.withTimeAt(frozenTime) { _ => + val actual = EmailNotificationEventAdapter.adaptEvent(notificationEvent) + assert(expectedUua == actual.head) + assert(EmailNotificationEventAdapter.adaptEvent(notificationEventWOTweetId).isEmpty) + assert(EmailNotificationEventAdapter.adaptEvent(notificationEventWOImpressionId).isEmpty) + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/EmailNotificationEventUtilsSpec.scala 
b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/EmailNotificationEventUtilsSpec.scala new file mode 100644 index 000000000..b99dc892c --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/EmailNotificationEventUtilsSpec.scala @@ -0,0 +1,32 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.inject.Test +import com.twitter.unified_user_actions.adapter.TestFixtures.EmailNotificationEventFixture +import com.twitter.unified_user_actions.adapter.email_notification_event.EmailNotificationEventUtils + +class EmailNotificationEventUtilsSpec extends Test { + + test("Extract TweetId from pageUrl") { + new EmailNotificationEventFixture { + + val invalidUrls: Seq[String] = + List("", "abc.com/what/not?x=y", "?abc=def", "12345/", "12345/?") + val invalidDomain = "https://twitter.app.link/addressbook/" + val numericHandle = + "https://twitter.com/1234/status/12345?cxt=HBwWgsDTgY3tp&cn=ZmxleGl&refsrc=email)" + + assert(EmailNotificationEventUtils.extractTweetId(pageUrlStatus).contains(tweetIdStatus)) + assert(EmailNotificationEventUtils.extractTweetId(pageUrlEvent).contains(tweetIdEvent)) + assert(EmailNotificationEventUtils.extractTweetId(pageUrlNoArgs).contains(tweetIdNoArgs)) + assert(EmailNotificationEventUtils.extractTweetId(invalidDomain).isEmpty) + assert(EmailNotificationEventUtils.extractTweetId(numericHandle).contains(12345L)) + invalidUrls.foreach(url => assert(EmailNotificationEventUtils.extractTweetId(url).isEmpty)) + } + } + + test("Extract TweetId from LogBase") { + new EmailNotificationEventFixture { + assert(EmailNotificationEventUtils.extractTweetId(logBase1).contains(tweetIdStatus)) + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/FavoriteArchivalEventsAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/FavoriteArchivalEventsAdapterSpec.scala new file mode 100644 index 000000000..69e670172 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/FavoriteArchivalEventsAdapterSpec.scala @@ -0,0 +1,132 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.inject.Test +import com.twitter.timelineservice.fanout.thriftscala.FavoriteArchivalEvent +import com.twitter.unified_user_actions.adapter.favorite_archival_events.FavoriteArchivalEventsAdapter +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.Time +import org.scalatest.prop.TableDrivenPropertyChecks + +class FavoriteArchivalEventsAdapterSpec extends Test with TableDrivenPropertyChecks { + trait Fixture { + + val frozenTime = Time.fromMilliseconds(1658949273000L) + + val userId = 1L + val authorId = 2L + val tweetId = 101L + val retweetId = 102L + + val favArchivalEventNoRetweet = FavoriteArchivalEvent( + favoriterId = userId, + tweetId = tweetId, + timestampMs = 0L, + isArchivingAction = Some(true), + tweetUserId = Some(authorId) + ) + val favArchivalEventRetweet = FavoriteArchivalEvent( + favoriterId = userId, + tweetId = retweetId, + timestampMs = 0L, + isArchivingAction = Some(true), + tweetUserId = Some(authorId), + sourceTweetId = Some(tweetId) + ) + val favUnarchivalEventNoRetweet = FavoriteArchivalEvent( + favoriterId = userId, + tweetId = tweetId, + timestampMs = 0L, + isArchivingAction = Some(false), + tweetUserId = Some(authorId) + ) + val favUnarchivalEventRetweet = FavoriteArchivalEvent( + favoriterId = 
userId, + tweetId = retweetId, + timestampMs = 0L, + isArchivingAction = Some(false), + tweetUserId = Some(authorId), + sourceTweetId = Some(tweetId) + ) + + val expectedUua1 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(userId)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(authorId))), + ) + ), + actionType = ActionType.ServerTweetArchiveFavorite, + eventMetadata = EventMetadata( + sourceTimestampMs = 0L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerFavoriteArchivalEvents, + ) + ) + val expectedUua2 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(userId)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = retweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(authorId))), + retweetedTweetId = Some(tweetId) + ) + ), + actionType = ActionType.ServerTweetArchiveFavorite, + eventMetadata = EventMetadata( + sourceTimestampMs = 0L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerFavoriteArchivalEvents, + ) + ) + val expectedUua3 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(userId)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(authorId))), + ) + ), + actionType = ActionType.ServerTweetUnarchiveFavorite, + eventMetadata = EventMetadata( + sourceTimestampMs = 0L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerFavoriteArchivalEvents, + ) + ) + val expectedUua4 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(userId)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = retweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(authorId))), + retweetedTweetId = Some(tweetId) + ) + ), + actionType = ActionType.ServerTweetUnarchiveFavorite, + eventMetadata = EventMetadata( + sourceTimestampMs = 0L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerFavoriteArchivalEvents, + ) + ) + } + + test("all tests") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val table = Table( + ("event", "expected"), + (favArchivalEventNoRetweet, expectedUua1), + (favArchivalEventRetweet, expectedUua2), + (favUnarchivalEventNoRetweet, expectedUua3), + (favUnarchivalEventRetweet, expectedUua4) + ) + forEvery(table) { (event: FavoriteArchivalEvent, expected: UnifiedUserAction) => + val actual = FavoriteArchivalEventsAdapter.adaptEvent(event) + assert(Seq(expected) === actual) + } + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/RekeyUuaFromInteractionEventsAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/RekeyUuaFromInteractionEventsAdapterSpec.scala new file mode 100644 index 000000000..93b741b79 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/RekeyUuaFromInteractionEventsAdapterSpec.scala @@ -0,0 +1,36 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.inject.Test +import com.twitter.unified_user_actions.adapter.TestFixtures.InteractionEventsFixtures +import com.twitter.unified_user_actions.adapter.uua_aggregates.RekeyUuaFromInteractionEventsAdapter +import com.twitter.util.Time +import org.scalatest.prop.TableDrivenPropertyChecks + +class RekeyUuaFromInteractionEventsAdapterSpec 
extends Test with TableDrivenPropertyChecks { + test("ClientTweetRenderImpressions") { + new InteractionEventsFixtures { + Time.withTimeAt(frozenTime) { _ => + assert( + RekeyUuaFromInteractionEventsAdapter.adaptEvent(baseInteractionEvent) === Seq( + expectedBaseKeyedUuaTweet)) + } + } + } + + test("Filter out logged out users") { + new InteractionEventsFixtures { + Time.withTimeAt(frozenTime) { _ => + assert(RekeyUuaFromInteractionEventsAdapter.adaptEvent(loggedOutInteractionEvent) === Nil) + } + } + } + + test("Filter out detail impressions") { + new InteractionEventsFixtures { + Time.withTimeAt(frozenTime) { _ => + assert( + RekeyUuaFromInteractionEventsAdapter.adaptEvent(detailImpressionInteractionEvent) === Nil) + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/RetweetArchivalEventsAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/RetweetArchivalEventsAdapterSpec.scala new file mode 100644 index 000000000..00a78b535 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/RetweetArchivalEventsAdapterSpec.scala @@ -0,0 +1,86 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.inject.Test +import com.twitter.tweetypie.thriftscala.RetweetArchivalEvent +import com.twitter.unified_user_actions.adapter.retweet_archival_events.RetweetArchivalEventsAdapter +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.Time +import org.scalatest.prop.TableDrivenPropertyChecks + +class RetweetArchivalEventsAdapterSpec extends Test with TableDrivenPropertyChecks { + trait Fixture { + + val frozenTime = Time.fromMilliseconds(1658949273000L) + + val authorId = 1L + val tweetId = 101L + val retweetId = 102L + val retweetAuthorId = 2L + + val retweetArchivalEvent = RetweetArchivalEvent( + retweetId = retweetId, + srcTweetId = tweetId, + retweetUserId = retweetAuthorId, + srcTweetUserId = authorId, + timestampMs = 0L, + isArchivingAction = Some(true), + ) + val retweetUnarchivalEvent = RetweetArchivalEvent( + retweetId = retweetId, + srcTweetId = tweetId, + retweetUserId = retweetAuthorId, + srcTweetUserId = authorId, + timestampMs = 0L, + isArchivingAction = Some(false), + ) + + val expectedUua1 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(retweetAuthorId)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(authorId))), + retweetingTweetId = Some(retweetId) + ) + ), + actionType = ActionType.ServerTweetArchiveRetweet, + eventMetadata = EventMetadata( + sourceTimestampMs = 0L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerRetweetArchivalEvents, + ) + ) + val expectedUua2 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(retweetAuthorId)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = tweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(authorId))), + retweetingTweetId = Some(retweetId) + ) + ), + actionType = ActionType.ServerTweetUnarchiveRetweet, + eventMetadata = EventMetadata( + sourceTimestampMs = 0L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerRetweetArchivalEvents, + ) + ) + } + + test("all tests") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val table = Table( + ("event", "expected"), + (retweetArchivalEvent, expectedUua1), + (retweetUnarchivalEvent, 
expectedUua2), + ) + forEvery(table) { (event: RetweetArchivalEvent, expected: UnifiedUserAction) => + val actual = RetweetArchivalEventsAdapter.adaptEvent(event) + assert(Seq(expected) === actual) + } + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/SearchInfoUtilsSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/SearchInfoUtilsSpec.scala new file mode 100644 index 000000000..6746b3099 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/SearchInfoUtilsSpec.scala @@ -0,0 +1,355 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.clientapp.thriftscala.SuggestionDetails +import com.twitter.clientapp.thriftscala._ +import com.twitter.search.common.constants.thriftscala.ThriftQuerySource +import com.twitter.search.common.constants.thriftscala.TweetResultSource +import com.twitter.search.common.constants.thriftscala.UserResultSource +import com.twitter.suggests.controller_data.search_response.item_types.thriftscala.ItemTypesControllerData +import com.twitter.suggests.controller_data.search_response.request.thriftscala.RequestControllerData +import com.twitter.suggests.controller_data.search_response.thriftscala.SearchResponseControllerData +import com.twitter.suggests.controller_data.search_response.tweet_types.thriftscala.TweetTypesControllerData +import com.twitter.suggests.controller_data.search_response.user_types.thriftscala.UserTypesControllerData +import com.twitter.suggests.controller_data.search_response.v1.thriftscala.{ + SearchResponseControllerData => SearchResponseControllerDataV1 +} +import com.twitter.suggests.controller_data.thriftscala.ControllerData +import com.twitter.suggests.controller_data.v2.thriftscala.{ControllerData => ControllerDataV2} +import com.twitter.util.mock.Mockito +import org.junit.runner.RunWith +import org.scalatest.funsuite.AnyFunSuite +import org.scalatest.matchers.should.Matchers +import org.scalatest.prop.TableDrivenPropertyChecks +import org.scalatestplus.junit.JUnitRunner +import com.twitter.unified_user_actions.adapter.client_event.SearchInfoUtils +import com.twitter.unified_user_actions.thriftscala.SearchQueryFilterType +import com.twitter.unified_user_actions.thriftscala.SearchQueryFilterType._ +import org.scalatest.prop.TableFor2 + +@RunWith(classOf[JUnitRunner]) +class SearchInfoUtilsSpec + extends AnyFunSuite + with Matchers + with Mockito + with TableDrivenPropertyChecks { + + trait Fixture { + def mkControllerData( + queryOpt: Option[String], + querySourceOpt: Option[Int] = None, + traceId: Option[Long] = None, + requestJoinId: Option[Long] = None + ): ControllerData = { + ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1( + SearchResponseControllerDataV1(requestControllerData = Some( + RequestControllerData( + rawQuery = queryOpt, + querySource = querySourceOpt, + traceId = traceId, + requestJoinId = requestJoinId + ))) + ))) + } + + def mkTweetTypeControllerData(bitmap: Long, topicId: Option[Long] = None): ControllerData.V2 = { + ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1( + SearchResponseControllerDataV1(itemTypesControllerData = Some( + ItemTypesControllerData.TweetTypesControllerData( + TweetTypesControllerData( + tweetTypesBitmap = Some(bitmap), + topicId = topicId + )) + )) + ))) + } + + def mkUserTypeControllerData(bitmap: Long): ControllerData.V2 = { + 
ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1( + SearchResponseControllerDataV1(itemTypesControllerData = Some( + ItemTypesControllerData.UserTypesControllerData(UserTypesControllerData( + userTypesBitmap = Some(bitmap) + )) + )) + ))) + } + } + + test("getQueryOptFromControllerDataFromItem should return query if present in controller data") { + new Fixture { + + val controllerData: ControllerData = mkControllerData(Some("twitter")) + val suggestionDetails: SuggestionDetails = + SuggestionDetails(decodedControllerData = Some(controllerData)) + val item: Item = Item(suggestionDetails = Some(suggestionDetails)) + val result: Option[String] = new SearchInfoUtils(item).getQueryOptFromControllerDataFromItem + result shouldEqual Option("twitter") + } + } + + test("getRequestJoinId should return requestJoinId if present in controller data") { + new Fixture { + + val controllerData: ControllerData = mkControllerData( + Some("twitter"), + traceId = Some(11L), + requestJoinId = Some(12L) + ) + val suggestionDetails: SuggestionDetails = + SuggestionDetails(decodedControllerData = Some(controllerData)) + val item: Item = Item(suggestionDetails = Some(suggestionDetails)) + val infoUtils = new SearchInfoUtils(item) + infoUtils.getTraceId shouldEqual Some(11L) + infoUtils.getRequestJoinId shouldEqual Some(12L) + } + } + + test("getQueryOptFromControllerDataFromItem should return None if no suggestion details") { + new Fixture { + + val suggestionDetails: SuggestionDetails = SuggestionDetails() + val item: Item = Item(suggestionDetails = Some(suggestionDetails)) + val result: Option[String] = new SearchInfoUtils(item).getQueryOptFromControllerDataFromItem + result shouldEqual None + } + } + + test("getQueryOptFromSearchDetails should return query if present") { + new Fixture { + + val searchDetails: SearchDetails = SearchDetails(query = Some("twitter")) + val result: Option[String] = new SearchInfoUtils(Item()).getQueryOptFromSearchDetails( + LogEvent(eventName = "", searchDetails = Some(searchDetails)) + ) + result shouldEqual Option("twitter") + } + } + + test("getQueryOptFromSearchDetails should return None if not present") { + new Fixture { + + val searchDetails: SearchDetails = SearchDetails() + val result: Option[String] = new SearchInfoUtils(Item()).getQueryOptFromSearchDetails( + LogEvent(eventName = "", searchDetails = Some(searchDetails)) + ) + result shouldEqual None + } + } + + test("getQuerySourceOptFromControllerDataFromItem should return QuerySource if present") { + new Fixture { + + // 1 is Typed Query + val controllerData: ControllerData = mkControllerData(Some("twitter"), Some(1)) + + val item: Item = Item( + suggestionDetails = Some( + SuggestionDetails( + decodedControllerData = Some(controllerData) + )) + ) + new SearchInfoUtils(item).getQuerySourceOptFromControllerDataFromItem shouldEqual Some( + ThriftQuerySource.TypedQuery) + } + } + + test("getQuerySourceOptFromControllerDataFromItem should return None if not present") { + new Fixture { + + val controllerData: ControllerData = mkControllerData(Some("twitter"), None) + + val item: Item = Item( + suggestionDetails = Some( + SuggestionDetails( + decodedControllerData = Some(controllerData) + )) + ) + new SearchInfoUtils(item).getQuerySourceOptFromControllerDataFromItem shouldEqual None + } + } + + test("Decoding Tweet Result Sources bitmap") { + new Fixture { + + TweetResultSource.list + .foreach { tweetResultSource => + val bitmap = (1 << tweetResultSource.getValue()).toLong + val 
controllerData = mkTweetTypeControllerData(bitmap) + + val item = Item( + suggestionDetails = Some( + SuggestionDetails( + decodedControllerData = Some(controllerData) + )) + ) + + val result = new SearchInfoUtils(item).getTweetResultSources + result shouldEqual Some(Set(tweetResultSource)) + } + } + } + + test("Decoding multiple Tweet Result Sources") { + new Fixture { + + val tweetResultSources: Set[TweetResultSource] = + Set(TweetResultSource.QueryInteractionGraph, TweetResultSource.QueryExpansion) + val bitmap: Long = tweetResultSources.foldLeft(0L) { + case (acc, source) => acc + (1 << source.getValue()) + } + + val controllerData: ControllerData.V2 = mkTweetTypeControllerData(bitmap) + + val item: Item = Item( + suggestionDetails = Some( + SuggestionDetails( + decodedControllerData = Some(controllerData) + )) + ) + + val result: Option[Set[TweetResultSource]] = new SearchInfoUtils(item).getTweetResultSources + result shouldEqual Some(tweetResultSources) + } + } + + test("Decoding User Result Sources bitmap") { + new Fixture { + + UserResultSource.list + .foreach { userResultSource => + val bitmap = (1 << userResultSource.getValue()).toLong + val controllerData = mkUserTypeControllerData(bitmap) + + val item = Item( + suggestionDetails = Some( + SuggestionDetails( + decodedControllerData = Some(controllerData) + )) + ) + + val result = new SearchInfoUtils(item).getUserResultSources + result shouldEqual Some(Set(userResultSource)) + } + } + } + + test("Decoding multiple User Result Sources") { + new Fixture { + + val userResultSources: Set[UserResultSource] = + Set(UserResultSource.QueryInteractionGraph, UserResultSource.ExpertSearch) + val bitmap: Long = userResultSources.foldLeft(0L) { + case (acc, source) => acc + (1 << source.getValue()) + } + + val controllerData: ControllerData.V2 = mkUserTypeControllerData(bitmap) + + val item: Item = Item( + suggestionDetails = Some( + SuggestionDetails( + decodedControllerData = Some(controllerData) + )) + ) + + val result: Option[Set[UserResultSource]] = new SearchInfoUtils(item).getUserResultSources + result shouldEqual Some(userResultSources) + } + } + + test("getQueryFilterType should return correct query filter type") { + new Fixture { + val infoUtils = new SearchInfoUtils(Item()) + val eventsToBeChecked: TableFor2[Option[EventNamespace], Option[SearchQueryFilterType]] = + Table( + ("eventNamespace", "queryFilterType"), + ( + Some(EventNamespace(client = Some("m5"), element = Some("search_filter_top"))), + Some(Top)), + ( + Some(EventNamespace(client = Some("m5"), element = Some("search_filter_live"))), + Some(Latest)), + ( + Some(EventNamespace(client = Some("m5"), element = Some("search_filter_user"))), + Some(People)), + ( + Some(EventNamespace(client = Some("m5"), element = Some("search_filter_image"))), + Some(Photos)), + ( + Some(EventNamespace(client = Some("m5"), element = Some("search_filter_video"))), + Some(Videos)), + ( + Some(EventNamespace(client = Some("m5"), section = Some("search_filter_top"))), + None + ), // if client is web (m5), element determines the query filter, hence None if element is None + ( + Some(EventNamespace(client = Some("android"), element = Some("search_filter_top"))), + Some(Top)), + ( + Some(EventNamespace(client = Some("android"), element = Some("search_filter_tweets"))), + Some(Latest)), + ( + Some(EventNamespace(client = Some("android"), element = Some("search_filter_user"))), + Some(People)), + ( + Some(EventNamespace(client = Some("android"), element = Some("search_filter_image"))), + Some(Photos)), 
+ ( + Some(EventNamespace(client = Some("android"), element = Some("search_filter_video"))), + Some(Videos)), + ( + Some(EventNamespace(client = Some("android"), section = Some("search_filter_top"))), + None + ), // if client is android, element determines the query filter, hence None if element is None + ( + Some(EventNamespace(client = Some("iphone"), section = Some("search_filter_top"))), + Some(Top)), + ( + Some(EventNamespace(client = Some("iphone"), section = Some("search_filter_live"))), + Some(Latest)), + ( + Some(EventNamespace(client = Some("iphone"), section = Some("search_filter_user"))), + Some(People)), + ( + Some(EventNamespace(client = Some("iphone"), section = Some("search_filter_image"))), + Some(Photos)), + ( + Some(EventNamespace(client = Some("iphone"), section = Some("search_filter_video"))), + Some(Videos)), + ( + Some(EventNamespace(client = Some("iphone"), element = Some("search_filter_top"))), + None + ), // if client is iphone, section determines the query filter, hence None if section is None + ( + Some(EventNamespace(client = None, section = Some("search_filter_top"))), + Some(Top) + ), // if client is missing, use section by default + ( + Some(EventNamespace(client = None, element = Some("search_filter_top"))), + None + ), // if client is missing, section is used by default, hence None since section is missing + ( + Some(EventNamespace(client = Some("iphone"))), + None + ), // if both element and section are missing, expect None + (None, None), // if namespace is missing from LogEvent, expect None + ) + + forEvery(eventsToBeChecked) { + ( + eventNamespace: Option[EventNamespace], + searchQueryFilterType: Option[SearchQueryFilterType] + ) => + infoUtils.getQueryFilterType( + LogEvent( + eventName = "srp_event", + eventNamespace = eventNamespace)) shouldEqual searchQueryFilterType + } + + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/SocialGraphAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/SocialGraphAdapterSpec.scala new file mode 100644 index 000000000..168700045 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/SocialGraphAdapterSpec.scala @@ -0,0 +1,359 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.inject.Test +import com.twitter.socialgraph.thriftscala.Action +import com.twitter.socialgraph.thriftscala.BlockGraphEvent +import com.twitter.socialgraph.thriftscala.FollowGraphEvent +import com.twitter.socialgraph.thriftscala.FollowRequestGraphEvent +import com.twitter.socialgraph.thriftscala.FollowRetweetsGraphEvent +import com.twitter.socialgraph.thriftscala.LogEventContext +import com.twitter.socialgraph.thriftscala.MuteGraphEvent +import com.twitter.socialgraph.thriftscala.ReportAsAbuseGraphEvent +import com.twitter.socialgraph.thriftscala.ReportAsSpamGraphEvent +import com.twitter.socialgraph.thriftscala.SrcTargetRequest +import com.twitter.socialgraph.thriftscala.WriteEvent +import com.twitter.socialgraph.thriftscala.WriteRequestResult +import com.twitter.unified_user_actions.adapter.social_graph_event.SocialGraphAdapter +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.Time +import org.scalatest.prop.TableDrivenPropertyChecks +import org.scalatest.prop.TableFor1 +import org.scalatest.prop.TableFor3 + +class SocialGraphAdapterSpec extends Test with TableDrivenPropertyChecks { + trait Fixture { + + val frozenTime: Time = 
Time.fromMilliseconds(1658949273000L) + + val testLogEventContext: LogEventContext = LogEventContext( + timestamp = 1001L, + hostname = "", + transactionId = "", + socialGraphClientId = "", + loggedInUserId = Some(1111L), + ) + + val testWriteRequestResult: WriteRequestResult = WriteRequestResult( + request = SrcTargetRequest( + source = 1111L, + target = 2222L + ) + ) + + val testWriteRequestResultWithValidationError: WriteRequestResult = WriteRequestResult( + request = SrcTargetRequest( + source = 1111L, + target = 2222L + ), + validationError = Some("action unsuccessful") + ) + + val baseEvent: WriteEvent = WriteEvent( + context = testLogEventContext, + action = Action.AcceptFollowRequest + ) + + val sgFollowEvent: WriteEvent = baseEvent.copy( + action = Action.Follow, + follow = Some(List(FollowGraphEvent(testWriteRequestResult)))) + + val sgUnfollowEvent: WriteEvent = baseEvent.copy( + action = Action.Unfollow, + follow = Some(List(FollowGraphEvent(testWriteRequestResult)))) + + val sgFollowRedundantEvent: WriteEvent = baseEvent.copy( + action = Action.Follow, + follow = Some( + List( + FollowGraphEvent( + result = testWriteRequestResult, + redundantOperation = Some(true) + )))) + + val sgFollowRedundantIsFalseEvent: WriteEvent = baseEvent.copy( + action = Action.Follow, + follow = Some( + List( + FollowGraphEvent( + result = testWriteRequestResult, + redundantOperation = Some(false) + )))) + + val sgUnfollowRedundantEvent: WriteEvent = baseEvent.copy( + action = Action.Unfollow, + follow = Some( + List( + FollowGraphEvent( + result = testWriteRequestResult, + redundantOperation = Some(true) + )))) + + val sgUnfollowRedundantIsFalseEvent: WriteEvent = baseEvent.copy( + action = Action.Unfollow, + follow = Some( + List( + FollowGraphEvent( + result = testWriteRequestResult, + redundantOperation = Some(false) + )))) + + val sgUnsuccessfulFollowEvent: WriteEvent = baseEvent.copy( + action = Action.Follow, + follow = Some(List(FollowGraphEvent(testWriteRequestResultWithValidationError)))) + + val sgUnsuccessfulUnfollowEvent: WriteEvent = baseEvent.copy( + action = Action.Unfollow, + follow = Some(List(FollowGraphEvent(testWriteRequestResultWithValidationError)))) + + val sgBlockEvent: WriteEvent = baseEvent.copy( + action = Action.Block, + block = Some(List(BlockGraphEvent(testWriteRequestResult)))) + + val sgUnsuccessfulBlockEvent: WriteEvent = baseEvent.copy( + action = Action.Block, + block = Some(List(BlockGraphEvent(testWriteRequestResultWithValidationError)))) + + val sgUnblockEvent: WriteEvent = baseEvent.copy( + action = Action.Unblock, + block = Some(List(BlockGraphEvent(testWriteRequestResult)))) + + val sgUnsuccessfulUnblockEvent: WriteEvent = baseEvent.copy( + action = Action.Unblock, + block = Some(List(BlockGraphEvent(testWriteRequestResultWithValidationError)))) + + val sgMuteEvent: WriteEvent = baseEvent.copy( + action = Action.Mute, + mute = Some(List(MuteGraphEvent(testWriteRequestResult)))) + + val sgUnsuccessfulMuteEvent: WriteEvent = baseEvent.copy( + action = Action.Mute, + mute = Some(List(MuteGraphEvent(testWriteRequestResultWithValidationError)))) + + val sgUnmuteEvent: WriteEvent = baseEvent.copy( + action = Action.Unmute, + mute = Some(List(MuteGraphEvent(testWriteRequestResult)))) + + val sgUnsuccessfulUnmuteEvent: WriteEvent = baseEvent.copy( + action = Action.Unmute, + mute = Some(List(MuteGraphEvent(testWriteRequestResultWithValidationError)))) + + val sgCreateFollowRequestEvent: WriteEvent = baseEvent.copy( + action = Action.CreateFollowRequest, + 
followRequest = Some(List(FollowRequestGraphEvent(testWriteRequestResult))) + ) + + val sgCancelFollowRequestEvent: WriteEvent = baseEvent.copy( + action = Action.CancelFollowRequest, + followRequest = Some(List(FollowRequestGraphEvent(testWriteRequestResult))) + ) + + val sgAcceptFollowRequestEvent: WriteEvent = baseEvent.copy( + action = Action.AcceptFollowRequest, + followRequest = Some(List(FollowRequestGraphEvent(testWriteRequestResult))) + ) + + val sgAcceptFollowRetweetEvent: WriteEvent = baseEvent.copy( + action = Action.FollowRetweets, + followRetweets = Some(List(FollowRetweetsGraphEvent(testWriteRequestResult))) + ) + + val sgAcceptUnfollowRetweetEvent: WriteEvent = baseEvent.copy( + action = Action.UnfollowRetweets, + followRetweets = Some(List(FollowRetweetsGraphEvent(testWriteRequestResult))) + ) + + val sgReportAsSpamEvent: WriteEvent = baseEvent.copy( + action = Action.ReportAsSpam, + reportAsSpam = Some( + List( + ReportAsSpamGraphEvent( + result = testWriteRequestResult + )))) + + val sgReportAsAbuseEvent: WriteEvent = baseEvent.copy( + action = Action.ReportAsAbuse, + reportAsAbuse = Some( + List( + ReportAsAbuseGraphEvent( + result = testWriteRequestResult + )))) + + def getExpectedUUA( + userId: Long, + actionProfileId: Long, + sourceTimestampMs: Long, + actionType: ActionType, + socialGraphAction: Option[Action] = None + ): UnifiedUserAction = { + val actionItem = socialGraphAction match { + case Some(sgAction) => + Item.ProfileInfo( + ProfileInfo( + actionProfileId = actionProfileId, + profileActionInfo = Some( + ProfileActionInfo.ServerProfileReport( + ServerProfileReport(reportType = sgAction) + )) + ) + ) + case _ => + Item.ProfileInfo( + ProfileInfo( + actionProfileId = actionProfileId + ) + ) + } + + UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(userId)), + item = actionItem, + actionType = actionType, + eventMetadata = EventMetadata( + sourceTimestampMs = sourceTimestampMs, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerSocialGraphEvents + ) + ) + } + + val expectedUuaFollow: UnifiedUserAction = getExpectedUUA( + userId = 1111L, + actionProfileId = 2222L, + sourceTimestampMs = 1001L, + actionType = ActionType.ServerProfileFollow + ) + + val expectedUuaUnfollow: UnifiedUserAction = getExpectedUUA( + userId = 1111L, + actionProfileId = 2222L, + sourceTimestampMs = 1001L, + actionType = ActionType.ServerProfileUnfollow + ) + + val expectedUuaMute: UnifiedUserAction = getExpectedUUA( + userId = 1111L, + actionProfileId = 2222L, + sourceTimestampMs = 1001L, + actionType = ActionType.ServerProfileMute + ) + + val expectedUuaUnmute: UnifiedUserAction = getExpectedUUA( + userId = 1111L, + actionProfileId = 2222L, + sourceTimestampMs = 1001L, + actionType = ActionType.ServerProfileUnmute + ) + + val expectedUuaBlock: UnifiedUserAction = getExpectedUUA( + userId = 1111L, + actionProfileId = 2222L, + sourceTimestampMs = 1001L, + actionType = ActionType.ServerProfileBlock + ) + + val expectedUuaUnblock: UnifiedUserAction = getExpectedUUA( + userId = 1111L, + actionProfileId = 2222L, + sourceTimestampMs = 1001L, + actionType = ActionType.ServerProfileUnblock + ) + + val expectedUuaReportAsSpam: UnifiedUserAction = getExpectedUUA( + userId = 1111L, + actionProfileId = 2222L, + sourceTimestampMs = 1001L, + actionType = ActionType.ServerProfileReport, + socialGraphAction = Some(Action.ReportAsSpam) + ) + + val expectedUuaReportAsAbuse: UnifiedUserAction = getExpectedUUA( + userId = 1111L, + actionProfileId = 2222L, 
+ sourceTimestampMs = 1001L, + actionType = ActionType.ServerProfileReport, + socialGraphAction = Some(Action.ReportAsAbuse) + ) + } + + test("SocialGraphAdapter ignore events not in the list") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val ignoredSocialGraphEvents: TableFor1[WriteEvent] = Table( + "ignoredSocialGraphEvents", + sgAcceptUnfollowRetweetEvent, + sgAcceptFollowRequestEvent, + sgAcceptFollowRetweetEvent, + sgCreateFollowRequestEvent, + sgCancelFollowRequestEvent, + ) + forEvery(ignoredSocialGraphEvents) { writeEvent: WriteEvent => + val actual = SocialGraphAdapter.adaptEvent(writeEvent) + assert(actual.isEmpty) + } + } + } + } + + test("Test SocialGraphAdapter consuming Write events") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val socialProfileActions: TableFor3[String, WriteEvent, UnifiedUserAction] = Table( + ("actionType", "event", "expectedUnifiedUserAction"), + ("ProfileFollow", sgFollowEvent, expectedUuaFollow), + ("ProfileUnfollow", sgUnfollowEvent, expectedUuaUnfollow), + ("ProfileBlock", sgBlockEvent, expectedUuaBlock), + ("ProfileUnBlock", sgUnblockEvent, expectedUuaUnblock), + ("ProfileMute", sgMuteEvent, expectedUuaMute), + ("ProfileUnmute", sgUnmuteEvent, expectedUuaUnmute), + ("ProfileReportAsSpam", sgReportAsSpamEvent, expectedUuaReportAsSpam), + ("ProfileReportAsAbuse", sgReportAsAbuseEvent, expectedUuaReportAsAbuse), + ) + forEvery(socialProfileActions) { + (_: String, event: WriteEvent, expected: UnifiedUserAction) => + val actual = SocialGraphAdapter.adaptEvent(event) + assert(Seq(expected) === actual) + } + } + } + } + + test("SocialGraphAdapter ignore redundant follow/unfollow events") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val socialGraphActions: TableFor3[String, WriteEvent, Seq[UnifiedUserAction]] = Table( + ("actionType", "ignoredRedundantFollowUnfollowEvents", "expectedUnifiedUserAction"), + ("ProfileFollow", sgFollowRedundantEvent, Nil), + ("ProfileFollow", sgFollowRedundantIsFalseEvent, Seq(expectedUuaFollow)), + ("ProfileUnfollow", sgUnfollowRedundantEvent, Nil), + ("ProfileUnfollow", sgUnfollowRedundantIsFalseEvent, Seq(expectedUuaUnfollow)) + ) + forEvery(socialGraphActions) { + (_: String, event: WriteEvent, expected: Seq[UnifiedUserAction]) => + val actual = SocialGraphAdapter.adaptEvent(event) + assert(expected === actual) + } + } + } + } + + test("SocialGraphAdapter ignore Unsuccessful SocialGraph events") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val unsuccessfulSocialGraphEvents: TableFor1[WriteEvent] = Table( + "ignoredSocialGraphEvents", + sgUnsuccessfulFollowEvent, + sgUnsuccessfulUnfollowEvent, + sgUnsuccessfulBlockEvent, + sgUnsuccessfulUnblockEvent, + sgUnsuccessfulMuteEvent, + sgUnsuccessfulUnmuteEvent + ) + + forEvery(unsuccessfulSocialGraphEvents) { writeEvent: WriteEvent => + val actual = SocialGraphAdapter.adaptEvent(writeEvent) + assert(actual.isEmpty) + } + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TestFixtures.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TestFixtures.scala new file mode 100644 index 000000000..b1e3c9795 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TestFixtures.scala @@ -0,0 +1,2294 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.ads.cards.thriftscala.CardEvent +import com.twitter.ads.eventstream.thriftscala.EngagementEvent +import 
com.twitter.ads.spendserver.thriftscala.SpendServerEvent +import com.twitter.adserver.thriftscala.ImpressionDataNeededAtEngagementTime +import com.twitter.adserver.thriftscala.ClientInfo +import com.twitter.adserver.thriftscala.EngagementType +import com.twitter.adserver.thriftscala.DisplayLocation +import com.twitter.clientapp.thriftscala.AmplifyDetails +import com.twitter.clientapp.thriftscala.CardDetails +import com.twitter.clientapp.thriftscala.EventDetails +import com.twitter.clientapp.thriftscala.EventNamespace +import com.twitter.clientapp.thriftscala.ImpressionDetails +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.clientapp.thriftscala.MediaDetails +import com.twitter.clientapp.thriftscala.MediaDetailsV2 +import com.twitter.clientapp.thriftscala.MediaType +import com.twitter.clientapp.thriftscala.NotificationDetails +import com.twitter.clientapp.thriftscala.NotificationTabDetails +import com.twitter.clientapp.thriftscala.PerformanceDetails +import com.twitter.clientapp.thriftscala.ReportDetails +import com.twitter.clientapp.thriftscala.SearchDetails +import com.twitter.clientapp.thriftscala.SuggestionDetails +import com.twitter.clientapp.thriftscala.{Item => LogEventItem} +import com.twitter.clientapp.thriftscala.{TweetDetails => LogEventTweetDetails} +import com.twitter.gizmoduck.thriftscala.UserModification +import com.twitter.gizmoduck.thriftscala.Profile +import com.twitter.gizmoduck.thriftscala.Auth +import com.twitter.gizmoduck.thriftscala.UpdateDiffItem +import com.twitter.gizmoduck.thriftscala.User +import com.twitter.gizmoduck.thriftscala.UserType +import com.twitter.ibis.thriftscala.NotificationScribe +import com.twitter.ibis.thriftscala.NotificationScribeType +import com.twitter.iesource.thriftscala.ClientEventContext +import com.twitter.iesource.thriftscala.TweetImpression +import com.twitter.iesource.thriftscala.ClientType +import com.twitter.iesource.thriftscala.ContextualEventNamespace +import com.twitter.iesource.thriftscala.EngagingContext +import com.twitter.iesource.thriftscala.EventSource +import com.twitter.iesource.thriftscala.InteractionDetails +import com.twitter.iesource.thriftscala.InteractionEvent +import com.twitter.iesource.thriftscala.InteractionType +import com.twitter.iesource.thriftscala.InteractionTargetType +import com.twitter.iesource.thriftscala.{UserIdentifier => UserIdentifierIE} +import com.twitter.logbase.thriftscala.ClientEventReceiver +import com.twitter.logbase.thriftscala.LogBase +import com.twitter.mediaservices.commons.thriftscala.MediaCategory +import com.twitter.notificationservice.api.thriftscala.NotificationClientEventMetadata +import com.twitter.reportflow.thriftscala.ReportType +import com.twitter.suggests.controller_data.home_tweets.thriftscala.HomeTweetsControllerData +import com.twitter.suggests.controller_data.home_tweets.v1.thriftscala.{ + HomeTweetsControllerData => HomeTweetsControllerDataV1 +} +import com.twitter.suggests.controller_data.thriftscala.ControllerData +import com.twitter.suggests.controller_data.timelines_topic.thriftscala.TimelinesTopicControllerData +import com.twitter.suggests.controller_data.timelines_topic.v1.thriftscala.{ + TimelinesTopicControllerData => TimelinesTopicControllerDataV1 +} +import com.twitter.suggests.controller_data.v2.thriftscala.{ControllerData => ControllerDataV2} +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.Time +import 
com.twitter.video.analytics.thriftscala.ClientMediaEvent +import com.twitter.video.analytics.thriftscala.SessionState +import com.twitter.video.analytics.thriftscala._ +import com.twitter.suggests.controller_data.search_response.v1.thriftscala.{ + SearchResponseControllerData => SearchResponseControllerDataV1 +} +import com.twitter.suggests.controller_data.search_response.thriftscala.SearchResponseControllerData +import com.twitter.suggests.controller_data.search_response.request.thriftscala.RequestControllerData +import com.twitter.unified_user_actions.thriftscala.FeedbackPromptInfo + +object TestFixtures { + trait CommonFixture { + val frozenTime: Time = Time.fromMilliseconds(1658949273000L) + + val userId: Long = 123L + val authorId: Long = 112233L + val itemTweetId: Long = 111L + val itemProfileId: Long = 123456L + val retweetingTweetId: Long = 222L + val quotedTweetId: Long = 333L + val quotedAuthorId: Long = 456L + val inReplyToTweetId: Long = 444L + val quotingTweetId: Long = 555L + val topicId: Long = 1234L + val traceId: Long = 5678L + val requestJoinId: Long = 91011L + val notificationId: String = "12345" + val tweetIds: Seq[Long] = Seq[Long](111, 222, 333) + val reportFlowId: String = "report-flow-id" + } + + trait ClientEventFixture extends CommonFixture { + + val timestamp = 1001L + + val logBase: LogBase = LogBase( + ipAddress = "", + transactionId = "", + timestamp = 1002L, + driftAdjustedEventCreatedAtMs = Some(1001L), + userId = Some(userId), + clientEventReceiver = Some(ClientEventReceiver.CesHttp) + ) + + val logBase1: LogBase = LogBase( + ipAddress = "", + transactionId = "", + userId = Some(userId), + guestId = Some(2L), + guestIdMarketing = Some(2L), + timestamp = timestamp + ) + + def mkSearchResultControllerData( + queryOpt: Option[String], + querySourceOpt: Option[Int] = None, + traceId: Option[Long] = None, + requestJoinId: Option[Long] = None + ): ControllerData = { + ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1( + SearchResponseControllerDataV1(requestControllerData = Some( + RequestControllerData( + rawQuery = queryOpt, + querySource = querySourceOpt, + traceId = traceId, + requestJoinId = requestJoinId + ))) + ))) + } + + val videoEventElementValues: Seq[String] = + Seq[String]( + "gif_player", + "periscope_player", + "platform_amplify_card", + "video_player", + "vine_player") + + val invalidVideoEventElementValues: Seq[String] = + Seq[String]( + "dynamic_video_ads", + "live_video_player", + "platform_forward_card", + "video_app_card_canvas", + "youtube_player" + ) + + val clientMediaEvent: ClientMediaEvent = ClientMediaEvent( + sessionState = SessionState( + contentVideoIdentifier = MediaIdentifier.MediaPlatformIdentifier( + MediaPlatformIdentifier(mediaId = 123L, mediaCategory = MediaCategory.TweetVideo)), + sessionId = "", + ), + mediaClientEventType = MediaEventType.IntentToPlay(IntentToPlay()), + playingMediaState = PlayingMediaState( + videoType = VideoType.Content, + mediaAssetUrl = "", + mediaMetadata = MediaMetadata(publisherIdentifier = PublisherIdentifier + .TwitterPublisherIdentifier(TwitterPublisherIdentifier(123456L))) + ), + playerState = PlayerState(isMuted = false) + ) + + val mediaDetailsV2: MediaDetailsV2 = MediaDetailsV2( + mediaItems = Some( + Seq[MediaDetails]( + MediaDetails( + contentId = Some("456"), + mediaType = Some(MediaType.ConsumerVideo), + dynamicAds = Some(false)), + MediaDetails( + contentId = Some("123"), + mediaType = Some(MediaType.ConsumerVideo), + dynamicAds = Some(false)), + 
MediaDetails( + contentId = Some("789"), + mediaType = Some(MediaType.ConsumerVideo), + dynamicAds = Some(false)) + )) + ) + + val cardDetails = + CardDetails(amplifyDetails = Some(AmplifyDetails(videoType = Some("content")))) + + val videoMetadata: TweetActionInfo = TweetActionInfo.TweetVideoWatch( + TweetVideoWatch( + mediaType = Some(MediaType.ConsumerVideo), + isMonetizable = Some(false), + videoType = Some("content"))) + + val notificationDetails: NotificationDetails = + NotificationDetails(impressionId = Some(notificationId)) + + val notificationTabTweetEventDetails: NotificationTabDetails = + NotificationTabDetails( + clientEventMetadata = Some( + NotificationClientEventMetadata( + tweetIds = Some(Seq[Long](itemTweetId)), + upstreamId = Some(notificationId), + requestId = "", + notificationId = "", + notificationCount = 0)) + ) + + val notificationTabMultiTweetEventDetails: NotificationTabDetails = + NotificationTabDetails( + clientEventMetadata = Some( + NotificationClientEventMetadata( + tweetIds = Some(tweetIds), + upstreamId = Some(notificationId), + requestId = "", + notificationId = "", + notificationCount = 0)) + ) + + val notificationTabUnknownEventDetails: NotificationTabDetails = + NotificationTabDetails( + clientEventMetadata = Some( + NotificationClientEventMetadata( + upstreamId = Some(notificationId), + requestId = "", + notificationId = "", + notificationCount = 0)) + ) + + val tweetNotificationContent: NotificationContent = + NotificationContent.TweetNotification(TweetNotification(itemTweetId)) + + val multiTweetNotificationContent: NotificationContent = + NotificationContent.MultiTweetNotification(MultiTweetNotification(tweetIds)) + + val unknownNotificationContent: NotificationContent = + NotificationContent.UnknownNotification(UnknownNotification()) + + val reportTweetClick: TweetActionInfo = + TweetActionInfo.ClientTweetReport(ClientTweetReport(isReportTweetDone = false)) + + val reportTweetDone: TweetActionInfo = + TweetActionInfo.ClientTweetReport(ClientTweetReport(isReportTweetDone = true)) + + val reportTweetWithReportFlowId: TweetActionInfo = + TweetActionInfo.ClientTweetReport( + ClientTweetReport(isReportTweetDone = true, reportFlowId = Some(reportFlowId))) + + val reportTweetWithoutReportFlowId: TweetActionInfo = + TweetActionInfo.ClientTweetReport( + ClientTweetReport(isReportTweetDone = true, reportFlowId = None)) + + val reportTweetSubmit: TweetActionInfo = + TweetActionInfo.ServerTweetReport( + ServerTweetReport(reportFlowId = Some(reportFlowId), reportType = Some(ReportType.Abuse))) + + val notificationTabProductSurfaceInfo: ProductSurfaceInfo = + ProductSurfaceInfo.NotificationTabInfo(NotificationTabInfo(notificationId = notificationId)) + + val clientOpenLinkWithUrl: TweetActionInfo = + TweetActionInfo.ClientTweetOpenLink(ClientTweetOpenLink(url = Some("go/url"))) + + val clientOpenLinkWithoutUrl: TweetActionInfo = + TweetActionInfo.ClientTweetOpenLink(ClientTweetOpenLink(url = None)) + + val clientTakeScreenshot: TweetActionInfo = + TweetActionInfo.ClientTweetTakeScreenshot( + ClientTweetTakeScreenshot(percentVisibleHeight100k = Some(100))) + + // client-event event_namespace + val ceLingerEventNamespace: EventNamespace = EventNamespace( + component = Some("stream"), + element = Some("linger"), + action = Some("results") + ) + val ceRenderEventNamespace: EventNamespace = EventNamespace( + component = Some("stream"), + action = Some("results") + ) + val ceTweetDetailsEventNamespace1: EventNamespace = EventNamespace( + page = Some("tweet"), + 
+      section = None,
+      component = Some("tweet"),
+      element = None,
+      action = Some("impression")
+    )
+    val ceGalleryEventNamespace: EventNamespace = EventNamespace(
+      component = Some("gallery"),
+      element = Some("photo"),
+      action = Some("impression")
+    )
+    val ceFavoriteEventNamespace: EventNamespace = EventNamespace(action = Some("favorite"))
+    val ceHomeFavoriteEventNamespace: EventNamespace =
+      EventNamespace(page = Some("home"), action = Some("favorite"))
+    val ceHomeLatestFavoriteEventNamespace: EventNamespace =
+      EventNamespace(page = Some("home_latest"), action = Some("favorite"))
+    val ceSearchFavoriteEventNamespace: EventNamespace =
+      EventNamespace(page = Some("search"), action = Some("favorite"))
+    val ceClickReplyEventNamespace: EventNamespace = EventNamespace(action = Some("reply"))
+    val ceReplyEventNamespace: EventNamespace = EventNamespace(action = Some("send_reply"))
+    val ceRetweetEventNamespace: EventNamespace = EventNamespace(action = Some("retweet"))
+    val ceVideoPlayback25: EventNamespace = EventNamespace(action = Some("playback_25"))
+    val ceVideoPlayback50: EventNamespace = EventNamespace(action = Some("playback_50"))
+    val ceVideoPlayback75: EventNamespace = EventNamespace(action = Some("playback_75"))
+    val ceVideoPlayback95: EventNamespace = EventNamespace(action = Some("playback_95"))
+    val ceVideoPlayFromTap: EventNamespace = EventNamespace(action = Some("play_from_tap"))
+    val ceVideoQualityView: EventNamespace = EventNamespace(action = Some("video_quality_view"))
+    val ceVideoView: EventNamespace = EventNamespace(action = Some("video_view"))
+    val ceVideoMrcView: EventNamespace = EventNamespace(action = Some("video_mrc_view"))
+    val ceVideoViewThreshold: EventNamespace = EventNamespace(action = Some("view_threshold"))
+    val ceVideoCtaUrlClick: EventNamespace = EventNamespace(action = Some("cta_url_click"))
+    val ceVideoCtaWatchClick: EventNamespace = EventNamespace(action = Some("cta_watch_click"))
+    val cePhotoExpand: EventNamespace =
+      EventNamespace(element = Some("platform_photo_card"), action = Some("click"))
+    val ceCardClick: EventNamespace =
+      EventNamespace(element = Some("platform_card"), action = Some("click"))
+    val ceCardOpenApp: EventNamespace = EventNamespace(action = Some("open_app"))
+    val ceCardAppInstallAttempt: EventNamespace = EventNamespace(action = Some("install_app"))
+    val cePollCardVote1: EventNamespace =
+      EventNamespace(element = Some("platform_card"), action = Some("vote"))
+    val cePollCardVote2: EventNamespace =
+      EventNamespace(element = Some("platform_forward_card"), action = Some("vote"))
+    val ceMentionClick: EventNamespace =
+      EventNamespace(element = Some("mention"), action = Some("click"))
+    val ceVideoPlaybackStart: EventNamespace = EventNamespace(action = Some("playback_start"))
+    val ceVideoPlaybackComplete: EventNamespace = EventNamespace(action = Some("playback_complete"))
+    val ceClickHashtag: EventNamespace = EventNamespace(action = Some("hashtag_click"))
+    val ceTopicFollow1: EventNamespace =
+      EventNamespace(element = Some("topic"), action = Some("follow"))
+    val ceOpenLink: EventNamespace = EventNamespace(action = Some("open_link"))
+    val ceTakeScreenshot: EventNamespace = EventNamespace(action = Some("take_screenshot"))
+    val ceTopicFollow2: EventNamespace =
+      EventNamespace(element = Some("social_proof"), action = Some("follow"))
+    val ceTopicFollow3: EventNamespace =
+      EventNamespace(element = Some("feedback_follow_topic"), action = Some("click"))
+    val ceTopicUnfollow1: EventNamespace =
+      EventNamespace(element = Some("topic"), action = Some("unfollow"))
+    val ceTopicUnfollow2: EventNamespace =
+      EventNamespace(element = Some("social_proof"), action = Some("unfollow"))
+    val ceTopicUnfollow3: EventNamespace =
+      EventNamespace(element = Some("feedback_unfollow_topic"), action = Some("click"))
+    val ceTopicNotInterestedIn1: EventNamespace =
+      EventNamespace(element = Some("topic"), action = Some("not_interested"))
+    val ceTopicNotInterestedIn2: EventNamespace =
+      EventNamespace(element = Some("feedback_not_interested_in_topic"), action = Some("click"))
+    val ceTopicUndoNotInterestedIn1: EventNamespace =
+      EventNamespace(element = Some("topic"), action = Some("un_not_interested"))
+    val ceTopicUndoNotInterestedIn2: EventNamespace =
+      EventNamespace(element = Some("feedback_not_interested_in_topic"), action = Some("undo"))
+    val ceProfileFollowAttempt: EventNamespace =
+      EventNamespace(action = Some("follow_attempt"))
+    val ceTweetFavoriteAttempt: EventNamespace =
+      EventNamespace(action = Some("favorite_attempt"))
+    val ceTweetRetweetAttempt: EventNamespace =
+      EventNamespace(action = Some("retweet_attempt"))
+    val ceTweetReplyAttempt: EventNamespace =
+      EventNamespace(action = Some("reply_attempt"))
+    val ceClientCTALoginClick: EventNamespace =
+      EventNamespace(action = Some("login"))
+    val ceClientCTALoginStart: EventNamespace =
+      EventNamespace(page = Some("login"), action = Some("show"))
+    val ceClientCTALoginSuccess: EventNamespace =
+      EventNamespace(page = Some("login"), action = Some("success"))
+    val ceClientCTASignupClick: EventNamespace =
+      EventNamespace(action = Some("signup"))
+    val ceClientCTASignupSuccess: EventNamespace =
+      EventNamespace(page = Some("signup"), action = Some("success"))
+    val ceNotificationOpen: EventNamespace = EventNamespace(
+      page = Some("notification"),
+      section = Some("status_bar"),
+      component = None,
+      action = Some("open"))
+    val ceNotificationClick: EventNamespace = EventNamespace(
+      page = Some("ntab"),
+      section = Some("all"),
+      component = Some("urt"),
+      element = Some("users_liked_your_tweet"),
+      action = Some("navigate"))
+    val ceTypeaheadClick: EventNamespace =
+      EventNamespace(element = Some("typeahead"), action = Some("click"))
+    val ceTweetReport: EventNamespace = EventNamespace(element = Some("report_tweet"))
+    def ceEventNamespace(element: String, action: String): EventNamespace =
+      EventNamespace(element = Some(element), action = Some(action))
+    def ceTweetReportFlow(page: String, action: String): EventNamespace =
+      EventNamespace(element = Some("ticket"), page = Some(page), action = Some(action))
+    val ceNotificationSeeLessOften: EventNamespace = EventNamespace(
+      page = Some("ntab"),
+      section = Some("all"),
+      component = Some("urt"),
+      action = Some("see_less_often"))
+    val ceNotificationDismiss: EventNamespace = EventNamespace(
+      page = Some("notification"),
+      section = Some("status_bar"),
+      component = None,
+      action = Some("dismiss"))
+    val ceSearchResultsRelevant: EventNamespace = EventNamespace(
+      page = Some("search"),
+      component = Some("did_you_find_it_module"),
+      element = Some("is_relevant"),
+      action = Some("click")
+    )
+    val ceSearchResultsNotRelevant: EventNamespace = EventNamespace(
+      page = Some("search"),
+      component = Some("did_you_find_it_module"),
+      element = Some("not_relevant"),
+      action = Some("click")
+    )
+    val ceTweetRelevantToSearch: EventNamespace = EventNamespace(
+      page = Some("search"),
+      component = Some("relevance_prompt_module"),
+      element = Some("is_relevant"),
+      action = Some("click"))
+    val ceTweetNotRelevantToSearch: EventNamespace = EventNamespace(
+      page = Some("search"),
+      component = Some("relevance_prompt_module"),
+      element = Some("not_relevant"),
+      action = Some("click"))
+    val ceProfileBlock: EventNamespace =
+      EventNamespace(page = Some("profile"), action = Some("block"))
+    val ceProfileUnblock: EventNamespace =
+      EventNamespace(page = Some("profile"), action = Some("unblock"))
+    val ceProfileMute: EventNamespace =
+      EventNamespace(page = Some("profile"), action = Some("mute_user"))
+    val ceProfileReport: EventNamespace =
+      EventNamespace(page = Some("profile"), action = Some("report"))
+    val ceProfileShow: EventNamespace =
+      EventNamespace(page = Some("profile"), action = Some("show"))
+    val ceProfileFollow: EventNamespace =
+      EventNamespace(action = Some("follow"))
+    val ceProfileClick: EventNamespace =
+      EventNamespace(action = Some("profile_click"))
+    val ceTweetFollowAuthor1: EventNamespace = EventNamespace(
+      action = Some("click"),
+      element = Some("follow")
+    )
+    val ceTweetFollowAuthor2: EventNamespace = EventNamespace(
+      action = Some("follow")
+    )
+    val ceTweetUnfollowAuthor1: EventNamespace = EventNamespace(
+      action = Some("click"),
+      element = Some("unfollow")
+    )
+    val ceTweetUnfollowAuthor2: EventNamespace = EventNamespace(
+      action = Some("unfollow")
+    )
+    val ceTweetBlockAuthor: EventNamespace = EventNamespace(
+      page = Some("profile"),
+      section = Some("tweets"),
+      component = Some("tweet"),
+      action = Some("click"),
+      element = Some("block")
+    )
+    val ceTweetUnblockAuthor: EventNamespace = EventNamespace(
+      section = Some("tweets"),
+      component = Some("tweet"),
+      action = Some("click"),
+      element = Some("unblock")
+    )
+    val ceTweetMuteAuthor: EventNamespace = EventNamespace(
+      component = Some("suggest_sc_tweet"),
+      action = Some("click"),
+      element = Some("mute")
+    )
+    val ceTweetClick: EventNamespace =
+      EventNamespace(element = Some("tweet"), action = Some("click"))
+    val ceTweetClickProfile: EventNamespace = EventNamespace(
+      component = Some("tweet"),
+      element = Some("user"),
+      action = Some("profile_click"))
+    val ceAppExit: EventNamespace =
+      EventNamespace(page = Some("app"), action = Some("become_inactive"))
+
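The `ce*` values above enumerate the raw client-event namespaces the adapter has to recognize, with unset fields left as `None`. A minimal sketch of how such fixtures can be compared against an incoming namespace is below; the `matchesFixture` helper and its wildcard treatment of `None` are illustrative assumptions, not the adapter's actual matching rule.

```scala
// Hypothetical helper (not part of the original change): compares a concrete
// client-event namespace against a fixture namespace, treating a None field
// in the fixture as "don't care".
def matchesFixture(actual: EventNamespace, fixture: EventNamespace): Boolean =
  Seq(
    (fixture.page, actual.page),
    (fixture.section, actual.section),
    (fixture.component, actual.component),
    (fixture.element, actual.element),
    (fixture.action, actual.action)
  ).forall { case (want, got) => want.forall(got.contains) }
```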
+    // UUA client_event_namespace
+    val uuaLingerClientEventNamespace: ClientEventNamespace = ClientEventNamespace(
+      component = Some("stream"),
+      element = Some("linger"),
+      action = Some("results")
+    )
+    val uuaRenderClientEventNamespace: ClientEventNamespace = ClientEventNamespace(
+      component = Some("stream"),
+      action = Some("results")
+    )
+    val ceTweetDetailsClientEventNamespace1: ClientEventNamespace = ClientEventNamespace(
+      page = Some("tweet"),
+      section = None,
+      component = Some("tweet"),
+      element = None,
+      action = Some("impression")
+    )
+    val ceTweetDetailsClientEventNamespace2: ClientEventNamespace = ClientEventNamespace(
+      page = Some("tweet"),
+      section = None,
+      component = Some("suggest_ranked_list_tweet"),
+      element = None,
+      action = Some("impression")
+    )
+    val ceTweetDetailsClientEventNamespace3: ClientEventNamespace = ClientEventNamespace(
+      page = Some("tweet"),
+      section = None,
+      component = None,
+      element = None,
+      action = Some("impression")
+    )
+    val ceTweetDetailsClientEventNamespace4: ClientEventNamespace = ClientEventNamespace(
+      page = Some("tweet"),
+      section = None,
+      component = None,
+      element = None,
+      action = Some("show")
+    )
+    val ceTweetDetailsClientEventNamespace5: ClientEventNamespace = ClientEventNamespace(
+      page = Some("tweet"),
+      section = Some("landing"),
+      component = None,
+      element = None,
+      action = Some("show")
+    )
+    val ceGalleryClientEventNamespace: ClientEventNamespace = ClientEventNamespace(
+      component = Some("gallery"),
+      element = Some("photo"),
+      action = Some("impression")
+    )
+    val uuaFavoriteClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("favorite"))
+    val uuaHomeFavoriteClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(page = Some("home"), action = Some("favorite"))
+    val uuaSearchFavoriteClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(page = Some("search"), action = Some("favorite"))
+    val uuaHomeLatestFavoriteClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(page = Some("home_latest"), action = Some("favorite"))
+    val uuaClickReplyClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("reply"))
+    val uuaReplyClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("send_reply"))
+    val uuaRetweetClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("retweet"))
+    val uuaVideoPlayback25ClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("playback_25"))
+    val uuaVideoPlayback50ClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("playback_50"))
+    val uuaVideoPlayback75ClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("playback_75"))
+    val uuaVideoPlayback95ClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("playback_95"))
+    val uuaOpenLinkClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("open_link"))
+    val uuaTakeScreenshotClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("take_screenshot"))
+    val uuaVideoPlayFromTapClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("play_from_tap"))
+    val uuaVideoQualityViewClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("video_quality_view"))
+    val uuaVideoViewClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("video_view"))
+    val uuaVideoMrcViewClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("video_mrc_view"))
+    val uuaVideoViewThresholdClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("view_threshold"))
+    val uuaVideoCtaUrlClickClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("cta_url_click"))
+    val uuaVideoCtaWatchClickClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("cta_watch_click"))
+    val uuaPhotoExpandClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(element = Some("platform_photo_card"), action = Some("click"))
+    val uuaCardClickClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(element = Some("platform_card"), action = Some("click"))
+    val uuaCardOpenAppClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("open_app"))
+    val uuaCardAppInstallAttemptClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("install_app"))
+    val uuaPollCardVote1ClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(element = Some("platform_card"), action = Some("vote"))
+    val uuaPollCardVote2ClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(element = Some("platform_forward_card"), action = Some("vote"))
+    val uuaMentionClickClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(element = Some("mention"), action = Some("click"))
+    val uuaVideoPlaybackStartClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("playback_start"))
+    val uuaVideoPlaybackCompleteClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("playback_complete"))
+    val uuaClickHashtagClientEventNamespace: ClientEventNamespace =
+      ClientEventNamespace(action = Some("hashtag_click"))
+    val uuaTopicFollowClientEventNamespace1: ClientEventNamespace =
+      ClientEventNamespace(element = Some("topic"), action = Some("follow"))
+    val uuaTopicFollowClientEventNamespace2: ClientEventNamespace =
+      ClientEventNamespace(element = Some("social_proof"), action = Some("follow"))
+    val uuaTopicFollowClientEventNamespace3: ClientEventNamespace =
+      ClientEventNamespace(element = Some("feedback_follow_topic"), action = Some("click"))
+    val uuaTopicUnfollowClientEventNamespace1: ClientEventNamespace =
+      ClientEventNamespace(element = Some("topic"), action = Some("unfollow"))
+    val uuaTopicUnfollowClientEventNamespace2: ClientEventNamespace =
+      ClientEventNamespace(element = Some("social_proof"), action = Some("unfollow"))
+    val uuaTopicUnfollowClientEventNamespace3: ClientEventNamespace =
+      ClientEventNamespace(element = Some("feedback_unfollow_topic"), action = Some("click"))
+    val uuaTopicNotInterestedInClientEventNamespace1: ClientEventNamespace =
+      ClientEventNamespace(element = Some("topic"), action = Some("not_interested"))
+    val uuaTopicNotInterestedInClientEventNamespace2: ClientEventNamespace =
+      ClientEventNamespace(
+        element = Some("feedback_not_interested_in_topic"),
+        action = Some("click"))
+    val uuaTopicUndoNotInterestedInClientEventNamespace1: ClientEventNamespace =
+      ClientEventNamespace(element = Some("topic"), action = Some("un_not_interested"))
+    val uuaTopicUndoNotInterestedInClientEventNamespace2: ClientEventNamespace =
+      ClientEventNamespace(
+        element = Some("feedback_not_interested_in_topic"),
+        action = Some("undo"))
+    val uuaProfileFollowAttempt: ClientEventNamespace =
+      ClientEventNamespace(action = Some("follow_attempt"))
+    val uuaTweetFavoriteAttempt: ClientEventNamespace =
+      ClientEventNamespace(action = Some("favorite_attempt"))
+    val uuaTweetRetweetAttempt: ClientEventNamespace =
+      ClientEventNamespace(action = Some("retweet_attempt"))
+    val uuaTweetReplyAttempt: ClientEventNamespace =
+      ClientEventNamespace(action = Some("reply_attempt"))
+    val uuaClientCTALoginClick: ClientEventNamespace =
+      ClientEventNamespace(action = Some("login"))
+    val uuaClientCTALoginStart: ClientEventNamespace =
+      ClientEventNamespace(page = Some("login"), action = Some("show"))
+    val uuaClientCTALoginSuccess: ClientEventNamespace =
+      ClientEventNamespace(page = Some("login"), action = Some("success"))
+    val uuaClientCTASignupClick: ClientEventNamespace =
+      ClientEventNamespace(action = Some("signup"))
+    val uuaClientCTASignupSuccess: ClientEventNamespace =
+      ClientEventNamespace(page = Some("signup"), action = Some("success"))
+    val uuaNotificationOpen: ClientEventNamespace =
+      ClientEventNamespace(
+        page = Some("notification"),
+        section = Some("status_bar"),
+        component = None,
+        action = Some("open"))
+    val uuaNotificationClick: ClientEventNamespace =
+      ClientEventNamespace(
+        page = Some("ntab"),
+        section = Some("all"),
+        component = Some("urt"),
+        element = Some("users_liked_your_tweet"),
+        action = Some("navigate"))
+    val uuaTweetReport: ClientEventNamespace =
+      ClientEventNamespace(element = Some("report_tweet"))
+    val uuaTweetFollowAuthor1: ClientEventNamespace =
+      ClientEventNamespace(element = Some("follow"), action = Some("click"))
+    val uuaTweetFollowAuthor2: ClientEventNamespace =
+      ClientEventNamespace(action = Some("follow"))
+    val uuaTweetUnfollowAuthor1: ClientEventNamespace =
+      ClientEventNamespace(element = Some("unfollow"), action = Some("click"))
+    val uuaTweetUnfollowAuthor2: ClientEventNamespace =
+      ClientEventNamespace(action = Some("unfollow"))
+    val uuaNotificationSeeLessOften: ClientEventNamespace = ClientEventNamespace(
+      page = Some("ntab"),
+      section = Some("all"),
+      component = Some("urt"),
+      action = Some("see_less_often"))
+    def uuaClientEventNamespace(element: String, action: String): ClientEventNamespace =
+      ClientEventNamespace(element = Some(element), action = Some(action))
+    def uuaTweetReportFlow(page: String, action: String): ClientEventNamespace =
+      ClientEventNamespace(element = Some("ticket"), page = Some(page), action = Some(action))
+    val uuaTweetClick: ClientEventNamespace =
+      ClientEventNamespace(element = Some("tweet"), action = Some("click"))
+    val uuaTweetClickProfile: ClientEventNamespace = ClientEventNamespace(
+      component = Some("tweet"),
+      element = Some("user"),
+      action = Some("profile_click"))
+    val uuaNotificationDismiss: ClientEventNamespace = ClientEventNamespace(
+      page = Some("notification"),
+      section = Some("status_bar"),
+      component = None,
+      action = Some("dismiss"))
+    val uuaTypeaheadClick: ClientEventNamespace =
+      ClientEventNamespace(element = Some("typeahead"), action = Some("click"))
+    val uuaSearchResultsRelevant: ClientEventNamespace = ClientEventNamespace(
+      page = Some("search"),
+      component = Some("did_you_find_it_module"),
+      element = Some("is_relevant"),
+      action = Some("click")
+    )
+    val uuaSearchResultsNotRelevant: ClientEventNamespace = ClientEventNamespace(
+      page = Some("search"),
+      component = Some("did_you_find_it_module"),
+      element = Some("not_relevant"),
+      action = Some("click")
+    )
+    val uuaTweetRelevantToSearch: ClientEventNamespace = ClientEventNamespace(
+      page = Some("search"),
+      component = Some("relevance_prompt_module"),
+      element = Some("is_relevant"),
+      action = Some("click"))
+    val uuaTweetNotRelevantToSearch: ClientEventNamespace = ClientEventNamespace(
+      page = Some("search"),
+      component = Some("relevance_prompt_module"),
+      element = Some("not_relevant"),
+      action = Some("click"))
+    val uuaProfileBlock: ClientEventNamespace =
+      ClientEventNamespace(page = Some("profile"), action = Some("block"))
+    val uuaProfileUnblock: ClientEventNamespace =
+      ClientEventNamespace(page = Some("profile"), action = Some("unblock"))
+    val uuaProfileMute: ClientEventNamespace =
+      ClientEventNamespace(page = Some("profile"), action = Some("mute_user"))
+    val uuaProfileReport: ClientEventNamespace =
+      ClientEventNamespace(page = Some("profile"), action = Some("report"))
+    val uuaProfileShow: ClientEventNamespace =
+      ClientEventNamespace(page = Some("profile"), action = Some("show"))
+    val uuaProfileFollow: ClientEventNamespace =
+      ClientEventNamespace(action = Some("follow"))
+    val uuaProfileClick: ClientEventNamespace =
+      ClientEventNamespace(action = Some("profile_click"))
+    val uuaTweetBlockAuthor: ClientEventNamespace = ClientEventNamespace(
+      page = Some("profile"),
+      section = Some("tweets"),
+      component = Some("tweet"),
+      action = Some("click"),
+      element = Some("block")
+    )
+    val uuaTweetUnblockAuthor: ClientEventNamespace = ClientEventNamespace(
+      section = Some("tweets"),
+      component = Some("tweet"),
+      action = Some("click"),
+      element = Some("unblock")
+    )
+    val uuaTweetMuteAuthor: ClientEventNamespace = ClientEventNamespace(
+      component = Some("suggest_sc_tweet"),
+      action = Some("click"),
+      element = Some("mute")
+    )
+    val uuaAppExit: ClientEventNamespace =
+      ClientEventNamespace(page = Some("app"), action = Some("become_inactive"))
+
+    // helper methods for creating client-events and UUA objects
+    def mkLogEvent(
+      eventName: String = "",
+      eventNamespace: Option[EventNamespace],
+      eventDetails: Option[EventDetails] = None,
+      logBase: Option[LogBase] = None,
+      pushNotificationDetails: Option[NotificationDetails] = None,
+      reportDetails: Option[ReportDetails] = None,
+      searchDetails: Option[SearchDetails] = None,
+      performanceDetails: Option[PerformanceDetails] = None
+    ): LogEvent = LogEvent(
+      eventName = eventName,
+      eventNamespace = eventNamespace,
+      eventDetails = eventDetails,
+      logBase = logBase,
+      notificationDetails = pushNotificationDetails,
+      reportDetails = reportDetails,
+      searchDetails = searchDetails,
+      performanceDetails = performanceDetails
+    )
+
+    def actionTowardDefaultTweetEvent(
+      eventNamespace: Option[EventNamespace],
+      impressionDetails: Option[ImpressionDetails] = None,
+      suggestionDetails: Option[SuggestionDetails] = None,
+      itemId: Option[Long] = Some(itemTweetId),
+      mediaDetailsV2: Option[MediaDetailsV2] = None,
+      clientMediaEvent: Option[ClientMediaEvent] = None,
+      itemTypeOpt: Option[ItemType] = Some(ItemType.Tweet),
+      authorId: Option[Long] = None,
+      isFollowedByActingUser: Option[Boolean] = None,
+      isFollowingActingUser: Option[Boolean] = None,
+      notificationTabDetails: Option[NotificationTabDetails] = None,
+      reportDetails: Option[ReportDetails] = None,
+      logBase: LogBase = logBase,
+      tweetPosition: Option[Int] = None,
+      promotedId: Option[String] = None,
+      url: Option[String] = None,
+      targets: Option[Seq[LogEventItem]] = None,
+      percentVisibleHeight100k: Option[Int] = None,
+      searchDetails: Option[SearchDetails] = None,
+      cardDetails: Option[CardDetails] = None
+    ): LogEvent =
+      mkLogEvent(
+        eventName = "action_toward_default_tweet_event",
+        eventNamespace = eventNamespace,
+        reportDetails = reportDetails,
+        eventDetails = Some(
+          EventDetails(
+            url = url,
+            items = Some(
+              Seq(LogEventItem(
+                id = itemId,
+                percentVisibleHeight100k = percentVisibleHeight100k,
+                itemType = itemTypeOpt,
+                impressionDetails = impressionDetails,
+                suggestionDetails = suggestionDetails,
+                mediaDetailsV2 = mediaDetailsV2,
+                clientMediaEvent = clientMediaEvent,
+                cardDetails = cardDetails,
+                tweetDetails = authorId.map { id => LogEventTweetDetails(authorId = Some(id)) },
+                isViewerFollowsTweetAuthor = isFollowedByActingUser,
+                isTweetAuthorFollowsViewer = isFollowingActingUser,
+                notificationTabDetails = notificationTabDetails,
+                position = tweetPosition,
+                promotedId = promotedId
+              ))),
+            targets = targets
+          )
+        ),
+        logBase = Some(logBase),
+        searchDetails = searchDetails
+      )
+
+    def actionTowardReplyEvent(
+      eventNamespace: Option[EventNamespace],
+      inReplyToTweetId: Long = inReplyToTweetId,
+      impressionDetails: Option[ImpressionDetails] = None
+    ): LogEvent =
+      mkLogEvent(
+        eventName = "action_toward_reply_event",
+        eventNamespace = eventNamespace,
+        eventDetails = Some(
+          EventDetails(
+            items = Some(
+              Seq(
+                LogEventItem(
+                  id = Some(itemTweetId),
+                  itemType = Some(ItemType.Tweet),
+                  impressionDetails = impressionDetails,
+                  tweetDetails =
+                    Some(LogEventTweetDetails(inReplyToTweetId = Some(inReplyToTweetId)))
+                ))
+            )
+          )
+        ),
+        logBase = Some(logBase)
+      )
+
+    def actionTowardRetweetEvent(
+      eventNamespace: Option[EventNamespace],
+      inReplyToTweetId: Option[Long] = None,
+      impressionDetails: Option[ImpressionDetails] = None
+    ): LogEvent =
+      mkLogEvent(
+        eventName = "action_toward_retweet_event",
+        eventNamespace = eventNamespace,
+        eventDetails = Some(
+          EventDetails(
+            items = Some(
+              Seq(LogEventItem(
+                id = Some(itemTweetId),
+                itemType = Some(ItemType.Tweet),
+                impressionDetails = impressionDetails,
+                tweetDetails = Some(LogEventTweetDetails(
+                  retweetingTweetId = Some(retweetingTweetId),
+                  inReplyToTweetId = inReplyToTweetId))
+              )))
+          )
+        ),
+        logBase = Some(logBase)
+      )
+
+    def actionTowardQuoteEvent(
+      eventNamespace: Option[EventNamespace],
+      inReplyToTweetId: Option[Long] = None,
+      quotedAuthorId: Option[Long] = None,
+      impressionDetails: Option[ImpressionDetails] = None
+    ): LogEvent =
+      mkLogEvent(
+        eventName = "action_toward_quote_event",
+        eventNamespace = eventNamespace,
+        eventDetails = Some(
+          EventDetails(
+            items = Some(
+              Seq(
+                LogEventItem(
+                  id = Some(itemTweetId),
+                  itemType = Some(ItemType.Tweet),
+                  impressionDetails = impressionDetails,
+                  tweetDetails = Some(
+                    LogEventTweetDetails(
+                      quotedTweetId = Some(quotedTweetId),
+                      inReplyToTweetId = inReplyToTweetId,
+                      quotedAuthorId = quotedAuthorId))
+                ))
+            )
+          )
+        ),
+        logBase = Some(logBase)
+      )
+
+    def actionTowardRetweetEventWithReplyAndQuote(
+      eventNamespace: Option[EventNamespace],
+      inReplyToTweetId: Long = inReplyToTweetId,
+      impressionDetails: Option[ImpressionDetails] = None
+    ): LogEvent = mkLogEvent(
+      eventName = "action_toward_retweet_event_with_reply_and_quote",
+      eventNamespace = eventNamespace,
+      eventDetails = Some(
+        EventDetails(
+          items = Some(
+            Seq(LogEventItem(
+              id = Some(itemTweetId),
+              itemType = Some(ItemType.Tweet),
+              impressionDetails = impressionDetails,
+              tweetDetails = Some(
+                LogEventTweetDetails(
+                  retweetingTweetId = Some(retweetingTweetId),
+                  quotedTweetId = Some(quotedTweetId),
+                  inReplyToTweetId = Some(inReplyToTweetId),
+                ))
+            )))
+        )
+      ),
+      logBase = Some(logBase)
+    )
+
+    def pushNotificationEvent(
+      eventNamespace: Option[EventNamespace],
+      itemId: Option[Long] = Some(itemTweetId),
+      itemTypeOpt: Option[ItemType] = Some(ItemType.Tweet),
+      notificationDetails: Option[NotificationDetails],
+    ): LogEvent =
+      mkLogEvent(
+        eventName = "push_notification_open",
+        eventNamespace = eventNamespace,
+        eventDetails = Some(
+          EventDetails(
+            items = Some(
+              Seq(
+                LogEventItem(
+                  id = itemId,
+                  itemType = itemTypeOpt,
+                ))))
+        ),
+        logBase = Some(logBase),
+        pushNotificationDetails = notificationDetails
+      )
+
+    def actionTowardNotificationEvent(
+      eventNamespace: Option[EventNamespace],
+      notificationTabDetails: Option[NotificationTabDetails],
+    ): LogEvent =
+      mkLogEvent(
+        eventName = "notification_event",
+        eventNamespace = eventNamespace,
+        eventDetails = Some(
+          EventDetails(items =
+            Some(Seq(LogEventItem(notificationTabDetails = notificationTabDetails))))),
+        logBase = Some(logBase)
+      )
+
+    def profileClickEvent(eventNamespace: Option[EventNamespace]): LogEvent =
+      mkLogEvent(
+        eventName = "profile_click",
+        eventNamespace = eventNamespace,
+        eventDetails = Some(
+          EventDetails(items = Some(Seq(
+            LogEventItem(id = Some(userId), itemType = Some(ItemType.User)),
+            LogEventItem(
+              id = Some(itemTweetId),
+              itemType = Some(ItemType.Tweet),
+              tweetDetails = Some(LogEventTweetDetails(authorId = Some(authorId))))
+          )))),
+        logBase = Some(logBase)
+      )
+
+    def actionTowardProfileEvent(
+      eventName: String,
+      eventNamespace: Option[EventNamespace]
+    ): LogEvent =
+      mkLogEvent(
+        eventName = eventName,
+        eventNamespace = eventNamespace,
+        eventDetails = Some(
+          EventDetails(items = Some(
+            Seq(
+              LogEventItem(id = Some(itemProfileId), itemType = Some(ItemType.User))
+            )))),
+        logBase = Some(logBase)
+      )
+
+    def tweetActionTowardAuthorEvent(
+      eventName: String,
+      eventNamespace: Option[EventNamespace]
+    ): LogEvent =
+      mkLogEvent(
+        eventName = eventName,
+        eventNamespace = eventNamespace,
+        eventDetails = Some(
+          EventDetails(items = Some(Seq(
+            LogEventItem(id = Some(userId), itemType = Some(ItemType.User)),
+            LogEventItem(
+              id = Some(itemTweetId),
+              itemType = Some(ItemType.Tweet),
+              tweetDetails = Some(LogEventTweetDetails(authorId = Some(authorId))))
+          )))),
+        logBase = Some(logBase)
+      )
+
+    def actionTowardsTypeaheadEvent(
+      eventNamespace: Option[EventNamespace],
+      targets: Option[Seq[LogEventItem]],
+      searchQuery: String
+    ): LogEvent =
+      mkLogEvent(
+        eventNamespace = eventNamespace,
+        eventDetails = Some(EventDetails(targets = targets)),
+        logBase = Some(logBase),
+        searchDetails = Some(SearchDetails(query = Some(searchQuery)))
+      )
+    def actionTowardSearchResultPageEvent(
+      eventNamespace: Option[EventNamespace],
+      searchDetails: Option[SearchDetails],
+      items: Option[Seq[LogEventItem]] = None
+    ): LogEvent =
+      mkLogEvent(
+        eventNamespace = eventNamespace,
+        eventDetails = Some(EventDetails(items = items)),
+        logBase = Some(logBase),
+        searchDetails = searchDetails
+      )
+
+    def actionTowardsUasEvent(
+      eventNamespace: Option[EventNamespace],
+      clientAppId: Option[Long],
+      duration: Option[Long]
+    ): LogEvent =
+      mkLogEvent(
+        eventNamespace = eventNamespace,
+        logBase = Some(logBase.copy(clientAppId = clientAppId)),
+        performanceDetails = Some(PerformanceDetails(durationMs = duration))
+      )
+
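The builders above centralize LogEvent construction so individual specs only state what varies for the case at hand. A sketch of typical usage, with illustrative argument values:

```scala
// Minimal sketch of fixture usage (values assumed): the builders fill in
// logBase, item ids, and the rest of the LogEvent envelope.
val favEvent: LogEvent =
  actionTowardDefaultTweetEvent(eventNamespace = Some(ceFavoriteEventNamespace))
val uasEvent: LogEvent = actionTowardsUasEvent(
  eventNamespace = Some(ceAppExit),
  clientAppId = Some(1L),
  duration = Some(10000L))
```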
+    def mkUUAEventMetadata(
+      clientEventNamespace: Option[ClientEventNamespace],
+      traceId: Option[Long] = None,
+      requestJoinId: Option[Long] = None,
+      clientAppId: Option[Long] = None
+    ): EventMetadata = EventMetadata(
+      sourceTimestampMs = 1001L,
+      receivedTimestampMs = frozenTime.inMilliseconds,
+      sourceLineage = SourceLineage.ClientEvents,
+      clientEventNamespace = clientEventNamespace,
+      traceId = traceId,
+      requestJoinId = requestJoinId,
+      clientAppId = clientAppId
+    )
+
+    def mkExpectedUUAForActionTowardDefaultTweetEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      inReplyToTweetId: Option[Long] = None,
+      tweetActionInfo: Option[TweetActionInfo] = None,
+      topicId: Option[Long] = None,
+      authorInfo: Option[AuthorInfo] = None,
+      productSurface: Option[ProductSurface] = None,
+      productSurfaceInfo: Option[ProductSurfaceInfo] = None,
+      tweetPosition: Option[Int] = None,
+      promotedId: Option[String] = None,
+      traceIdOpt: Option[Long] = None,
+      requestJoinIdOpt: Option[Long] = None,
+      guestIdMarketingOpt: Option[Long] = None
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier =
+        UserIdentifier(userId = Some(userId), guestIdMarketing = guestIdMarketingOpt),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = itemTweetId,
+          inReplyToTweetId = inReplyToTweetId,
+          tweetActionInfo = tweetActionInfo,
+          actionTweetTopicSocialProofId = topicId,
+          actionTweetAuthorInfo = authorInfo,
+          tweetPosition = tweetPosition,
+          promotedId = promotedId
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(
+        clientEventNamespace = clientEventNamespace,
+        traceId = traceIdOpt,
+        requestJoinId = requestJoinIdOpt
+      ),
+      productSurface = productSurface,
+      productSurfaceInfo = productSurfaceInfo
+    )
+
+    def mkExpectedUUAForActionTowardReplyEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      tweetActionInfo: Option[TweetActionInfo] = None,
+      authorInfo: Option[AuthorInfo] = None,
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = itemTweetId,
+          inReplyToTweetId = Some(inReplyToTweetId),
+          tweetActionInfo = tweetActionInfo,
+          actionTweetAuthorInfo = authorInfo
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForActionTowardRetweetEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      inReplyToTweetId: Option[Long] = None,
+      tweetActionInfo: Option[TweetActionInfo] = None,
+      authorInfo: Option[AuthorInfo] = None,
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = itemTweetId,
+          retweetingTweetId = Some(retweetingTweetId),
+          inReplyToTweetId = inReplyToTweetId,
+          tweetActionInfo = tweetActionInfo,
+          actionTweetAuthorInfo = authorInfo
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForActionTowardQuoteEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      inReplyToTweetId: Option[Long] = None,
+      quotedAuthorId: Option[Long] = None,
+      tweetActionInfo: Option[TweetActionInfo] = None,
+      authorInfo: Option[AuthorInfo] = None,
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = itemTweetId,
+          quotedTweetId = Some(quotedTweetId),
+          quotedAuthorId = quotedAuthorId,
+          inReplyToTweetId = inReplyToTweetId,
+          tweetActionInfo = tweetActionInfo,
+          actionTweetAuthorInfo = authorInfo
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForActionTowardQuotingEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      inReplyToTweetId: Option[Long] = None,
+      tweetActionInfo: Option[TweetActionInfo] = None,
+      authorInfo: Option[AuthorInfo] = None,
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = quotedTweetId,
+          quotingTweetId = Some(itemTweetId),
+          inReplyToTweetId = inReplyToTweetId,
+          tweetActionInfo = tweetActionInfo,
+          actionTweetAuthorInfo = authorInfo
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      inReplyToTweetId: Long = inReplyToTweetId,
+      tweetActionInfo: Option[TweetActionInfo] = None,
+      authorInfo: Option[AuthorInfo] = None,
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = itemTweetId,
+          retweetingTweetId = Some(retweetingTweetId),
+          quotedTweetId = Some(quotedTweetId),
+          inReplyToTweetId = Some(inReplyToTweetId),
+          tweetActionInfo = tweetActionInfo,
+          actionTweetAuthorInfo = authorInfo
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoting(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      inReplyToTweetId: Long = inReplyToTweetId,
+      tweetActionInfo: Option[TweetActionInfo] = None,
+      authorInfo: Option[AuthorInfo] = None,
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = quotedTweetId,
+          quotingTweetId = Some(itemTweetId),
+          tweetActionInfo = tweetActionInfo,
+          actionTweetAuthorInfo = authorInfo
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForActionTowardTopicEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      topicId: Long,
+      traceId: Option[Long] = None,
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TopicInfo(
+        TopicInfo(
+          actionTopicId = topicId,
+        )
+      ),
+      actionType = actionType,
+      eventMetadata =
+        mkUUAEventMetadata(clientEventNamespace = clientEventNamespace, traceId = traceId)
+    )
+
+    def mkExpectedUUAForNotificationEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      notificationContent: NotificationContent,
+      productSurface: Option[ProductSurface],
+      productSurfaceInfo: Option[ProductSurfaceInfo],
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.NotificationInfo(
+        NotificationInfo(
+          actionNotificationId = notificationId,
+          content = notificationContent
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace),
+      productSurface = productSurface,
+      productSurfaceInfo = productSurfaceInfo
+    )
+
+    def mkExpectedUUAForProfileClick(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      authorInfo: Option[AuthorInfo] = None
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = itemTweetId,
+          actionTweetAuthorInfo = authorInfo
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForTweetActionTowardAuthor(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      authorInfo: Option[AuthorInfo] = None,
+      tweetActionInfo: Option[TweetActionInfo] = None
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = itemTweetId,
+          actionTweetAuthorInfo = authorInfo,
+          tweetActionInfo = tweetActionInfo
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForProfileAction(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      actionProfileId: Long
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.ProfileInfo(
+        ProfileInfo(
+          actionProfileId = actionProfileId
+        )
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForTypeaheadAction(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      typeaheadActionInfo: TypeaheadActionInfo,
+      searchQuery: String,
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TypeaheadInfo(
+        TypeaheadInfo(actionQuery = searchQuery, typeaheadActionInfo = typeaheadActionInfo)
+      ),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace),
+      productSurface = Some(ProductSurface.SearchTypeahead),
+      productSurfaceInfo =
+        Some(ProductSurfaceInfo.SearchTypeaheadInfo(SearchTypeaheadInfo(query = searchQuery)))
+    )
+    def mkExpectedUUAForFeedbackSubmitAction(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      feedbackPromptInfo: FeedbackPromptInfo,
+      searchQuery: String
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.FeedbackPromptInfo(feedbackPromptInfo),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace),
+      productSurface = Some(ProductSurface.SearchResultsPage),
+      productSurfaceInfo =
+        Some(ProductSurfaceInfo.SearchResultsPageInfo(SearchResultsPageInfo(query = searchQuery)))
+    )
+
+    def mkExpectedUUAForActionTowardCTAEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      guestIdMarketingOpt: Option[Long]
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier =
+        UserIdentifier(userId = Some(userId), guestIdMarketing = guestIdMarketingOpt),
+      item = Item.CtaInfo(CTAInfo()),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
+    def mkExpectedUUAForUasEvent(
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      clientAppId: Option[Long],
+      duration: Option[Long]
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.UasInfo(UASInfo(timeSpentMs = duration.get)),
+      actionType = actionType,
+      eventMetadata =
+        mkUUAEventMetadata(clientEventNamespace = clientEventNamespace, clientAppId = clientAppId)
+    )
+
+    def mkExpectedUUAForCardEvent(
+      id: Option[Long],
+      clientEventNamespace: Option[ClientEventNamespace],
+      actionType: ActionType,
+      itemType: Option[ItemType],
+      authorId: Option[Long],
+    ): UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.CardInfo(
+        CardInfo(
+          id = id,
+          itemType = itemType,
+          actionTweetAuthorInfo = Some(AuthorInfo(authorId = authorId)))),
+      actionType = actionType,
+      eventMetadata = mkUUAEventMetadata(clientEventNamespace = clientEventNamespace)
+    )
+
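The `mkExpectedUUA*` builders mirror the input builders one-for-one, so each spec pairs one of each. An illustrative pairing (the adapter call itself happens in the specs, not here):

```scala
// Sketch (values assumed): a mock favorite client-event and the UUA a spec
// would expect the adapter to emit for it.
val input: LogEvent =
  actionTowardDefaultTweetEvent(eventNamespace = Some(ceFavoriteEventNamespace))
val expected: UnifiedUserAction = mkExpectedUUAForActionTowardDefaultTweetEvent(
  clientEventNamespace = Some(uuaFavoriteClientEventNamespace),
  actionType = ActionType.ClientTweetFav)
```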
+    def timelineTopicControllerData(topicId: Long = topicId): ControllerData.V2 =
+      ControllerData.V2(
+        ControllerDataV2.TimelinesTopic(
+          TimelinesTopicControllerData.V1(
+            TimelinesTopicControllerDataV1(
+              topicId = topicId,
+              topicTypesBitmap = 1
+            )
+          )))
+
+    def homeTweetControllerData(
+      topicId: Long = topicId,
+      traceId: Long = traceId
+    ): ControllerData.V2 =
+      ControllerData.V2(
+        ControllerDataV2.HomeTweets(
+          HomeTweetsControllerData.V1(
+            HomeTweetsControllerDataV1(
+              topicId = Some(topicId),
+              traceId = Some(traceId)
+            ))))
+
+    def homeTweetControllerDataV2(
+      injectedPosition: Option[Int] = None,
+      requestJoinId: Option[Long] = None,
+      traceId: Option[Long] = None
+    ): ControllerData.V2 =
+      ControllerData.V2(
+        ControllerDataV2.HomeTweets(
+          HomeTweetsControllerData.V1(
+            HomeTweetsControllerDataV1(
+              injectedPosition = injectedPosition,
+              traceId = traceId,
+              requestJoinId = requestJoinId
+            ))))
+
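Controller data is how timeline serving attaches suggestion metadata (topic id, trace id, request join id) to injected Tweets; the builders above produce the thrift payloads that the mock events embed in their suggestion details. A sketch of that embedding, consistent with the mocks that follow:

```scala
// Sketch: controller data rides along inside SuggestionDetails, which is what
// carries topicId/traceId/requestJoinId into the adapter.
val homeSuggestion = SuggestionDetails(
  decodedControllerData = Some(
    homeTweetControllerDataV2(
      injectedPosition = Some(1),
      traceId = Some(traceId),
      requestJoinId = Some(requestJoinId)
    ))
)
```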
+    // mock client-events
+    val ddgEvent: LogEvent = mkLogEvent(
+      eventName = "ddg",
+      eventNamespace = Some(
+        EventNamespace(
+          page = Some("ddg"),
+          action = Some("experiment")
+        )
+      )
+    )
+
+    val qigRankerEvent: LogEvent = mkLogEvent(
+      eventName = "qig_ranker",
+      eventNamespace = Some(
+        EventNamespace(
+          page = Some("qig_ranker"),
+        )
+      )
+    )
+
+    val timelineMixerEvent: LogEvent = mkLogEvent(
+      eventName = "timelinemixer",
+      eventNamespace = Some(
+        EventNamespace(
+          page = Some("timelinemixer"),
+        )
+      )
+    )
+
+    val timelineServiceEvent: LogEvent = mkLogEvent(
+      eventName = "timelineservice",
+      eventNamespace = Some(
+        EventNamespace(
+          page = Some("timelineservice"),
+        )
+      )
+    )
+
+    val tweetConcServiceEvent: LogEvent = mkLogEvent(
+      eventName = "tweetconvosvc",
+      eventNamespace = Some(
+        EventNamespace(
+          page = Some("tweetconvosvc"),
+        )
+      )
+    )
+
+    val renderNonTweetItemTypeEvent: LogEvent = mkLogEvent(
+      eventName = "render non-tweet item-type",
+      eventNamespace = Some(ceRenderEventNamespace),
+      eventDetails = Some(
+        EventDetails(
+          items = Some(
+            Seq(LogEventItem(itemType = Some(ItemType.Event)))
+          )
+        )
+      )
+    )
+
+    val renderDefaultTweetWithTopicIdEvent: LogEvent = actionTowardDefaultTweetEvent(
+      eventNamespace = Some(ceRenderEventNamespace),
+      suggestionDetails =
+        Some(SuggestionDetails(decodedControllerData = Some(timelineTopicControllerData())))
+    )
+
+    def renderDefaultTweetUserFollowStatusEvent(
+      authorId: Option[Long],
+      isFollowedByActingUser: Boolean = false,
+      isFollowingActingUser: Boolean = false
+    ): LogEvent = actionTowardDefaultTweetEvent(
+      eventNamespace = Some(ceRenderEventNamespace),
+      authorId = authorId,
+      isFollowedByActingUser = Some(isFollowedByActingUser),
+      isFollowingActingUser = Some(isFollowingActingUser)
+    )
+
+    val lingerDefaultTweetEvent: LogEvent = actionTowardDefaultTweetEvent(
+      eventNamespace = Some(ceLingerEventNamespace),
+      impressionDetails = Some(
+        ImpressionDetails(
+          visibilityStart = Some(100L),
+          visibilityEnd = Some(105L)
+        ))
+    )
+
+    val lingerReplyEvent: LogEvent = actionTowardReplyEvent(
+      eventNamespace = Some(ceLingerEventNamespace),
+      impressionDetails = Some(
+        ImpressionDetails(
+          visibilityStart = Some(100L),
+          visibilityEnd = Some(105L)
+        ))
+    )
+
+    val lingerRetweetEvent: LogEvent = actionTowardRetweetEvent(
+      eventNamespace = Some(ceLingerEventNamespace),
+      impressionDetails = Some(
+        ImpressionDetails(
+          visibilityStart = Some(100L),
+          visibilityEnd = Some(105L)
+        ))
+    )
+
+    val lingerQuoteEvent: LogEvent = actionTowardQuoteEvent(
+      eventNamespace = Some(ceLingerEventNamespace),
+      impressionDetails = Some(
+        ImpressionDetails(
+          visibilityStart = Some(100L),
+          visibilityEnd = Some(105L)
+        ))
+    )
+
+    val lingerRetweetWithReplyAndQuoteEvent: LogEvent = actionTowardRetweetEventWithReplyAndQuote(
+      eventNamespace = Some(ceLingerEventNamespace),
+      impressionDetails = Some(
+        ImpressionDetails(
+          visibilityStart = Some(100L),
+          visibilityEnd = Some(105L)
+        ))
+    )
+
+    val replyToDefaultTweetOrReplyEvent: LogEvent = actionTowardReplyEvent(
+      eventNamespace = Some(ceReplyEventNamespace),
+      // since the action is reply, item.id = inReplyToTweetId
+      inReplyToTweetId = itemTweetId,
+    )
+
+    val replyToRetweetEvent: LogEvent = actionTowardRetweetEvent(
+      eventNamespace = Some(ceReplyEventNamespace),
+      // since the action is reply, item.id = inReplyToTweetId
+      inReplyToTweetId = Some(itemTweetId),
+    )
+
+    val replyToQuoteEvent: LogEvent = actionTowardQuoteEvent(
+      eventNamespace = Some(ceReplyEventNamespace),
+      // since the action is reply, item.id = inReplyToTweetId
+      inReplyToTweetId = Some(itemTweetId),
+    )
+
+    val replyToRetweetWithReplyAndQuoteEvent: LogEvent = actionTowardRetweetEventWithReplyAndQuote(
+      eventNamespace = Some(ceReplyEventNamespace),
+      // since the action is reply, item.id = inReplyToTweetId
+      inReplyToTweetId = itemTweetId,
+    )
+
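One subtlety in the reply mocks above: for a reply action, the scribed `item.id` is the tweet being replied to, so the builders are passed `itemTweetId` as `inReplyToTweetId`. The resulting item looks roughly like this (field names as in the fixtures):

```scala
// Sketch of the item a replyTo* mock produces; shown for orientation only.
val replyItem = LogEventItem(
  id = Some(itemTweetId),
  itemType = Some(ItemType.Tweet),
  tweetDetails = Some(LogEventTweetDetails(inReplyToTweetId = Some(itemTweetId)))
)
```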
+    // expected UUA corresponding to mock client-events
+    val expectedTweetRenderDefaultTweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaRenderClientEventNamespace),
+        actionType = ActionType.ClientTweetRenderImpression
+      )
+
+    val expectedTweetRenderReplyUUA: UnifiedUserAction = mkExpectedUUAForActionTowardReplyEvent(
+      clientEventNamespace = Some(uuaRenderClientEventNamespace),
+      actionType = ActionType.ClientTweetRenderImpression
+    )
+
+    val expectedTweetRenderRetweetUUA: UnifiedUserAction = mkExpectedUUAForActionTowardRetweetEvent(
+      clientEventNamespace = Some(uuaRenderClientEventNamespace),
+      actionType = ActionType.ClientTweetRenderImpression
+    )
+
+    val expectedTweetRenderQuoteUUA1: UnifiedUserAction = mkExpectedUUAForActionTowardQuoteEvent(
+      clientEventNamespace = Some(uuaRenderClientEventNamespace),
+      actionType = ActionType.ClientTweetRenderImpression,
+      quotedAuthorId = Some(quotedAuthorId),
+    )
+    val expectedTweetRenderQuoteUUA2: UnifiedUserAction = mkExpectedUUAForActionTowardQuotingEvent(
+      clientEventNamespace = Some(uuaRenderClientEventNamespace),
+      actionType = ActionType.ClientTweetRenderImpression,
+      authorInfo = Some(AuthorInfo(authorId = Some(quotedAuthorId)))
+    )
+
+    val expectedTweetRenderRetweetWithReplyAndQuoteUUA1: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+        clientEventNamespace = Some(uuaRenderClientEventNamespace),
+        actionType = ActionType.ClientTweetRenderImpression
+      )
+    val expectedTweetRenderRetweetWithReplyAndQuoteUUA2: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoting(
+        clientEventNamespace = Some(uuaRenderClientEventNamespace),
+        actionType = ActionType.ClientTweetRenderImpression
+      )
+
+    val expectedTweetRenderDefaultTweetWithTopicIdUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaRenderClientEventNamespace),
+        actionType = ActionType.ClientTweetRenderImpression,
+        topicId = Some(topicId)
+      )
+
+    val expectedTweetDetailImpressionUUA1: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(ceTweetDetailsClientEventNamespace1),
+        actionType = ActionType.ClientTweetDetailsImpression
+      )
+
+    val expectedTweetGalleryImpressionUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(ceGalleryClientEventNamespace),
+        actionType = ActionType.ClientTweetGalleryImpression
+      )
+
+    def expectedTweetRenderDefaultTweetWithAuthorInfoUUA(
+      authorInfo: Option[AuthorInfo] = None
+    ): UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaRenderClientEventNamespace),
+        actionType = ActionType.ClientTweetRenderImpression,
+        authorInfo = authorInfo
+      )
+
+    val expectedTweetLingerDefaultTweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaLingerClientEventNamespace),
+        actionType = ActionType.ClientTweetLingerImpression,
+        tweetActionInfo = Some(
+          TweetActionInfo.ClientTweetLingerImpression(
+            ClientTweetLingerImpression(
+              lingerStartTimestampMs = 100L,
+              lingerEndTimestampMs = 105L
+            ))
+        )
+      )
+
+    val expectedTweetLingerReplyUUA: UnifiedUserAction = mkExpectedUUAForActionTowardReplyEvent(
+      clientEventNamespace = Some(uuaLingerClientEventNamespace),
+      actionType = ActionType.ClientTweetLingerImpression,
+      tweetActionInfo = Some(
+        TweetActionInfo.ClientTweetLingerImpression(
+          ClientTweetLingerImpression(
+            lingerStartTimestampMs = 100L,
+            lingerEndTimestampMs = 105L
+          ))
+      )
+    )
+
+    val expectedTweetLingerRetweetUUA: UnifiedUserAction = mkExpectedUUAForActionTowardRetweetEvent(
+      clientEventNamespace = Some(uuaLingerClientEventNamespace),
+      actionType = ActionType.ClientTweetLingerImpression,
+      tweetActionInfo = Some(
+        TweetActionInfo.ClientTweetLingerImpression(
+          ClientTweetLingerImpression(
+            lingerStartTimestampMs = 100L,
+            lingerEndTimestampMs = 105L
+          ))
+      )
+    )
+
+    val expectedTweetLingerQuoteUUA: UnifiedUserAction = mkExpectedUUAForActionTowardQuoteEvent(
+      clientEventNamespace = Some(uuaLingerClientEventNamespace),
+      actionType = ActionType.ClientTweetLingerImpression,
+      tweetActionInfo = Some(
+        TweetActionInfo.ClientTweetLingerImpression(
+          ClientTweetLingerImpression(
+            lingerStartTimestampMs = 100L,
+            lingerEndTimestampMs = 105L
+          ))
+      )
+    )
+
+    val expectedTweetLingerRetweetWithReplyAndQuoteUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+        clientEventNamespace = Some(uuaLingerClientEventNamespace),
+        actionType = ActionType.ClientTweetLingerImpression,
+        tweetActionInfo = Some(
+          TweetActionInfo.ClientTweetLingerImpression(
+            ClientTweetLingerImpression(
+              lingerStartTimestampMs = 100L,
+              lingerEndTimestampMs = 105L
+            ))
+        )
+      )
+
+    val expectedTweetClickQuoteUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+        clientEventNamespace = Some(
+          ClientEventNamespace(
+            action = Some("quote")
+          )),
+        actionType = ActionType.ClientTweetClickQuote
+      )
+
+    def expectedTweetQuoteUUA(action: String): UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+        clientEventNamespace = Some(
+          ClientEventNamespace(
+            action = Some(action)
+          )),
+        actionType = ActionType.ClientTweetQuote
+      )
+
+    val expectedTweetFavoriteDefaultTweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav
+      )
+
+    val expectedHomeTweetEventWithControllerDataSuggestType: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaHomeFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav,
+        productSurface = Some(ProductSurface.HomeTimeline),
+        productSurfaceInfo = Some(
+          ProductSurfaceInfo.HomeTimelineInfo(
+            HomeTimelineInfo(suggestionType = Some("Test_type"), injectedPosition = Some(1)))),
+        traceIdOpt = Some(traceId),
+        requestJoinIdOpt = Some(requestJoinId)
+      )
+
+    val expectedHomeTweetEventWithControllerData: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaHomeFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav,
+        productSurface = Some(ProductSurface.HomeTimeline),
+        productSurfaceInfo =
+          Some(ProductSurfaceInfo.HomeTimelineInfo(HomeTimelineInfo(injectedPosition = Some(1)))),
+        traceIdOpt = Some(traceId),
+        requestJoinIdOpt = Some(requestJoinId)
+      )
+
+    val expectedSearchTweetEventWithControllerData: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaSearchFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav,
+        productSurface = Some(ProductSurface.SearchResultsPage),
+        productSurfaceInfo =
+          Some(ProductSurfaceInfo.SearchResultsPageInfo(SearchResultsPageInfo(query = "twitter"))),
+        traceIdOpt = Some(traceId),
+        requestJoinIdOpt = Some(requestJoinId)
+      )
+
+    val expectedHomeTweetEventWithSuggestType: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaHomeFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav,
+        productSurface = Some(ProductSurface.HomeTimeline),
+        productSurfaceInfo = Some(
+          ProductSurfaceInfo.HomeTimelineInfo(HomeTimelineInfo(suggestionType = Some("Test_type"))))
+      )
+
+    val expectedHomeLatestTweetEventWithControllerDataSuggestType: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaHomeLatestFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav,
+        productSurface = Some(ProductSurface.HomeTimeline),
+        productSurfaceInfo = Some(
+          ProductSurfaceInfo.HomeTimelineInfo(
+            HomeTimelineInfo(suggestionType = Some("Test_type"), injectedPosition = Some(1)))),
+        traceIdOpt = Some(traceId),
+        requestJoinIdOpt = Some(requestJoinId)
+      )
+
+    val expectedHomeLatestTweetEventWithControllerData: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaHomeLatestFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav,
+        productSurface = Some(ProductSurface.HomeTimeline),
+        productSurfaceInfo =
+          Some(ProductSurfaceInfo.HomeTimelineInfo(HomeTimelineInfo(injectedPosition = Some(1)))),
+        traceIdOpt = Some(traceId),
+        requestJoinIdOpt = Some(requestJoinId)
+      )
+
+    val expectedHomeLatestTweetEventWithSuggestType: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaHomeLatestFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav,
+        productSurface = Some(ProductSurface.HomeTimeline),
+        productSurfaceInfo = Some(
+          ProductSurfaceInfo.HomeTimelineInfo(HomeTimelineInfo(suggestionType = Some("Test_type"))))
+      )
+
+    val expectedTweetFavoriteReplyUUA: UnifiedUserAction = mkExpectedUUAForActionTowardReplyEvent(
+      clientEventNamespace = Some(uuaFavoriteClientEventNamespace),
+      actionType = ActionType.ClientTweetFav
+    )
+
+    val expectedTweetFavoriteRetweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEvent(
+        clientEventNamespace = Some(uuaFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav
+      )
+
+    val expectedTweetFavoriteQuoteUUA: UnifiedUserAction = mkExpectedUUAForActionTowardQuoteEvent(
+      clientEventNamespace = Some(uuaFavoriteClientEventNamespace),
+      actionType = ActionType.ClientTweetFav)
+
+    val expectedTweetFavoriteRetweetWithReplyAndQuoteUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+        clientEventNamespace = Some(uuaFavoriteClientEventNamespace),
+        actionType = ActionType.ClientTweetFav
+      )
+
+    val expectedTweetClickReplyDefaultTweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaClickReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetClickReply
+      )
+
+    val expectedTweetClickReplyReplyUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardReplyEvent(
+        clientEventNamespace = Some(uuaClickReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetClickReply
+      )
+
+    val expectedTweetClickReplyRetweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEvent(
+        clientEventNamespace = Some(uuaClickReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetClickReply
+      )
+
+    val expectedTweetClickReplyQuoteUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardQuoteEvent(
+        clientEventNamespace = Some(uuaClickReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetClickReply
+      )
+
+    val expectedTweetClickReplyRetweetWithReplyAndQuoteUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+        clientEventNamespace = Some(uuaClickReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetClickReply
+      )
+
+    val expectedTweetReplyDefaultTweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetReply,
+        inReplyToTweetId = Some(itemTweetId)
+      )
+
+    val expectedTweetReplyRetweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEvent(
+        clientEventNamespace = Some(uuaReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetReply,
+        inReplyToTweetId = Some(itemTweetId)
+      )
+
+    val expectedTweetReplyQuoteUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardQuoteEvent(
+        clientEventNamespace = Some(uuaReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetReply,
+        inReplyToTweetId = Some(itemTweetId)
+      )
+
+    val expectedTweetReplyRetweetWithReplyAndQuoteUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+        clientEventNamespace = Some(uuaReplyClientEventNamespace),
+        actionType = ActionType.ClientTweetReply,
+        inReplyToTweetId = itemTweetId
+      )
+
+    val expectedTweetRetweetDefaultTweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardDefaultTweetEvent(
+        clientEventNamespace = Some(uuaRetweetClientEventNamespace),
+        actionType = ActionType.ClientTweetRetweet
+      )
+
+    val expectedTweetRetweetReplyUUA: UnifiedUserAction = mkExpectedUUAForActionTowardReplyEvent(
+      clientEventNamespace = Some(uuaRetweetClientEventNamespace),
+      actionType = ActionType.ClientTweetRetweet
+    )
+
+    val expectedTweetRetweetRetweetUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEvent(
+        clientEventNamespace = Some(uuaRetweetClientEventNamespace),
+        actionType = ActionType.ClientTweetRetweet
+      )
+
+    val expectedTweetRetweetQuoteUUA: UnifiedUserAction = mkExpectedUUAForActionTowardQuoteEvent(
+      clientEventNamespace = Some(uuaRetweetClientEventNamespace),
+      actionType = ActionType.ClientTweetRetweet
+    )
+
+    val expectedTweetRetweetRetweetWithReplyAndQuoteUUA: UnifiedUserAction =
+      mkExpectedUUAForActionTowardRetweetEventWithReplyAndQuoted(
+        clientEventNamespace = Some(uuaRetweetClientEventNamespace),
+        actionType = ActionType.ClientTweetRetweet
+      )
+  }
+
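To see how the pieces fit together, here is a hypothetical end-to-end check; the fixture trait name and the adapter entry point are assumptions for illustration and are not names confirmed by this change:

```scala
// Sketch: mix the fixture trait into a spec, freeze time so the expected
// receivedTimestampMs matches, adapt a mock event, and compare against the
// expected UUA. Adapter name and method are assumed.
class ClientEventAdapterSpec extends Test {
  test("client favorite event becomes ClientTweetFav") {
    new TestFixtures.ClientEventFixture {
      Time.withTimeAt(frozenTime) { _ =>
        val actual = ClientEventAdapter.adaptEvent(
          actionTowardDefaultTweetEvent(Some(ceFavoriteEventNamespace)))
        assert(actual.contains(expectedTweetFavoriteDefaultTweetUUA))
      }
    }
  }
}
```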
+  trait EmailNotificationEventFixture extends CommonFixture {
+    val timestamp = 1001L
+    val pageUrlStatus =
+      "https://twitter.com/a/status/3?cn=a%3D%3D&refsrc=email"
+    val tweetIdStatus = 3L
+
+    val pageUrlEvent =
+      "https://twitter.com/i/events/2?cn=a%3D%3D&refsrc=email"
+    val tweetIdEvent = 2L
+
+    val pageUrlNoArgs = "https://twitter.com/i/events/1"
+    val tweetIdNoArgs = 1L
+
+    val logBase1: LogBase = LogBase(
+      transactionId = "test",
+      ipAddress = "127.0.0.1",
+      userId = Some(userId),
+      guestId = Some(2L),
+      timestamp = timestamp,
+      page = Some(pageUrlStatus),
+    )
+
+    val logBase2: LogBase = LogBase(
+      transactionId = "test",
+      ipAddress = "127.0.0.1",
+      userId = Some(userId),
+      guestId = Some(2L),
+      timestamp = timestamp
+    )
+
+    val notificationEvent: NotificationScribe = NotificationScribe(
+      `type` = NotificationScribeType.Click,
+      impressionId = Some("1234"),
+      userId = Some(userId),
+      timestamp = timestamp,
+      logBase = Some(logBase1)
+    )
+
+    val notificationEventWOTweetId: NotificationScribe = NotificationScribe(
+      `type` = NotificationScribeType.Click,
+      impressionId = Some("1234"),
+      userId = Some(userId),
+      timestamp = timestamp,
+      logBase = Some(logBase2)
+    )
+
+    val notificationEventWOImpressionId: NotificationScribe = NotificationScribe(
+      `type` = NotificationScribeType.Click,
+      userId = Some(userId),
+      timestamp = timestamp,
+      logBase = Some(logBase1)
+    )
+
+    val expectedUua: UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.TweetInfo(
+        TweetInfo(
+          actionTweetId = tweetIdStatus,
+        )
+      ),
+      actionType = ActionType.ClientTweetEmailClick,
+      eventMetadata = EventMetadata(
+        sourceTimestampMs = timestamp,
+        receivedTimestampMs = frozenTime.inMilliseconds,
+        sourceLineage = SourceLineage.EmailNotificationEvents,
+        traceId = None
+      ),
+      productSurfaceInfo = Some(
+        ProductSurfaceInfo.EmailNotificationInfo(EmailNotificationInfo(notificationId = "1234"))),
+      productSurface = Some(ProductSurface.EmailNotification)
+    )
+  }
+
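The three `pageUrl*` fixtures cover status links, event links, and links without query args, which suggests the adapter recovers the acted-on tweet id from the notification's `page` URL. A sketch of that extraction, under that assumption:

```scala
// Hypothetical extraction (the adapter's actual parsing may differ): pull the
// numeric id out of /status/<id> or /events/<id>, ignoring query args.
private val TweetIdFromUrl = """.*/(?:status|events)/(\d+).*""".r
def tweetIdOf(pageUrl: String): Option[Long] = pageUrl match {
  case TweetIdFromUrl(id) => Some(id.toLong)
  case _ => None
}
// tweetIdOf(pageUrlStatus) == Some(3L); tweetIdOf(pageUrlNoArgs) == Some(1L)
```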
+  trait UserModificationEventFixture extends CommonFixture {
+    val timestamp = 1001L
+    val userName = "A"
+    val screenName = "B"
+    val description = "this is A"
+    val location = "US"
+    val url = s"https://www.twitter.com/${userName}"
+
+    val baseUserModification = UserModification(
+      forUserId = Some(userId),
+      userId = Some(userId),
+    )
+
+    val userCreate = baseUserModification.copy(
+      create = Some(
+        User(
+          id = userId,
+          createdAtMsec = timestamp,
+          updatedAtMsec = timestamp,
+          userType = UserType.Normal,
+          profile = Some(
+            Profile(
+              name = userName,
+              screenName = screenName,
+              description = description,
+              auth = null.asInstanceOf[Auth],
+              location = location,
+              url = url
+            ))
+        )),
+    )
+
+    val updateDiffs = Seq(
+      UpdateDiffItem(fieldName = "user_name", before = Some("abc"), after = Some("def")),
+      UpdateDiffItem(fieldName = "description", before = Some("d1"), after = Some("d2")),
+    )
+    val userUpdate = baseUserModification.copy(
+      updatedAtMsec = Some(timestamp),
+      update = Some(updateDiffs),
+      success = Some(true)
+    )
+
+    val expectedUuaUserCreate: UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.ProfileInfo(
+        ProfileInfo(
+          actionProfileId = userId,
+          name = Some(userName),
+          handle = Some(screenName),
+          description = Some(description)
+        )
+      ),
+      actionType = ActionType.ServerUserCreate,
+      eventMetadata = EventMetadata(
+        sourceTimestampMs = timestamp,
+        receivedTimestampMs = frozenTime.inMilliseconds,
+        sourceLineage = SourceLineage.ServerGizmoduckUserModificationEvents,
+      )
+    )
+
+    val expectedUuaUserUpdate: UnifiedUserAction = UnifiedUserAction(
+      userIdentifier = UserIdentifier(userId = Some(userId)),
+      item = Item.ProfileInfo(
+        ProfileInfo(
+          actionProfileId = userId,
+          profileActionInfo = Some(
+            ProfileActionInfo.ServerUserUpdate(
+              ServerUserUpdate(updates = updateDiffs, success = Some(true))))
+        )
+      ),
+      actionType = ActionType.ServerUserUpdate,
+      eventMetadata = EventMetadata(
+        sourceTimestampMs = timestamp,
+        receivedTimestampMs = frozenTime.inMilliseconds,
+        sourceLineage = SourceLineage.ServerGizmoduckUserModificationEvents,
+      )
+    )
+  }
+
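Worth noting for the update path: the fixture's `updateDiffs` are expected to travel through to the UUA payload verbatim, so a spec only has to compare structures rather than individual fields. A sketch of the expected payload construction:

```scala
// Sketch: the same UpdateDiffItems used to build userUpdate reappear,
// unchanged, inside the expected ServerUserUpdate profile action info.
val expectedProfileActionInfo = ProfileActionInfo.ServerUserUpdate(
  ServerUserUpdate(updates = updateDiffs, success = Some(true)))
```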
sourceTimestampMs = timestamp, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerAdsCallbackEngagements + ) + ) + } + + def createExpectedUuaWithProfileInfo( + actionType: ActionType + ): UnifiedUserAction = { + UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(userId)), + item = Item.ProfileInfo(ProfileInfo(actionProfileId = advertiserId)), + actionType = actionType, + eventMetadata = EventMetadata( + sourceTimestampMs = timestamp, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerAdsCallbackEngagements + ) + ) + } + + def createSpendServerEvent( + engagementType: EngagementType, + url: Option[String] = None + ): SpendServerEvent = { + SpendServerEvent( + engagementEvent = Some( + EngagementEvent( + clientInfo = Some(ClientInfo(userId64 = Some(userId))), + engagementId = engagementId, + engagementEpochTimeMilliSec = timestamp, + engagementType = engagementType, + accountTimeZone = accountTimeZone, + url = url, + impressionData = Some( + ImpressionDataNeededAtEngagementTime( + advertiserId = advertiserId, + promotedTweetId = Some(itemTweetId), + displayLocation = displayLocation, + promotedTrendId = Some(trendId))) + ))) + } + + def createExpectedVideoUua( + actionType: ActionType, + tweetActionInfo: Option[TweetActionInfo], + actionTweetId: Option[Long] + ): UnifiedUserAction = { + UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(userId)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = actionTweetId.get, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(advertiserId))), + tweetActionInfo = tweetActionInfo + ) + ), + actionType = actionType, + eventMetadata = EventMetadata( + sourceTimestampMs = timestamp, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerAdsCallbackEngagements + ) + ) + } + + def createVideoSpendServerEvent( + engagementType: EngagementType, + amplifyDetails: Option[AmplifyDetails], + promotedTweetId: Option[Long], + organicTweetId: Option[Long] + ): SpendServerEvent = { + SpendServerEvent( + engagementEvent = Some( + EngagementEvent( + clientInfo = Some(ClientInfo(userId64 = Some(userId))), + engagementId = engagementId, + engagementEpochTimeMilliSec = timestamp, + engagementType = engagementType, + accountTimeZone = accountTimeZone, + impressionData = Some( + ImpressionDataNeededAtEngagementTime( + advertiserId = advertiserId, + promotedTweetId = promotedTweetId, + displayLocation = displayLocation, + organicTweetId = organicTweetId)), + cardEngagement = Some( + CardEvent( + amplifyDetails = amplifyDetails + ) + ) + ))) + } + } + + trait InteractionEventsFixtures extends CommonFixture { + val timestamp = 123456L + val tweetId = 1L + val engagingUserId = 11L + + val baseInteractionEvent: InteractionEvent = InteractionEvent( + targetId = tweetId, + targetType = InteractionTargetType.Tweet, + engagingUserId = engagingUserId, + eventSource = EventSource.ClientEvent, + timestampMillis = timestamp, + interactionType = Some(InteractionType.TweetRenderImpression), + details = InteractionDetails.TweetRenderImpression(TweetImpression()), + additionalEngagingUserIdentifiers = UserIdentifierIE(), + engagingContext = EngagingContext.ClientEventContext( + ClientEventContext( + clientEventNamespace = ContextualEventNamespace(), + clientType = ClientType.Iphone, + displayLocation = DisplayLocation(1), + isTweetDetailsImpression = Some(false))) + ) + + val loggedOutInteractionEvent: InteractionEvent = 
baseInteractionEvent.copy(engagingUserId = 0L) + + val detailImpressionInteractionEvent: InteractionEvent = baseInteractionEvent.copy( + engagingContext = EngagingContext.ClientEventContext( + ClientEventContext( + clientEventNamespace = ContextualEventNamespace(), + clientType = ClientType.Iphone, + displayLocation = DisplayLocation(1), + isTweetDetailsImpression = Some(true))) + ) + + val expectedBaseKeyedUuaTweet: KeyedUuaTweet = KeyedUuaTweet( + tweetId = tweetId, + actionType = ActionType.ClientTweetRenderImpression, + userIdentifier = UserIdentifier(userId = Some(engagingUserId)), + eventMetadata = EventMetadata( + sourceTimestampMs = timestamp, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ClientEvents + ) + ) + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TlsFavsAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TlsFavsAdapterSpec.scala new file mode 100644 index 000000000..a627cac95 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TlsFavsAdapterSpec.scala @@ -0,0 +1,205 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.context.thriftscala.Viewer +import com.twitter.inject.Test +import com.twitter.timelineservice.thriftscala._ +import com.twitter.unified_user_actions.adapter.tls_favs_event.TlsFavsAdapter +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.Time + +class TlsFavsAdapterSpec extends Test { + trait Fixture { + + val frozenTime = Time.fromMilliseconds(1658949273000L) + + val favEventNoRetweet = ContextualizedFavoriteEvent( + event = FavoriteEventUnion.Favorite( + FavoriteEvent( + userId = 91L, + tweetId = 1L, + tweetUserId = 101L, + eventTimeMs = 1001L + ) + ), + context = LogEventContext(hostname = "", traceId = 31L) + ) + val favEventRetweet = ContextualizedFavoriteEvent( + event = FavoriteEventUnion.Favorite( + FavoriteEvent( + userId = 92L, + tweetId = 2L, + tweetUserId = 102L, + eventTimeMs = 1002L, + retweetId = Some(22L) + ) + ), + context = LogEventContext(hostname = "", traceId = 32L) + ) + val unfavEventNoRetweet = ContextualizedFavoriteEvent( + event = FavoriteEventUnion.Unfavorite( + UnfavoriteEvent( + userId = 93L, + tweetId = 3L, + tweetUserId = 103L, + eventTimeMs = 1003L + ) + ), + context = LogEventContext(hostname = "", traceId = 33L) + ) + val unfavEventRetweet = ContextualizedFavoriteEvent( + event = FavoriteEventUnion.Unfavorite( + UnfavoriteEvent( + userId = 94L, + tweetId = 4L, + tweetUserId = 104L, + eventTimeMs = 1004L, + retweetId = Some(44L) + ) + ), + context = LogEventContext(hostname = "", traceId = 34L) + ) + val favEventWithLangAndCountry = ContextualizedFavoriteEvent( + event = FavoriteEventUnion.Favorite( + FavoriteEvent( + userId = 91L, + tweetId = 1L, + tweetUserId = 101L, + eventTimeMs = 1001L, + viewerContext = + Some(Viewer(requestCountryCode = Some("us"), requestLanguageCode = Some("en"))) + ) + ), + context = LogEventContext(hostname = "", traceId = 31L) + ) + + val expectedUua1 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(91L)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = 1L, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(101L))), + ) + ), + actionType = ActionType.ServerTweetFav, + eventMetadata = EventMetadata( + sourceTimestampMs = 1001L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = 
SourceLineage.ServerTlsFavs, + traceId = Some(31L) + ) + ) + val expectedUua2 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(92L)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = 2L, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(102L))), + retweetingTweetId = Some(22L) + ) + ), + actionType = ActionType.ServerTweetFav, + eventMetadata = EventMetadata( + sourceTimestampMs = 1002L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerTlsFavs, + traceId = Some(32L) + ) + ) + val expectedUua3 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(93L)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = 3L, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(103L))), + ) + ), + actionType = ActionType.ServerTweetUnfav, + eventMetadata = EventMetadata( + sourceTimestampMs = 1003L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerTlsFavs, + traceId = Some(33L) + ) + ) + val expectedUua4 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(94L)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = 4L, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(104L))), + retweetingTweetId = Some(44L) + ) + ), + actionType = ActionType.ServerTweetUnfav, + eventMetadata = EventMetadata( + sourceTimestampMs = 1004L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerTlsFavs, + traceId = Some(34L) + ) + ) + val expectedUua5 = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(91L)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = 1L, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(101L))), + ) + ), + actionType = ActionType.ServerTweetFav, + eventMetadata = EventMetadata( + sourceTimestampMs = 1001L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerTlsFavs, + language = Some("EN"), + countryCode = Some("US"), + traceId = Some(31L) + ) + ) + } + + test("fav event with no retweet") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val actual = TlsFavsAdapter.adaptEvent(favEventNoRetweet) + assert(Seq(expectedUua1) === actual) + } + } + } + + test("fav event with a retweet") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val actual = TlsFavsAdapter.adaptEvent(favEventRetweet) + assert(Seq(expectedUua2) === actual) + } + } + } + + test("unfav event with no retweet") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val actual = TlsFavsAdapter.adaptEvent(unfavEventNoRetweet) + assert(Seq(expectedUua3) === actual) + } + } + } + + test("unfav event with a retweet") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val actual = TlsFavsAdapter.adaptEvent(unfavEventRetweet) + assert(Seq(expectedUua4) === actual) + } + } + } + + test("fav event with language and country") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val actual = TlsFavsAdapter.adaptEvent(favEventWithLangAndCountry) + assert(Seq(expectedUua5) === actual) + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TopicsIdUtilsSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TopicsIdUtilsSpec.scala new file mode 100644 index 000000000..3ad5a9ed5 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TopicsIdUtilsSpec.scala @@ -0,0 +1,545 @@ +package 
com.twitter.unified_user_actions.adapter + +import com.twitter.clientapp.thriftscala._ +import com.twitter.clientapp.thriftscala.SuggestionDetails +import com.twitter.guide.scribing.thriftscala._ +import com.twitter.guide.scribing.thriftscala.{SemanticCoreInterest => SemanticCoreInterestV1} +import com.twitter.guide.scribing.thriftscala.{SimClusterInterest => SimClusterInterestV1} +import com.twitter.guide.scribing.thriftscala.TopicModuleMetadata.SemanticCoreInterest +import com.twitter.guide.scribing.thriftscala.TopicModuleMetadata.SimClusterInterest +import com.twitter.guide.scribing.thriftscala.TransparentGuideDetails.TopicMetadata +import com.twitter.logbase.thriftscala.LogBase +import com.twitter.scrooge.TFieldBlob +import com.twitter.suggests.controller_data.home_hitl_topic_annotation_prompt.thriftscala.HomeHitlTopicAnnotationPromptControllerData +import com.twitter.suggests.controller_data.home_hitl_topic_annotation_prompt.v1.thriftscala.{ + HomeHitlTopicAnnotationPromptControllerData => HomeHitlTopicAnnotationPromptControllerDataV1 +} +import com.twitter.suggests.controller_data.home_topic_annotation_prompt.thriftscala.HomeTopicAnnotationPromptControllerData +import com.twitter.suggests.controller_data.home_topic_annotation_prompt.v1.thriftscala.{ + HomeTopicAnnotationPromptControllerData => HomeTopicAnnotationPromptControllerDataV1 +} +import com.twitter.suggests.controller_data.home_topic_follow_prompt.thriftscala.HomeTopicFollowPromptControllerData +import com.twitter.suggests.controller_data.home_topic_follow_prompt.v1.thriftscala.{ + HomeTopicFollowPromptControllerData => HomeTopicFollowPromptControllerDataV1 +} +import com.twitter.suggests.controller_data.home_tweets.thriftscala.HomeTweetsControllerData +import com.twitter.suggests.controller_data.home_tweets.v1.thriftscala.{ + HomeTweetsControllerData => HomeTweetsControllerDataV1 +} +import com.twitter.suggests.controller_data.search_response.item_types.thriftscala.ItemTypesControllerData +import com.twitter.suggests.controller_data.search_response.thriftscala.SearchResponseControllerData +import com.twitter.suggests.controller_data.search_response.topic_follow_prompt.thriftscala.SearchTopicFollowPromptControllerData +import com.twitter.suggests.controller_data.search_response.tweet_types.thriftscala.TweetTypesControllerData +import com.twitter.suggests.controller_data.search_response.v1.thriftscala.{ + SearchResponseControllerData => SearchResponseControllerDataV1 +} +import com.twitter.suggests.controller_data.thriftscala.ControllerData +import com.twitter.suggests.controller_data.timelines_topic.thriftscala.TimelinesTopicControllerData +import com.twitter.suggests.controller_data.timelines_topic.v1.thriftscala.{ + TimelinesTopicControllerData => TimelinesTopicControllerDataV1 +} +import com.twitter.suggests.controller_data.v2.thriftscala.{ControllerData => ControllerDataV2} +import org.apache.thrift.protocol.TField +import org.junit.runner.RunWith +import org.scalatest.funsuite.AnyFunSuite +import org.scalatest.matchers.should.Matchers +import org.scalatestplus.junit.JUnitRunner +import com.twitter.util.mock.Mockito +import org.mockito.Mockito.when +import org.scalatest.prop.TableDrivenPropertyChecks + +@RunWith(classOf[JUnitRunner]) +class TopicsIdUtilsSpec + extends AnyFunSuite + with Matchers + with Mockito + with TableDrivenPropertyChecks { + import com.twitter.unified_user_actions.adapter.client_event.TopicIdUtils._ + + trait Fixture { + def buildLogBase(userId: Long): LogBase = { + val logBase = mock[LogBase] + 
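+ // Stub only the LogBase fields these tests read; any field left unstubbed falls back to Mockito defaults.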
when(logBase.country).thenReturn(Some("US")) + when(logBase.userId).thenReturn(Some(userId)) + when(logBase.timestamp).thenReturn(100L) + when(logBase.guestId).thenReturn(Some(1L)) + when(logBase.userAgent).thenReturn(None) + when(logBase.language).thenReturn(Some("en")) + logBase + } + + def buildItemForTimeline( + itemId: Long, + itemType: ItemType, + topicId: Long, + fn: Long => ControllerData.V2 + ): Item = { + val item = Item( + id = Some(itemId), + itemType = Some(itemType), + suggestionDetails = Some(SuggestionDetails(decodedControllerData = Some(fn(topicId)))) + ) + item + } + + def buildClientEventForHomeSearchTimeline( + itemId: Long, + itemType: ItemType, + topicId: Long, + fn: Long => ControllerData.V2, + userId: Long = 1L, + eventNamespaceOpt: Option[EventNamespace] = None, + ): LogEvent = { + val logEvent = mock[LogEvent] + when(logEvent.eventNamespace).thenReturn(eventNamespaceOpt) + val eventsDetails = mock[EventDetails] + when(eventsDetails.items) + .thenReturn(Some(Seq(buildItemForTimeline(itemId, itemType, topicId, fn)))) + val logbase = buildLogBase(userId) + when(logEvent.logBase).thenReturn(Some(logbase)) + when(logEvent.eventDetails).thenReturn(Some(eventsDetails)) + logEvent + } + + def buildClientEventForHomeTweetsTimeline( + itemId: Long, + itemType: ItemType, + topicId: Long, + topicIds: Set[Long], + fn: (Long, Set[Long]) => ControllerData.V2, + userId: Long = 1L, + eventNamespaceOpt: Option[EventNamespace] = None, + ): LogEvent = { + val logEvent = mock[LogEvent] + when(logEvent.eventNamespace).thenReturn(eventNamespaceOpt) + val eventsDetails = mock[EventDetails] + when(eventsDetails.items) + .thenReturn(Some(Seq(buildItemForHomeTimeline(itemId, itemType, topicId, topicIds, fn)))) + val logbase = buildLogBase(userId) + when(logEvent.logBase).thenReturn(Some(logbase)) + when(logEvent.eventDetails).thenReturn(Some(eventsDetails)) + logEvent + } + + def buildClientEventForGuide( + itemId: Long, + itemType: ItemType, + topicId: Long, + fn: Long => TopicMetadata, + userId: Long = 1L, + eventNamespaceOpt: Option[EventNamespace] = None, + ): LogEvent = { + val logEvent = mock[LogEvent] + when(logEvent.eventNamespace).thenReturn(eventNamespaceOpt) + val logbase = buildLogBase(userId) + when(logEvent.logBase).thenReturn(Some(logbase)) + val eventDetails = mock[EventDetails] + val item = buildItemForGuide(itemId, itemType, topicId, fn) + when(eventDetails.items).thenReturn(Some(Seq(item))) + when(logEvent.eventDetails).thenReturn(Some(eventDetails)) + logEvent + } + + def buildClientEventForOnboarding( + itemId: Long, + topicId: Long, + userId: Long = 1L + ): LogEvent = { + val logEvent = mock[LogEvent] + val logbase = buildLogBase(userId) + when(logEvent.logBase).thenReturn(Some(logbase)) + when(logEvent.eventNamespace).thenReturn(Some(buildNamespaceForOnboarding)) + val eventDetails = mock[EventDetails] + val item = buildItemForOnboarding(itemId, topicId) + when(eventDetails.items) + .thenReturn(Some(Seq(item))) + when(logEvent.eventDetails).thenReturn(Some(eventDetails)) + logEvent + } + + def buildClientEventForOnboardingBackend( + topicId: Long, + userId: Long = 1L + ): LogEvent = { + val logEvent = mock[LogEvent] + val logbase = buildLogBase(userId) + when(logEvent.logBase).thenReturn(Some(logbase)) + when(logEvent.eventNamespace).thenReturn(Some(buildNamespaceForOnboardingBackend)) + val eventDetails = buildEventDetailsForOnboardingBackend(topicId) + when(logEvent.eventDetails).thenReturn(Some(eventDetails)) + logEvent + } + + def defaultNamespace: EventNamespace 
= { + EventNamespace(Some("iphone"), None, None, None, None, Some("favorite")) + } + + def buildNamespaceForOnboardingBackend: EventNamespace = { + EventNamespace( + Some("iphone"), + Some("onboarding_backend"), + Some("subtasks"), + Some("topics_selector"), + Some("removed"), + Some("selected")) + } + + def buildNamespaceForOnboarding: EventNamespace = { + EventNamespace( + Some("iphone"), + Some("onboarding"), + Some("topics_selector"), + None, + Some("topic"), + Some("follow") + ) + } + + def buildItemForHomeTimeline( + itemId: Long, + itemType: ItemType, + topicId: Long, + topicIds: Set[Long], + fn: (Long, Set[Long]) => ControllerData.V2 + ): Item = { + val item = Item( + id = Some(itemId), + itemType = Some(itemType), + suggestionDetails = + Some(SuggestionDetails(decodedControllerData = Some(fn(topicId, topicIds)))) + ) + item + } + + def buildItemForGuide( + itemId: Long, + itemType: ItemType, + topicId: Long, + fn: Long => TopicMetadata + ): Item = { + val item = mock[Item] + when(item.id).thenReturn(Some(itemId)) + when(item.itemType).thenReturn(Some(itemType)) + when(item.suggestionDetails) + .thenReturn(Some(SuggestionDetails(suggestionType = Some("ErgTweet")))) + val guideItemDetails = mock[GuideItemDetails] + when(guideItemDetails.transparentGuideDetails).thenReturn(Some(fn(topicId))) + when(item.guideItemDetails).thenReturn(Some(guideItemDetails)) + item + } + + def buildItemForOnboarding( + itemId: Long, + topicId: Long + ): Item = { + val item = Item( + id = Some(itemId), + itemType = None, + description = Some(s"id=$topicId,row=1") + ) + item + } + + def buildEventDetailsForOnboardingBackend( + topicId: Long + ): EventDetails = { + val eventDetails = mock[EventDetails] + val item = Item( + id = Some(topicId) + ) + val itemTmp = buildItemForOnboarding(10, topicId) + when(eventDetails.items).thenReturn(Some(Seq(itemTmp))) + when(eventDetails.targets).thenReturn(Some(Seq(item))) + eventDetails + } + + def topicMetadataInGuide(topicId: Long): TopicMetadata = + TopicMetadata( + SemanticCoreInterest( + SemanticCoreInterestV1(domainId = "131", entityId = topicId.toString) + ) + ) + + def simClusterMetadataInGuide(simclusterId: Long = 1L): TopicMetadata = + TopicMetadata( + SimClusterInterest( + SimClusterInterestV1(simclusterId.toString) + ) + ) + + def timelineTopicControllerData(topicId: Long): ControllerData.V2 = + ControllerData.V2( + ControllerDataV2.TimelinesTopic( + TimelinesTopicControllerData.V1( + TimelinesTopicControllerDataV1( + topicId = topicId, + topicTypesBitmap = 1 + ) + ))) + + def homeTweetControllerData(topicId: Long): ControllerData.V2 = + ControllerData.V2( + ControllerDataV2.HomeTweets( + HomeTweetsControllerData.V1( + HomeTweetsControllerDataV1( + topicId = Some(topicId) + )))) + + def homeTopicFollowPromptControllerData(topicId: Long): ControllerData.V2 = + ControllerData.V2( + ControllerDataV2.HomeTopicFollowPrompt(HomeTopicFollowPromptControllerData.V1( + HomeTopicFollowPromptControllerDataV1(Some(topicId))))) + + def homeTopicAnnotationPromptControllerData(topicId: Long): ControllerData.V2 = + ControllerData.V2( + ControllerDataV2.HomeTopicAnnotationPrompt(HomeTopicAnnotationPromptControllerData.V1( + HomeTopicAnnotationPromptControllerDataV1(tweetId = 1L, topicId = topicId)))) + + def homeHitlTopicAnnotationPromptControllerData(topicId: Long): ControllerData.V2 = + ControllerData.V2( + ControllerDataV2.HomeHitlTopicAnnotationPrompt( + HomeHitlTopicAnnotationPromptControllerData.V1( + HomeHitlTopicAnnotationPromptControllerDataV1(tweetId = 2L, topicId 
= topicId)))) + + def searchTopicFollowPromptControllerData(topicId: Long): ControllerData.V2 = + ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1( + SearchResponseControllerDataV1( + Some(ItemTypesControllerData.TopicFollowControllerData( + SearchTopicFollowPromptControllerData(Some(topicId)) + )), + None + )))) + + def searchTweetTypesControllerData(topicId: Long): ControllerData.V2 = + ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1( + SearchResponseControllerDataV1( + Some(ItemTypesControllerData.TweetTypesControllerData( + TweetTypesControllerData(None, Some(topicId)) + )), + None + ) + ))) + + //used for creating logged out user client events + def buildLogBaseWithoutUserId(guestId: Long): LogBase = + LogBase( + ipAddress = "120.10.10.20", + guestId = Some(guestId), + userAgent = None, + transactionId = "", + country = Some("US"), + timestamp = 100L, + language = Some("en") + ) + } + + test("getTopicId should correctly find topic id from item for home timeline and search") { + new Fixture { + + val testData = Table( + ("ItemType", "topicId", "controllerData"), + (ItemType.Tweet, 1L, timelineTopicControllerData(1L)), + (ItemType.User, 2L, timelineTopicControllerData(2L)), + (ItemType.Topic, 3L, homeTweetControllerData(3L)), + (ItemType.Topic, 4L, homeTopicFollowPromptControllerData(4L)), + (ItemType.Topic, 5L, searchTopicFollowPromptControllerData(5L)), + (ItemType.Topic, 6L, homeHitlTopicAnnotationPromptControllerData(6L)) + ) + + forEvery(testData) { + (itemType: ItemType, topicId: Long, controllerDataV2: ControllerData.V2) => + getTopicId( + buildItemForTimeline(1, itemType, topicId, _ => controllerDataV2), + defaultNamespace) shouldEqual Some(topicId) + } + } + } + + test("getTopicId should correctly find topic id from item for guide events") { + new Fixture { + getTopicId( + buildItemForGuide(1, ItemType.Tweet, 100, topicMetadataInGuide), + defaultNamespace + ) shouldEqual Some(100) + } + } + + test("getTopicId should correctly find topic id for onboarding events") { + new Fixture { + getTopicId( + buildItemForOnboarding(1, 100), + buildNamespaceForOnboarding + ) shouldEqual Some(100) + } + } + + test("should return TopicId From HomeSearch") { + val testData = Table( + ("controllerData", "topicId"), + ( + ControllerData.V2( + ControllerDataV2.HomeTweets( + HomeTweetsControllerData.V1(HomeTweetsControllerDataV1(topicId = Some(1L)))) + ), + Some(1L)), + ( + ControllerData.V2( + ControllerDataV2.HomeTopicFollowPrompt(HomeTopicFollowPromptControllerData + .V1(HomeTopicFollowPromptControllerDataV1(topicId = Some(2L))))), + Some(2L)), + ( + ControllerData.V2( + ControllerDataV2.TimelinesTopic( + TimelinesTopicControllerData.V1( + TimelinesTopicControllerDataV1(topicId = 3L, topicTypesBitmap = 100) + ))), + Some(3L)), + ( + ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1(SearchResponseControllerDataV1(itemTypesControllerData = + Some(ItemTypesControllerData.TopicFollowControllerData( + SearchTopicFollowPromptControllerData(topicId = Some(4L)))))))), + Some(4L)), + ( + ControllerData.V2( + ControllerDataV2.SearchResponse( + SearchResponseControllerData.V1( + SearchResponseControllerDataV1(itemTypesControllerData = Some(ItemTypesControllerData + .TweetTypesControllerData(TweetTypesControllerData(topicId = Some(5L)))))))), + Some(5L)), + ( + ControllerData.V2( + ControllerDataV2 + .SearchResponse(SearchResponseControllerData.V1(SearchResponseControllerDataV1()))), + None) 
+ ) + + forEvery(testData) { (controllerDataV2: ControllerData.V2, topicId: Option[Long]) => + getTopicIdFromHomeSearch( + Item(suggestionDetails = Some( + SuggestionDetails(decodedControllerData = Some(controllerDataV2))))) shouldEqual topicId + } + } + + test("test TopicId From Onboarding") { + val testData = Table( + ("Item", "EventNamespace", "topicId"), + ( + Item(description = Some("id=11,key=value")), + EventNamespace( + page = Some("onboarding"), + section = Some("section has topic"), + component = Some("component has topic"), + element = Some("element has topic") + ), + Some(11L)), + ( + Item(description = Some("id=22,key=value")), + EventNamespace( + page = Some("onboarding"), + section = Some("section has topic") + ), + Some(22L)), + ( + Item(description = Some("id=33,key=value")), + EventNamespace( + page = Some("onboarding"), + component = Some("component has topic") + ), + Some(33L)), + ( + Item(description = Some("id=44,key=value")), + EventNamespace( + page = Some("onboarding"), + element = Some("element has topic") + ), + Some(44L)), + ( + Item(description = Some("id=678,key=value")), + EventNamespace( + page = Some("onXYZboarding"), + section = Some("section has topic"), + component = Some("component has topic"), + element = Some("element has topic") + ), + None), + ( + Item(description = Some("id=678,key=value")), + EventNamespace( + page = Some("page has onboarding"), + section = Some("section has topPic"), + component = Some("component has topPic"), + element = Some("element has topPic") + ), + None), + ( + Item(description = Some("key=value,id=678")), + EventNamespace( + page = Some("page has onboarding"), + section = Some("section has topic"), + component = Some("component has topic"), + element = Some("element has topic") + ), + None) + ) + + forEvery(testData) { (item: Item, eventNamespace: EventNamespace, topicId: Option[Long]) => + getTopicFromOnboarding(item, eventNamespace) shouldEqual topicId + } + } + + test("test from Guide") { + val testData = Table( + ("guideItemDetails", "topicId"), + ( + GuideItemDetails(transparentGuideDetails = Some( + TransparentGuideDetails.TopicMetadata( + TopicModuleMetadata.TttInterest(tttInterest = TttInterest.unsafeEmpty)))), + None), + ( + GuideItemDetails(transparentGuideDetails = Some( + TransparentGuideDetails.TopicMetadata( + TopicModuleMetadata.SimClusterInterest(simClusterInterest = + com.twitter.guide.scribing.thriftscala.SimClusterInterest.unsafeEmpty)))), + None), + ( + GuideItemDetails(transparentGuideDetails = Some( + TransparentGuideDetails.TopicMetadata(TopicModuleMetadata.UnknownUnionField(field = + TFieldBlob(new TField(), Array.empty[Byte]))))), + None), + ( + GuideItemDetails(transparentGuideDetails = Some( + TransparentGuideDetails.TopicMetadata( + TopicModuleMetadata.SemanticCoreInterest( + com.twitter.guide.scribing.thriftscala.SemanticCoreInterest.unsafeEmpty + .copy(domainId = "131", entityId = "1"))))), + Some(1L)), + ) + + forEvery(testData) { (guideItemDetails: GuideItemDetails, topicId: Option[Long]) => + getTopicFromGuide(Item(guideItemDetails = Some(guideItemDetails))) shouldEqual topicId + } + } + + test("getTopicId should return topicIds") { + getTopicId( + item = Item(suggestionDetails = Some( + SuggestionDetails(decodedControllerData = Some( + ControllerData.V2( + ControllerDataV2.HomeTweets( + HomeTweetsControllerData.V1(HomeTweetsControllerDataV1(topicId = Some(1L)))) + ))))), + namespace = EventNamespace( + page = Some("onboarding"), + section = Some("section has topic"), + component = 
Some("component has topic"), + element = Some("element has topic") + ) + ) shouldEqual Some(1L) + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TweetypieEventAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TweetypieEventAdapterSpec.scala new file mode 100644 index 000000000..c23b5db54 --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/TweetypieEventAdapterSpec.scala @@ -0,0 +1,852 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.gizmoduck.thriftscala.User +import com.twitter.gizmoduck.thriftscala.UserType +import com.twitter.inject.Test +import com.twitter.snowflake.id.SnowflakeId +import com.twitter.tweetypie.thriftscala.AdditionalFieldDeleteEvent +import com.twitter.tweetypie.thriftscala.AdditionalFieldUpdateEvent +import com.twitter.tweetypie.thriftscala.AuditDeleteTweet +import com.twitter.tweetypie.thriftscala.DeviceSource +import com.twitter.tweetypie.thriftscala.EditControl +import com.twitter.tweetypie.thriftscala.EditControlEdit +import com.twitter.tweetypie.thriftscala.Language +import com.twitter.tweetypie.thriftscala.Place +import com.twitter.tweetypie.thriftscala.PlaceType +import com.twitter.tweetypie.thriftscala.QuotedTweet +import com.twitter.tweetypie.thriftscala.QuotedTweetDeleteEvent +import com.twitter.tweetypie.thriftscala.QuotedTweetTakedownEvent +import com.twitter.tweetypie.thriftscala.Reply +import com.twitter.tweetypie.thriftscala.Share +import com.twitter.tweetypie.thriftscala.Tweet +import com.twitter.tweetypie.thriftscala.TweetCoreData +import com.twitter.tweetypie.thriftscala.TweetCreateEvent +import com.twitter.tweetypie.thriftscala.TweetDeleteEvent +import com.twitter.tweetypie.thriftscala.TweetEvent +import com.twitter.tweetypie.thriftscala.TweetEventData +import com.twitter.tweetypie.thriftscala.TweetEventFlags +import com.twitter.tweetypie.thriftscala.TweetPossiblySensitiveUpdateEvent +import com.twitter.tweetypie.thriftscala.TweetScrubGeoEvent +import com.twitter.tweetypie.thriftscala.TweetTakedownEvent +import com.twitter.tweetypie.thriftscala.TweetUndeleteEvent +import com.twitter.tweetypie.thriftscala.UserScrubGeoEvent +import com.twitter.unified_user_actions.adapter.tweetypie_event.TweetypieEventAdapter +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.Time +import org.scalatest.prop.TableDrivenPropertyChecks +import org.scalatest.prop.TableFor1 +import org.scalatest.prop.TableFor2 +import org.scalatest.prop.TableFor3 + +class TweetypieEventAdapterSpec extends Test with TableDrivenPropertyChecks { + trait Fixture { + val frozenTime: Time = Time.fromMilliseconds(1658949273000L) + + val tweetDeleteEventTime: Time = Time.fromMilliseconds(1658949253000L) + + val tweetId = 1554576940856246272L + val timestamp: Long = SnowflakeId.unixTimeMillisFromId(tweetId) + val userId = 1L + val user: User = User( + id = userId, + createdAtMsec = 1000L, + updatedAtMsec = 1000L, + userType = UserType.Normal, + ) + + val actionedTweetId = 1554576940756246333L + val actionedTweetTimestamp: Long = SnowflakeId.unixTimeMillisFromId(actionedTweetId) + val actionedTweetAuthorId = 2L + + val actionedByActionedTweetId = 1554566940756246272L + val actionedByActionedTweetTimestamp: Long = + SnowflakeId.unixTimeMillisFromId(actionedByActionedTweetId) + val actionedByActionedTweetAuthorId = 3L + + val tweetEventFlags: TweetEventFlags = TweetEventFlags(timestampMs 
= timestamp) + val language: Option[Language] = Some(Language("EN-US", false)) + val deviceSource: Option[DeviceSource] = Some( + DeviceSource( + id = 0, + parameter = "", + internalName = "", + name = "name", + url = "url", + display = "display", + clientAppId = Option(100L))) + val place: Option[Place] = Some( + Place( + id = "id", + `type` = PlaceType.City, + fullName = "San Francisco", + name = "SF", + countryCode = Some("US"), + )) + + // for TweetDeleteEvent + val auditDeleteTweet = Some( + AuditDeleteTweet( + clientApplicationId = Option(200L) + )) + + val tweetCoreData: TweetCoreData = + TweetCoreData(userId, text = "text", createdVia = "created_via", createdAtSecs = timestamp) + val baseTweet: Tweet = Tweet( + tweetId, + coreData = Some(tweetCoreData), + language = language, + deviceSource = deviceSource, + place = place) + + def getCreateTweetCoreData(userId: Long, timestamp: Long): TweetCoreData = + tweetCoreData.copy(userId = userId, createdAtSecs = timestamp) + def getRetweetTweetCoreData( + userId: Long, + retweetedTweetId: Long, + retweetedAuthorId: Long, + parentStatusId: Long, + timestamp: Long + ): TweetCoreData = tweetCoreData.copy( + userId = userId, + share = Some( + Share( + sourceStatusId = retweetedTweetId, + sourceUserId = retweetedAuthorId, + parentStatusId = parentStatusId + )), + createdAtSecs = timestamp + ) + def getReplyTweetCoreData( + userId: Long, + repliedTweetId: Long, + repliedAuthorId: Long, + timestamp: Long + ): TweetCoreData = tweetCoreData.copy( + userId = userId, + reply = Some( + Reply( + inReplyToStatusId = Some(repliedTweetId), + inReplyToUserId = repliedAuthorId, + ) + ), + createdAtSecs = timestamp) + def getQuoteTweetCoreData(userId: Long, timestamp: Long): TweetCoreData = + tweetCoreData.copy(userId = userId, createdAtSecs = timestamp) + + def getTweet(tweetId: Long, userId: Long, timestamp: Long): Tweet = + baseTweet.copy(id = tweetId, coreData = Some(getCreateTweetCoreData(userId, timestamp))) + + def getRetweet( + tweetId: Long, + userId: Long, + timestamp: Long, + retweetedTweetId: Long, + retweetedUserId: Long, + parentStatusId: Option[Long] = None + ): Tweet = + baseTweet.copy( + id = tweetId, + coreData = Some( + getRetweetTweetCoreData( + userId, + retweetedTweetId, + retweetedUserId, + parentStatusId.getOrElse(retweetedTweetId), + timestamp))) + + def getQuote( + tweetId: Long, + userId: Long, + timestamp: Long, + quotedTweetId: Long, + quotedUserId: Long + ): Tweet = + baseTweet.copy( + id = tweetId, + coreData = Some(getQuoteTweetCoreData(userId, timestamp)), + quotedTweet = Some(QuotedTweet(quotedTweetId, quotedUserId))) + + def getReply( + tweetId: Long, + userId: Long, + repliedTweetId: Long, + repliedAuthorId: Long, + timestamp: Long + ): Tweet = + baseTweet.copy( + id = tweetId, + coreData = Some(getReplyTweetCoreData(userId, repliedTweetId, repliedAuthorId, timestamp)), + ) + + // ignored tweet events + val additionalFieldUpdateEvent: TweetEvent = TweetEvent( + TweetEventData.AdditionalFieldUpdateEvent(AdditionalFieldUpdateEvent(baseTweet)), + tweetEventFlags) + val additionalFieldDeleteEvent: TweetEvent = TweetEvent( + TweetEventData.AdditionalFieldDeleteEvent( + AdditionalFieldDeleteEvent(Map(tweetId -> Seq.empty)) + ), + tweetEventFlags + ) + val tweetUndeleteEvent: TweetEvent = TweetEvent( + TweetEventData.TweetUndeleteEvent(TweetUndeleteEvent(baseTweet)), + tweetEventFlags + ) + val tweetScrubGeoEvent: TweetEvent = TweetEvent( + TweetEventData.TweetScrubGeoEvent(TweetScrubGeoEvent(tweetId, userId)), + tweetEventFlags) 
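+ // Takedown, geo-scrub, possibly-sensitive, and quoted-tweet events below are likewise dropped by the adapter; all fixtures in this group feed the "ignore non-TweetCreate / non-TweetDelete events" test at the bottom of this file.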
+ val tweetTakedownEvent: TweetEvent = TweetEvent( + TweetEventData.TweetTakedownEvent(TweetTakedownEvent(tweetId, userId)), + tweetEventFlags + ) + val userScrubGeoEvent: TweetEvent = TweetEvent( + TweetEventData.UserScrubGeoEvent(UserScrubGeoEvent(userId = userId, maxTweetId = tweetId)), + tweetEventFlags + ) + val tweetPossiblySensitiveUpdateEvent: TweetEvent = TweetEvent( + TweetEventData.TweetPossiblySensitiveUpdateEvent( + TweetPossiblySensitiveUpdateEvent( + tweetId = tweetId, + userId = userId, + nsfwAdmin = false, + nsfwUser = false)), + tweetEventFlags + ) + val quotedTweetDeleteEvent: TweetEvent = TweetEvent( + TweetEventData.QuotedTweetDeleteEvent( + QuotedTweetDeleteEvent( + quotingTweetId = tweetId, + quotingUserId = userId, + quotedTweetId = tweetId, + quotedUserId = userId)), + tweetEventFlags + ) + val quotedTweetTakedownEvent: TweetEvent = TweetEvent( + TweetEventData.QuotedTweetTakedownEvent( + QuotedTweetTakedownEvent( + quotingTweetId = tweetId, + quotingUserId = userId, + quotedTweetId = tweetId, + quotedUserId = userId, + takedownCountryCodes = Seq.empty, + takedownReasons = Seq.empty + ) + ), + tweetEventFlags + ) + val replyOnlyTweet = + getReply(tweetId, userId, actionedTweetId, actionedTweetAuthorId, timestamp) + val replyAndRetweetTweet = replyOnlyTweet.copy(coreData = replyOnlyTweet.coreData.map( + _.copy(share = Some( + Share( + sourceStatusId = actionedTweetId, + sourceUserId = actionedTweetAuthorId, + parentStatusId = actionedTweetId + ))))) + val replyRetweetPresentEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = replyAndRetweetTweet, + user = user, + sourceTweet = + Some(getTweet(actionedTweetId, actionedTweetAuthorId, actionedTweetTimestamp)) + )), + tweetEventFlags + ) + + def getExpectedUUA( + userId: Long, + actionTweetId: Long, + actionTweetAuthorId: Long, + sourceTimestampMs: Long, + actionType: ActionType, + replyingTweetId: Option[Long] = None, + quotingTweetId: Option[Long] = None, + retweetingTweetId: Option[Long] = None, + inReplyToTweetId: Option[Long] = None, + quotedTweetId: Option[Long] = None, + retweetedTweetId: Option[Long] = None, + editedTweetId: Option[Long] = None, + appId: Option[Long] = None, + ): UnifiedUserAction = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(userId)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = actionTweetId, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(actionTweetAuthorId))), + replyingTweetId = replyingTweetId, + quotingTweetId = quotingTweetId, + retweetingTweetId = retweetingTweetId, + inReplyToTweetId = inReplyToTweetId, + quotedTweetId = quotedTweetId, + retweetedTweetId = retweetedTweetId, + editedTweetId = editedTweetId + ) + ), + actionType = actionType, + eventMetadata = EventMetadata( + sourceTimestampMs = sourceTimestampMs, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerTweetypieEvents, + language = None, + countryCode = Some("US"), + clientAppId = appId, + ) + ) + + /* Note: This is a deprecated field {ActionTweetType}. + * We keep this here to document the behaviors of each unit test. + /* + * Types of tweets on which actions can take place. + * Note that retweets are not included because actions can NOT take place + * on retweets. They can only take place on source tweets of retweets, + * which are one of the ActionTweetTypes listed below. 
+ */ + enum ActionTweetType { + /* Is a standard (non-retweet, non-reply, non-quote) tweet */ + Default = 0 + + /* + * Is a tweet in a reply chain (this includes tweets + * without a leading @mention, as long as they are in reply + * to some tweet id) + */ + Reply = 1 + + /* Is a retweet with comment */ + Quote = 2 + }(persisted='true', hasPersonalData='false') + */ + + // tweet create + val tweetCreateEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getTweet(tweetId, userId, timestamp), + user = user, + ) + ), + tweetEventFlags) + val expectedUUACreate = getExpectedUUA( + userId = userId, + actionTweetId = tweetId, + /* @see comment above for ActionTweetType + actionTweetType = Some(ActionTweetType.Default), + */ + actionTweetAuthorId = userId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetCreate, + appId = deviceSource.flatMap(_.clientAppId) + ) + + // tweet reply to a default + val tweetReplyDefaultEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getReply(tweetId, userId, actionedTweetId, actionedTweetAuthorId, timestamp), + user = user + ) + ), + tweetEventFlags + ) + val expectedUUAReplyDefault = getExpectedUUA( + userId = userId, + actionTweetId = actionedTweetId, + /* @see comment above for ActionTweetType + actionTweetType = None, + */ + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetReply, + replyingTweetId = Some(tweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // tweet reply to a reply + val tweetReplyToReplyEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getReply(tweetId, userId, actionedTweetId, actionedTweetAuthorId, timestamp), + user = user + ) + ), + tweetEventFlags + ) + // tweet reply to a quote + val tweetReplyToQuoteEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getReply(tweetId, userId, actionedTweetId, actionedTweetAuthorId, timestamp), + user = user + ) + ), + tweetEventFlags + ) + // tweet quote a default + val tweetQuoteDefaultEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getQuote(tweetId, userId, timestamp, actionedTweetId, actionedTweetAuthorId), + user = user, + quotedTweet = + Some(getTweet(actionedTweetId, actionedTweetAuthorId, actionedTweetTimestamp)) + ) + ), + tweetEventFlags + ) + val expectedUUAQuoteDefault: UnifiedUserAction = getExpectedUUA( + userId = userId, + actionTweetId = actionedTweetId, + /* @see comment above for ActionTweetType + actionTweetType = Some(ActionTweetType.Default), + */ + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetQuote, + quotingTweetId = Some(tweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // tweet quote a reply + val tweetQuoteReplyEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getQuote(tweetId, userId, timestamp, actionedTweetId, actionedTweetAuthorId), + user = user, + quotedTweet = Some( + getReply( + tweetId = actionedTweetId, + userId = actionedTweetAuthorId, + repliedTweetId = actionedByActionedTweetId, + repliedAuthorId = actionedByActionedTweetAuthorId, + timestamp = actionedTweetTimestamp + )) + ) + ), + tweetEventFlags + ) + val expectedUUAQuoteReply: UnifiedUserAction = getExpectedUUA( + userId = userId, + actionTweetId = 
actionedTweetId, + /* @see comment above for ActionTweetType + actionTweetType = Some(ActionTweetType.Reply), + */ + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetQuote, + quotingTweetId = Some(tweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // tweet quote a quote + val tweetQuoteQuoteEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getQuote(tweetId, userId, timestamp, actionedTweetId, actionedTweetAuthorId), + user = user, + quotedTweet = Some( + getQuote( + tweetId = actionedTweetId, + userId = actionedTweetAuthorId, + timestamp = actionedTweetTimestamp, + quotedTweetId = actionedByActionedTweetId, + quotedUserId = actionedByActionedTweetAuthorId, + )) + ) + ), + tweetEventFlags + ) + val expectedUUAQuoteQuote: UnifiedUserAction = getExpectedUUA( + userId = userId, + actionTweetId = actionedTweetId, + /* @see comment above for ActionTweetType + actionTweetType = Some(ActionTweetType.Quote), + */ + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetQuote, + quotingTweetId = Some(tweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // tweet retweet a default + val tweetRetweetDefaultEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getRetweet(tweetId, userId, timestamp, actionedTweetId, actionedTweetAuthorId), + user = user, + sourceTweet = + Some(getTweet(actionedTweetId, actionedTweetAuthorId, actionedTweetTimestamp)) + ) + ), + tweetEventFlags + ) + val expectedUUARetweetDefault: UnifiedUserAction = getExpectedUUA( + userId = userId, + actionTweetId = actionedTweetId, + /* @see comment above for ActionTweetType + actionTweetType = Some(ActionTweetType.Default), + */ + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetRetweet, + retweetingTweetId = Some(tweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // tweet retweet a reply + val tweetRetweetReplyEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getRetweet(tweetId, userId, timestamp, actionedTweetId, actionedTweetAuthorId), + user = user, + sourceTweet = Some( + getReply( + actionedTweetId, + actionedTweetAuthorId, + actionedByActionedTweetId, + actionedByActionedTweetAuthorId, + actionedTweetTimestamp)) + ) + ), + tweetEventFlags + ) + val expectedUUARetweetReply: UnifiedUserAction = getExpectedUUA( + userId = userId, + actionTweetId = actionedTweetId, + /* @see comment above for ActionTweetType + actionTweetType = Some(ActionTweetType.Reply), + */ + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetRetweet, + retweetingTweetId = Some(tweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // tweet retweet a quote + val tweetRetweetQuoteEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getRetweet(tweetId, userId, timestamp, actionedTweetId, actionedTweetAuthorId), + user = user, + sourceTweet = Some( + getQuote( + actionedTweetId, + actionedTweetAuthorId, + actionedTweetTimestamp, + actionedByActionedTweetId, + actionedByActionedTweetAuthorId + )) + ) + ), + tweetEventFlags + ) + val expectedUUARetweetQuote: UnifiedUserAction = getExpectedUUA( + userId = userId, + actionTweetId = actionedTweetId, + /* @see comment above for ActionTweetType + 
actionTweetType = Some(ActionTweetType.Quote), + */ + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetRetweet, + retweetingTweetId = Some(tweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // tweet retweet a retweet + val tweetRetweetRetweetEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getRetweet( + tweetId, + userId, + timestamp, + actionedByActionedTweetId, + actionedByActionedTweetAuthorId, + Some(actionedTweetId)), + user = user, + sourceTweet = Some( + getTweet( + actionedByActionedTweetId, + actionedByActionedTweetAuthorId, + actionedByActionedTweetTimestamp, + )) + ) + ), + tweetEventFlags + ) + val expectedUUARetweetRetweet: UnifiedUserAction = getExpectedUUA( + userId = userId, + actionTweetId = actionedByActionedTweetId, + /* @see comment above for ActionTweetType + actionTweetType = Some(ActionTweetType.Default), + */ + actionTweetAuthorId = actionedByActionedTweetAuthorId, + sourceTimestampMs = timestamp, + actionType = ActionType.ServerTweetRetweet, + retweetingTweetId = Some(tweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // delete a tweet + val tweetDeleteEvent: TweetEvent = TweetEvent( + TweetEventData.TweetDeleteEvent( + TweetDeleteEvent( + tweet = getTweet(tweetId, userId, timestamp), + user = Some(user), + audit = auditDeleteTweet + ) + ), + tweetEventFlags.copy(timestampMs = tweetDeleteEventTime.inMilliseconds) + ) + val expectedUUADeleteDefault: UnifiedUserAction = getExpectedUUA( + userId = user.id, + actionTweetId = tweetId, + actionTweetAuthorId = userId, + sourceTimestampMs = tweetDeleteEventTime.inMilliseconds, + actionType = ActionType.ServerTweetDelete, + appId = auditDeleteTweet.flatMap(_.clientApplicationId) + ) + // delete a reply - Unreply + val tweetUnreplyEvent: TweetEvent = TweetEvent( + TweetEventData.TweetDeleteEvent( + TweetDeleteEvent( + tweet = getReply(tweetId, userId, actionedTweetId, actionedTweetAuthorId, timestamp), + user = Some(user), + audit = auditDeleteTweet + ) + ), + tweetEventFlags.copy(timestampMs = tweetDeleteEventTime.inMilliseconds) + ) + val expectedUUAUnreply: UnifiedUserAction = getExpectedUUA( + userId = user.id, + actionTweetId = actionedTweetId, + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = tweetDeleteEventTime.inMilliseconds, + actionType = ActionType.ServerTweetUnreply, + replyingTweetId = Some(tweetId), + appId = auditDeleteTweet.flatMap(_.clientApplicationId) + ) + // delete a quote - Unquote + val tweetUnquoteEvent: TweetEvent = TweetEvent( + TweetEventData.TweetDeleteEvent( + TweetDeleteEvent( + tweet = getQuote(tweetId, userId, timestamp, actionedTweetId, actionedTweetAuthorId), + user = Some(user), + audit = auditDeleteTweet + ) + ), + tweetEventFlags.copy(timestampMs = tweetDeleteEventTime.inMilliseconds) + ) + val expectedUUAUnquote: UnifiedUserAction = getExpectedUUA( + userId = user.id, + actionTweetId = actionedTweetId, + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = tweetDeleteEventTime.inMilliseconds, + actionType = ActionType.ServerTweetUnquote, + quotingTweetId = Some(tweetId), + appId = auditDeleteTweet.flatMap(_.clientApplicationId) + ) + // delete a retweet / unretweet + val tweetUnretweetEvent: TweetEvent = TweetEvent( + TweetEventData.TweetDeleteEvent( + TweetDeleteEvent( + tweet = getRetweet( + tweetId, + userId, + timestamp, + actionedTweetId, + actionedTweetAuthorId, + Some(actionedTweetId)), + user = Some(user), + 
audit = auditDeleteTweet + ) + ), + tweetEventFlags.copy(timestampMs = tweetDeleteEventTime.inMilliseconds) + ) + val expectedUUAUnretweet: UnifiedUserAction = getExpectedUUA( + userId = user.id, + actionTweetId = actionedTweetId, + actionTweetAuthorId = actionedTweetAuthorId, + sourceTimestampMs = tweetDeleteEventTime.inMilliseconds, + actionType = ActionType.ServerTweetUnretweet, + retweetingTweetId = Some(tweetId), + appId = auditDeleteTweet.flatMap(_.clientApplicationId) + ) + // edit a tweet, the new tweet from edit is a default tweet (not reply/quote/retweet) + val regularTweetFromEditEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getTweet( + tweetId, + userId, + timestamp + ).copy(editControl = + Some(EditControl.Edit(EditControlEdit(initialTweetId = actionedTweetId)))), + user = user, + ) + ), + tweetEventFlags + ) + val expectedUUARegularTweetFromEdit: UnifiedUserAction = getExpectedUUA( + userId = user.id, + actionTweetId = tweetId, + actionTweetAuthorId = userId, + sourceTimestampMs = tweetEventFlags.timestampMs, + actionType = ActionType.ServerTweetEdit, + editedTweetId = Some(actionedTweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + // edit a tweet, the new tweet from edit is a Quote + val quoteFromEditEvent: TweetEvent = TweetEvent( + TweetEventData.TweetCreateEvent( + TweetCreateEvent( + tweet = getQuote( + tweetId, + userId, + timestamp, + actionedTweetId, + actionedTweetAuthorId + ).copy(editControl = + Some(EditControl.Edit(EditControlEdit(initialTweetId = actionedByActionedTweetId)))), + user = user, + ) + ), + tweetEventFlags + ) + val expectedUUAQuoteFromEdit: UnifiedUserAction = getExpectedUUA( + userId = user.id, + actionTweetId = tweetId, + actionTweetAuthorId = userId, + sourceTimestampMs = tweetEventFlags.timestampMs, + actionType = ActionType.ServerTweetEdit, + editedTweetId = Some(actionedByActionedTweetId), + quotedTweetId = Some(actionedTweetId), + appId = deviceSource.flatMap(_.clientAppId) + ) + } + + test("ignore non-TweetCreate / non-TweetDelete events") { + new Fixture { + val ignoredTweetEvents: TableFor1[TweetEvent] = Table( + "ignoredTweetEvents", + additionalFieldUpdateEvent, + additionalFieldDeleteEvent, + tweetUndeleteEvent, + tweetScrubGeoEvent, + tweetTakedownEvent, + userScrubGeoEvent, + tweetPossiblySensitiveUpdateEvent, + quotedTweetDeleteEvent, + quotedTweetTakedownEvent + ) + forEvery(ignoredTweetEvents) { tweetEvent: TweetEvent => + val actual = TweetypieEventAdapter.adaptEvent(tweetEvent) + assert(actual.isEmpty) + } + } + } + + test("ignore invalid TweetCreate events") { + new Fixture { + val ignoredTweetEvents: TableFor2[String, TweetEvent] = Table( + ("invalidType", "event"), + ("replyAndRetweetBothPresent", replyRetweetPresentEvent) + ) + forEvery(ignoredTweetEvents) { (_, event) => + val actual = TweetypieEventAdapter.adaptEvent(event) + assert(actual.isEmpty) + } + } + } + + test("TweetypieCreateEvent") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val actual = TweetypieEventAdapter.adaptEvent(tweetCreateEvent) + assert(Seq(expectedUUACreate) == actual) + } + } + } + + test("TweetypieReplyEvent") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val tweetReplies: TableFor3[String, TweetEvent, UnifiedUserAction] = Table( + ("actionTweetType", "event", "expected"), + ("Default", tweetReplyDefaultEvent, expectedUUAReplyDefault), + ("Reply", tweetReplyToReplyEvent, expectedUUAReplyDefault), + ("Quote", tweetReplyToQuoteEvent, expectedUUAReplyDefault), + ) + 
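+ // Table-driven check: each row pairs a Tweetypie event fixture with the single UnifiedUserAction the adapter should emit.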
forEvery(tweetReplies) { (_: String, event: TweetEvent, expected: UnifiedUserAction) => + val actual = TweetypieEventAdapter.adaptEvent(event) + assert(Seq(expected) === actual) + } + } + } + } + + test("TweetypieQuoteEvent") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val tweetQuotes: TableFor3[String, TweetEvent, UnifiedUserAction] = Table( + ("actionTweetType", "event", "expected"), + ("Default", tweetQuoteDefaultEvent, expectedUUAQuoteDefault), + ("Reply", tweetQuoteReplyEvent, expectedUUAQuoteReply), + ("Quote", tweetQuoteQuoteEvent, expectedUUAQuoteQuote), + ) + forEvery(tweetQuotes) { (_: String, event: TweetEvent, expected: UnifiedUserAction) => + val actual = TweetypieEventAdapter.adaptEvent(event) + assert(Seq(expected) === actual) + } + } + } + } + + test("TweetypieRetweetEvent") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val tweetRetweets: TableFor3[String, TweetEvent, UnifiedUserAction] = Table( + ("actionTweetType", "event", "expected"), + ("Default", tweetRetweetDefaultEvent, expectedUUARetweetDefault), + ("Reply", tweetRetweetReplyEvent, expectedUUARetweetReply), + ("Quote", tweetRetweetQuoteEvent, expectedUUARetweetQuote), + ("Retweet", tweetRetweetRetweetEvent, expectedUUARetweetRetweet), + ) + forEvery(tweetRetweets) { (_: String, event: TweetEvent, expected: UnifiedUserAction) => + val actual = TweetypieEventAdapter.adaptEvent(event) + assert(Seq(expected) === actual) + } + } + } + } + + test("TweetypieDeleteEvent") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val tweetDeletes: TableFor3[String, TweetEvent, UnifiedUserAction] = Table( + ("actionTweetType", "event", "expected"), + ("Default", tweetDeleteEvent, expectedUUADeleteDefault), + ("Reply", tweetUnreplyEvent, expectedUUAUnreply), + ("Quote", tweetUnquoteEvent, expectedUUAUnquote), + ("Retweet", tweetUnretweetEvent, expectedUUAUnretweet), + ) + forEvery(tweetDeletes) { (_: String, event: TweetEvent, expected: UnifiedUserAction) => + val actual = TweetypieEventAdapter.adaptEvent(event) + assert(Seq(expected) === actual) + } + } + } + } + + test("TweetypieEditEvent") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + val tweetEdits: TableFor3[String, TweetEvent, UnifiedUserAction] = Table( + ("actionTweetType", "event", "expected"), + ("RegularTweetFromEdit", regularTweetFromEditEvent, expectedUUARegularTweetFromEdit), + ("QuoteFromEdit", quoteFromEditEvent, expectedUUAQuoteFromEdit) + ) + forEvery(tweetEdits) { (_: String, event: TweetEvent, expected: UnifiedUserAction) => + val actual = TweetypieEventAdapter.adaptEvent(event) + assert(Seq(expected) === actual) + } + } + } + } + +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/UserModificationAdapterSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/UserModificationAdapterSpec.scala new file mode 100644 index 000000000..238e5dc7e --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/UserModificationAdapterSpec.scala @@ -0,0 +1,25 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.inject.Test +import com.twitter.unified_user_actions.adapter.TestFixtures.UserModificationEventFixture +import com.twitter.unified_user_actions.adapter.user_modification.UserModificationAdapter +import com.twitter.util.Time +import org.scalatest.prop.TableDrivenPropertyChecks + +class UserModificationAdapterSpec extends Test with
TableDrivenPropertyChecks { + test("User Create") { + new UserModificationEventFixture { + Time.withTimeAt(frozenTime) { _ => + assert(UserModificationAdapter.adaptEvent(userCreate) === Seq(expectedUuaUserCreate)) + } + } + } + + test("User Update") { + new UserModificationEventFixture { + Time.withTimeAt(frozenTime) { _ => + assert(UserModificationAdapter.adaptEvent(userUpdate) === Seq(expectedUuaUserUpdate)) + } + } + } +} diff --git a/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/VideoClientEventUtilsSpec.scala b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/VideoClientEventUtilsSpec.scala new file mode 100644 index 000000000..c22ca795a --- /dev/null +++ b/unified_user_actions/adapter/src/test/scala/com/twitter/unified_user_actions/adapter/VideoClientEventUtilsSpec.scala @@ -0,0 +1,102 @@ +package com.twitter.unified_user_actions.adapter + +import com.twitter.clientapp.thriftscala.AmplifyDetails +import com.twitter.clientapp.thriftscala.MediaDetails +import com.twitter.clientapp.thriftscala.MediaType +import com.twitter.mediaservices.commons.thriftscala.MediaCategory +import com.twitter.unified_user_actions.adapter.client_event.VideoClientEventUtils.getVideoMetadata +import com.twitter.unified_user_actions.adapter.client_event.VideoClientEventUtils.videoIdFromMediaIdentifier +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.mock.Mockito +import com.twitter.video.analytics.thriftscala._ +import org.junit.runner.RunWith +import org.scalatest.funsuite.AnyFunSuite +import org.scalatest.matchers.should.Matchers +import org.scalatest.prop.TableDrivenPropertyChecks +import org.scalatestplus.junit.JUnitRunner + +@RunWith(classOf[JUnitRunner]) +class VideoClientEventUtilsSpec + extends AnyFunSuite + with Matchers + with Mockito + with TableDrivenPropertyChecks { + + trait Fixture { + val mediaDetails = Seq[MediaDetails]( + MediaDetails( + contentId = Some("456"), + mediaType = Some(MediaType.ConsumerVideo), + dynamicAds = Some(false)), + MediaDetails( + contentId = Some("123"), + mediaType = Some(MediaType.ConsumerVideo), + dynamicAds = Some(false)), + MediaDetails( + contentId = Some("789"), + mediaType = Some(MediaType.ConsumerVideo), + dynamicAds = Some(false)) + ) + + val videoMetadata: TweetActionInfo = TweetActionInfo.TweetVideoWatch( + TweetVideoWatch(mediaType = Some(MediaType.ConsumerVideo), isMonetizable = Some(false))) + + val videoMetadataWithAmplifyDetailsVideoType: TweetActionInfo = TweetActionInfo.TweetVideoWatch( + TweetVideoWatch( + mediaType = Some(MediaType.ConsumerVideo), + isMonetizable = Some(false), + videoType = Some("content"))) + + val validMediaIdentifier: MediaIdentifier = MediaIdentifier.MediaPlatformIdentifier( + MediaPlatformIdentifier(mediaId = 123L, mediaCategory = MediaCategory.TweetVideo)) + + val invalidMediaIdentifier: MediaIdentifier = MediaIdentifier.AmplifyCardIdentifier( + AmplifyCardIdentifier(vmapUrl = "", contentId = "") + ) + } + + test("findVideoMetadata") { + new Fixture { + val testData = Table( + ("testType", "mediaId", "mediaItems", "amplifyDetails", "expectedOutput"), + ("emptyMediaDetails", "123", Seq[MediaDetails](), None, None), + ("mediaIdNotFound", "111", mediaDetails, None, None), + ("mediaIdFound", "123", mediaDetails, None, Some(videoMetadata)), + ( + "mediaIdFound", + "123", + mediaDetails, + Some(AmplifyDetails(videoType = Some("content"))), + Some(videoMetadataWithAmplifyDetailsVideoType)) + ) + + forEvery(testData) { + ( + _: 
String, + mediaId: String, + mediaItems: Seq[MediaDetails], + amplifyDetails: Option[AmplifyDetails], + expectedOutput: Option[TweetActionInfo] + ) => + val actual = getVideoMetadata(mediaId, mediaItems, amplifyDetails) + assert(expectedOutput === actual) + } + } + } + + test("videoIdFromMediaIdentifier") { + new Fixture { + val testData = Table( + ("testType", "mediaIdentifier", "expectedOutput"), + ("validMediaIdentifierType", validMediaIdentifier, Some("123")), + ("invalidMediaIdentifierType", invalidMediaIdentifier, None) + ) + + forEvery(testData) { + (_: String, mediaIdentifier: MediaIdentifier, expectedOutput: Option[String]) => + val actual = videoIdFromMediaIdentifier(mediaIdentifier) + assert(expectedOutput === actual) + } + } + } +} diff --git a/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/BUILD b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/BUILD new file mode 100644 index 000000000..46e4c1c23 --- /dev/null +++ b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/BUILD @@ -0,0 +1,11 @@ +scala_library( + sources = [ + "*.scala", + ], + compiler_option_sets = ["fatal_warnings"], + # Our runtime is using Java 11, but for compatibility with other internal libraries that + # are still on Java 8, we'll keep our target platform at Java 8 as well until everyone can + # migrate. + platform = "java8", + tags = ["bazel-compatible"], +) diff --git a/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Clusters.scala b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Clusters.scala new file mode 100644 index 000000000..fd9c29aee --- /dev/null +++ b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Clusters.scala @@ -0,0 +1,24 @@ +package com.twitter.unified_user_actions.client.config + +sealed trait ClusterConfig { + val name: String + val environment: EnvironmentConfig +} + +object Clusters { + /* + * Our production cluster for external consumption. Our SLAs are enforced. + */ + case object ProdCluster extends ClusterConfig { + override val name: String = Constants.UuaKafkaProdClusterName + override val environment: EnvironmentConfig = Environments.Prod + } + + /* + * Our staging cluster for external development and pre-releases. No SLAs are enforced. 
+ */ + case object StagingCluster extends ClusterConfig { + override val name: String = Constants.UuaKafkaStagingClusterName + override val environment: EnvironmentConfig = Environments.Staging + } +} diff --git a/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Constants.scala b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Constants.scala new file mode 100644 index 000000000..c3f8244b2 --- /dev/null +++ b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Constants.scala @@ -0,0 +1,10 @@ +package com.twitter.unified_user_actions.client.config + +object Constants { + val UuaKafkaTopicName = "unified_user_actions" + val UuaEngagementOnlyKafkaTopicName = "unified_user_actions_engagements" + val UuaKafkaProdClusterName = "/s/kafka/bluebird-1" + val UuaKafkaStagingClusterName = "/s/kafka/custdevel" + val UuaProdEnv = "prod" + val UuaStagingEnv = "staging" +} diff --git a/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Environments.scala b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Environments.scala new file mode 100644 index 000000000..9e24363fe --- /dev/null +++ b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/Environments.scala @@ -0,0 +1,15 @@ +package com.twitter.unified_user_actions.client.config + +sealed trait EnvironmentConfig { + val name: String +} + +object Environments { + case object Prod extends EnvironmentConfig { + override val name: String = Constants.UuaProdEnv + } + + case object Staging extends EnvironmentConfig { + override val name: String = Constants.UuaStagingEnv + } +} diff --git a/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/KafkaConfigs.scala b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/KafkaConfigs.scala new file mode 100644 index 000000000..54b4378f2 --- /dev/null +++ b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config/KafkaConfigs.scala @@ -0,0 +1,61 @@ +package com.twitter.unified_user_actions.client.config + +sealed trait ClientConfig { + val cluster: ClusterConfig + val topic: String + val environment: EnvironmentConfig +} + +class AbstractClientConfig(isEngagementOnly: Boolean, env: EnvironmentConfig) extends ClientConfig { + override val cluster: ClusterConfig = { + env match { + case Environments.Prod => Clusters.ProdCluster + case Environments.Staging => Clusters.StagingCluster + case _ => Clusters.ProdCluster + } + } + + override val topic: String = { + if (isEngagementOnly) Constants.UuaEngagementOnlyKafkaTopicName + else Constants.UuaKafkaTopicName + } + + override val environment: EnvironmentConfig = env +} + +object KafkaConfigs { + + /* + * Unified User Actions Kafka config with all events (engagements and impressions). + * Use this config when you mainly need impression data and data volume is not an issue. + */ + case object ProdUnifiedUserActions + extends AbstractClientConfig(isEngagementOnly = false, env = Environments.Prod) + + /* + * Unified User Actions Kafka config with engagement events only. + * Use this config when you only need engagement data. The data volume should be a lot smaller + * than our main config. 
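+ * + * A minimal usage sketch (the values come from Constants in this package; the val names are illustrative): + * {{{ + * val cfg = KafkaConfigs.ProdUnifiedUserActionsEngagementOnly + * val clusterPath = cfg.cluster.name // "/s/kafka/bluebird-1" + * val topic = cfg.topic // "unified_user_actions_engagements" + * }}}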
+ */ + case object ProdUnifiedUserActionsEngagementOnly + extends AbstractClientConfig(isEngagementOnly = true, env = Environments.Prod) + + /* + * Staging Environment for integration and testing. This is not a production config. + * + * Unified User Actions Kafka config with all events (engagements and impressions). + * Use this config when you mainly need impression data and data volume is not an issue. + */ + case object StagingUnifiedUserActions + extends AbstractClientConfig(isEngagementOnly = false, env = Environments.Staging) + + /* + * Staging Environment for integration and testing. This is not a production config. + * + * Unified User Actions Kafka config with engagement events only. + * Use this config when you only need engagement data. The data volume should be a lot smaller + * than our main config. + */ + case object StagingUnifiedUserActionsEngagementOnly + extends AbstractClientConfig(isEngagementOnly = true, env = Environments.Staging) +} diff --git a/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/summingbird/BUILD b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/summingbird/BUILD new file mode 100644 index 000000000..b57b14ead --- /dev/null +++ b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/summingbird/BUILD @@ -0,0 +1,21 @@ +scala_library( + sources = [ + "UnifiedUserActionsSourceScrooge.scala", + ], + compiler_option_sets = ["fatal_warnings"], + # Our runtime is using Java 11, but for compatibility with other internal libraries that + # are still on Java 8, we'll keep our target platform at Java 8 as well until everyone can + # migrate. + platform = "java8", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/src/jvm/com/twitter/summingbird:core", + "3rdparty/src/jvm/com/twitter/summingbird:storm", + "3rdparty/src/jvm/com/twitter/tormenta:core", + "src/scala/com/twitter/summingbird_internal/sources/common", + "src/scala/com/twitter/tormenta_internal/scheme", + "src/scala/com/twitter/tormenta_internal/spout:kafka2", + "unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/summingbird/UnifiedUserActionsSourceScrooge.scala b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/summingbird/UnifiedUserActionsSourceScrooge.scala new file mode 100644 index 000000000..603517087 --- /dev/null +++ b/unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/summingbird/UnifiedUserActionsSourceScrooge.scala @@ -0,0 +1,43 @@ +package com.twitter.unified_user_actions.client.summingbird + +import com.twitter.summingbird.TimeExtractor +import com.twitter.summingbird.storm.Storm +import com.twitter.summingbird_internal.sources.AppId +import com.twitter.summingbird_internal.sources.SourceFactory +import com.twitter.tormenta_internal.spout.Kafka2ScroogeSpoutWrapper +import com.twitter.unified_user_actions.client.config.ClientConfig +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.client.config.KafkaConfigs + +case class UnifiedUserActionsSourceScrooge( + appId: AppId, + parallelism: Int, + kafkaConfig: ClientConfig = KafkaConfigs.ProdUnifiedUserActions, + skipToLatest: Boolean = false, + enableTls: 
Boolean = true) + extends SourceFactory[Storm, UnifiedUserAction] { + + override def name: String = "UnifiedUserActionsSource" + override def description: String = "Unified User Actions (UUA) events" + + // The event timestamps, from summingbird's perspective (client), are our internally + // outputted timestamps (producer). This ensures time-continuity between the client and the + // producer. + val timeExtractor: TimeExtractor[UnifiedUserAction] = TimeExtractor { e => + e.eventMetadata.receivedTimestampMs + } + + override def source = { + Storm.source( + Kafka2ScroogeSpoutWrapper( + codec = UnifiedUserAction, + cluster = kafkaConfig.cluster.name, + topic = kafkaConfig.topic, + appId = appId.get, + skipToLatest = skipToLatest, + enableTls = enableTls + ), + Some(parallelism) + )(timeExtractor) + } +} diff --git a/unified_user_actions/client/src/test/scala/com/twitter/unified_user_actions/client/config/BUILD.bazel b/unified_user_actions/client/src/test/scala/com/twitter/unified_user_actions/client/config/BUILD.bazel new file mode 100644 index 000000000..3b7e20ff0 --- /dev/null +++ b/unified_user_actions/client/src/test/scala/com/twitter/unified_user_actions/client/config/BUILD.bazel @@ -0,0 +1,12 @@ +junit_tests( + sources = ["**/*.scala"], + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/junit", + "3rdparty/jvm/org/scalatest", + "3rdparty/jvm/org/scalatestplus:junit", + "finatra/inject/inject-core/src/test/scala:test-deps", + "unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config", + ], +) diff --git a/unified_user_actions/client/src/test/scala/com/twitter/unified_user_actions/client/config/KafkaConfigsSpec.scala b/unified_user_actions/client/src/test/scala/com/twitter/unified_user_actions/client/config/KafkaConfigsSpec.scala new file mode 100644 index 000000000..14c741789 --- /dev/null +++ b/unified_user_actions/client/src/test/scala/com/twitter/unified_user_actions/client/config/KafkaConfigsSpec.scala @@ -0,0 +1,38 @@ +package com.twitter.unified_user_actions.client.config + +import com.twitter.inject.Test + +class KafkaConfigsSpec extends Test { + test("configs should be correct") { + val states = Seq( + ( + KafkaConfigs.ProdUnifiedUserActions, + Constants.UuaProdEnv, + Constants.UuaKafkaTopicName, + Constants.UuaKafkaProdClusterName), + ( + KafkaConfigs.ProdUnifiedUserActionsEngagementOnly, + Constants.UuaProdEnv, + Constants.UuaEngagementOnlyKafkaTopicName, + Constants.UuaKafkaProdClusterName), + ( + KafkaConfigs.StagingUnifiedUserActions, + Constants.UuaStagingEnv, + Constants.UuaKafkaTopicName, + Constants.UuaKafkaStagingClusterName), + ( + KafkaConfigs.StagingUnifiedUserActionsEngagementOnly, + Constants.UuaStagingEnv, + Constants.UuaEngagementOnlyKafkaTopicName, + Constants.UuaKafkaStagingClusterName) + ) + + states.foreach { + case (actual, expectedEnv, expectedTopic, expectedClusterName) => + assert(expectedEnv == actual.environment.name, s"in $actual") + assert(expectedTopic == actual.topic, s"in $actual") + assert(expectedClusterName == actual.cluster.name, s"in $actual") + case _ => + } + } +} diff --git a/unified_user_actions/enricher/BUILD.bazel b/unified_user_actions/enricher/BUILD.bazel new file mode 100644 index 000000000..1624a57d4 --- /dev/null +++ b/unified_user_actions/enricher/BUILD.bazel @@ -0,0 +1 @@ +# This prevents SQ query from grabbing //:all since it traverses up once to find a BUILD diff --git a/unified_user_actions/enricher/README.md 
b/unified_user_actions/enricher/README.md new file mode 100644 index 000000000..0b9314bdb --- /dev/null +++ b/unified_user_actions/enricher/README.md @@ -0,0 +1,24 @@ +## Aurora deploy + +## From master branch + +``` +aurora workflow build unified_user_actions/service/deploy/uua-partitioner-staging.workflow +``` + +## From your own branch + +``` +git push origin <your-branch>/<version> +aurora workflow build --build-branch=<your-branch>/<version> unified_user_actions/service/deploy/uua-partitioner-staging.workflow +``` + +* Check build status: + * Dev + * https://workflows.twitter.biz/workflow/discode/uua-partitioner-staging/ + +## Monitor output topic EPS + * Prod + * unified_user_actions: https://monitoring.twitter.biz/tiny/2942881 + * Dev + * unified_user_action_sample1: https://monitoring.twitter.biz/tiny/2942879 diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/BUILD b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/BUILD new file mode 100644 index 000000000..c9697053a --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/BUILD @@ -0,0 +1,5 @@ +scala_library( + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [], +) diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/Exceptions.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/Exceptions.scala new file mode 100644 index 000000000..a27eaa0b9 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/Exceptions.scala @@ -0,0 +1,16 @@ +package com.twitter.unified_user_actions.enricher + +/** + * When this exception is thrown, it means that an assumption in the enricher services + * was violated and it needs to be fixed before a production deployment. 
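+ * + * A usage sketch via the Exceptions helper defined below (the message is illustrative): + * {{{ + * // Throws ImplementationException("requirement failed: hydration needs at least one instruction") + * Exceptions.require(currentStage.instructions.nonEmpty, "hydration needs at least one instruction") + * }}}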
+ */ +abstract class FatalException(msg: String) extends Exception(msg) + +class ImplementationException(msg: String) extends FatalException(msg) + +object Exceptions { + def require(requirement: Boolean, message: String): Unit = { + if (!requirement) + throw new ImplementationException("requirement failed: " + message) + } +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/BUILD b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/BUILD new file mode 100644 index 000000000..1336f18ff --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/BUILD @@ -0,0 +1,11 @@ +scala_library( + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator:base", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner:base", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/EnrichmentDriver.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/EnrichmentDriver.scala new file mode 100644 index 000000000..54999d810 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/EnrichmentDriver.scala @@ -0,0 +1,99 @@ +package com.twitter.unified_user_actions.enricher.driver + +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageType.Hydration +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageType.Repartition +import com.twitter.util.Future +import EnrichmentPlanUtils._ +import com.twitter.unified_user_actions.enricher.Exceptions +import com.twitter.unified_user_actions.enricher.ImplementationException +import com.twitter.unified_user_actions.enricher.hydrator.Hydrator +import com.twitter.unified_user_actions.enricher.partitioner.Partitioner + +/** + * A driver that will execute on a key, value tuple and produce an output to a Kafka topic. + * + * The output Kafka topic will depend on the current enrichment plan. In one scenario, the driver + * will output to a partitioned Kafka topic if the output needs to be repartitioned (after it has + * been hydrated 0 or more times as necessary). In another scenario, the driver will output to + * the final topic if there's no more work to be done. + * + * @param finalOutputTopic The final output Kafka topic + * @param partitionedTopic The intermediate Kafka topic used for repartitioning based on [[EnrichmentKey]] + * @param hydrator A hydrator that knows how to populate the metadata based on the current plan / instruction. + * @param partitioner A partitioner that knows how to transform the current uua event into an [[EnrichmentKey]]. 
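+ * + * A wiring sketch, assuming the default hydrator and partitioner (the topic names are illustrative): + * {{{ + * val driver = new EnrichmentDriver( + *   finalOutputTopic = Some("unified_user_actions_enriched"), + *   partitionedTopic = "unified_user_actions_keyed", + *   hydrator = new DefaultHydrator(cache, graphqlClient), + *   partitioner = new DefaultPartitioner) + * val out: Future[(Option[EnrichmentKey], EnrichmentEnvelop)] = + *   driver.execute(key, Future.value(envelop)) + * }}}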
+ */ +class EnrichmentDriver( + finalOutputTopic: Option[String], + partitionedTopic: String, + hydrator: Hydrator, + partitioner: Partitioner) { + + /** + * A driver that does the following when being executed. + * It checks if we are done with the enrichment plan; if not: + * - is the current stage repartitioning? + * -> remap the output key, update the plan accordingly, then return with the new partition key + * - is the current stage hydration? + * -> use the hydrator to hydrate the envelop, update the plan accordingly, then proceed + * recursively unless the next stage is repartitioning or this is the last stage. + */ + def execute( + key: Option[EnrichmentKey], + envelop: Future[EnrichmentEnvelop] + ): Future[(Option[EnrichmentKey], EnrichmentEnvelop)] = { + envelop.flatMap { envelop => + val plan = envelop.plan + if (plan.isEnrichmentComplete) { + val topic = finalOutputTopic.getOrElse( + throw new ImplementationException( + "A final output Kafka topic is supposed to be used but " + + "no final output topic was provided.")) + Future.value((key, envelop.copy(plan = plan.markLastStageCompletedWithOutputTopic(topic)))) + } else { + val currentStage = plan.getCurrentStage + + currentStage.stageType match { + case Repartition => + Exceptions.require( + currentStage.instructions.size == 1, + s"re-partitioning needs exactly 1 instruction but ${currentStage.instructions.size} was provided") + + val instruction = currentStage.instructions.head + val outputKey = partitioner.repartition(instruction, envelop) + val outputValue = envelop.copy( + plan = plan.markStageCompletedWithOutputTopic( + stage = currentStage, + outputTopic = partitionedTopic) + ) + Future.value((outputKey, outputValue)) + case Hydration => + Exceptions.require( + currentStage.instructions.nonEmpty, + "hydration needs at least one instruction") + + // Hydration is either initialized or completed after this; failure states + // will have to be handled upstream. Any unhandled exception will abort the entire + // stage. + // This is so that if the error is unrecoverable, the hydrator can choose to return an + // un-hydrated envelop to tolerate the error. 
+ val finalEnvelop = currentStage.instructions.foldLeft(Future.value(envelop)) { + (curEnvelop, instruction) => + curEnvelop.flatMap(e => hydrator.hydrate(instruction, key, e)) + } + + val outputValue = finalEnvelop.map(e => + e.copy( + plan = plan.markStageCompleted(stage = currentStage) + )) + + // continue executing other stages if it can (locally) until a terminal state + execute(key, outputValue) + case _ => + throw new ImplementationException(s"Invalid / unsupported stage type $currentStage") + } + } + } + } +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/EnrichmentPlanUtils.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/EnrichmentPlanUtils.scala new file mode 100644 index 000000000..20f1093bc --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver/EnrichmentPlanUtils.scala @@ -0,0 +1,71 @@ +package com.twitter.unified_user_actions.enricher.driver + +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentPlan +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStage +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageStatus.Completion +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageStatus.Initialized + +object EnrichmentPlanUtils { + implicit class EnrichmentPlanStatus(plan: EnrichmentPlan) { + + /** + * Check each stage of the plan to know if we are done + */ + def isEnrichmentComplete: Boolean = { + plan.stages.forall(stage => stage.status == Completion) + } + + /** + * Get the next stage in the enrichment process. Note, if there is none this will throw + * an exception. 
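+ * For example, guard the call (sketch): {{{ if (!plan.isEnrichmentComplete) plan.getCurrentStage }}}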
+ */ + def getCurrentStage: EnrichmentStage = { + val next = plan.stages.find(stage => stage.status == Initialized) + next match { + case Some(stage) => stage + case None => throw new IllegalStateException("check for plan completion first") + } + } + def getLastCompletedStage: EnrichmentStage = { + val completed = plan.stages.reverse.find(stage => stage.status == Completion) + completed match { + case Some(stage) => stage + case None => throw new IllegalStateException("check for plan completion first") + } + } + + /** + * Copy the current plan with the requested stage marked as complete + */ + def markStageCompletedWithOutputTopic( + stage: EnrichmentStage, + outputTopic: String + ): EnrichmentPlan = { + plan.copy( + stages = plan.stages.map(s => + if (s == stage) s.copy(status = Completion, outputTopic = Some(outputTopic)) else s) + ) + } + + def markStageCompleted( + stage: EnrichmentStage + ): EnrichmentPlan = { + plan.copy( + stages = plan.stages.map(s => if (s == stage) s.copy(status = Completion) else s) + ) + } + + /** + * Copy the current plan with the last stage marked as completed and the output topic recorded + */ + def markLastStageCompletedWithOutputTopic( + outputTopic: String + ): EnrichmentPlan = { + val last = plan.stages.last + plan.copy( + stages = plan.stages.map(s => + if (s == last) s.copy(status = Completion, outputTopic = Some(outputTopic)) else s) + ) + } + } +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql/BUILD b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql/BUILD new file mode 100644 index 000000000..09d16ff7a --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql/BUILD @@ -0,0 +1,11 @@ +scala_library( + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/google/guava", + "featureswitches/dynmap/src/main/scala/com/twitter/dynmap:dynmap-core", + "featureswitches/dynmap/src/main/scala/com/twitter/dynmap/json:dynmap-json", + "graphql/thrift/src/main/thrift/com/twitter/graphql:graphql-scala", + "util/util-core:scala", + ], +) diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql/GraphqlRspParser.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql/GraphqlRspParser.scala new file mode 100644 index 000000000..965c1ddbb --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql/GraphqlRspParser.scala @@ -0,0 +1,66 @@ +package com.twitter.unified_user_actions.enricher.graphql + +import com.google.common.util.concurrent.RateLimiter +import com.twitter.dynmap.DynMap +import com.twitter.dynmap.json.DynMapJson +import com.twitter.finagle.stats.Counter +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.util.logging.Logging +import com.twitter.util.Return +import com.twitter.util.Throw +import com.twitter.util.Try + +/** + * @param dm The DynMap parsed from the returned Json string + */ +case class GraphqlRspErrors(dm: DynMap) extends Exception { + override def toString: String = dm.toString() +} + +object GraphqlRspParser extends Logging { + private val rateLimiter = RateLimiter.create(1.0) // at most 1 log message per second + private def rateLimitedLogError(e: Throwable): Unit = + if (rateLimiter.tryAcquire()) { + error(e.getMessage, e) + } + + /** + * GraphQL's response is a Json string. 
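+ * Illustrative shapes (payloads elided): + * {{{ + * {"data": { ... }}    // expected response + * {"errors": [ ... ]}  // valid Json that carries errors + * }}}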
+ * This function first parses the raw response as a Json string, then it checks whether the returned + * object has the "data" field, which indicates an expected response. The response could also + * be a valid Json string that carries errors inside it as a list under "errors". + */ + def toDynMap( + rsp: String, + invalidRspCounter: Counter = NullStatsReceiver.NullCounter, + failedReqCounter: Counter = NullStatsReceiver.NullCounter + ): Try[DynMap] = { + val rawRsp: Try[DynMap] = DynMapJson.fromJsonString(rsp) + rawRsp match { + case Return(r) => + if (r.getMapOpt("data").isDefined) Return(r) + else { + invalidRspCounter.incr() + rateLimitedLogError(GraphqlRspErrors(r)) + Throw(GraphqlRspErrors(r)) + } + case Throw(e) => + rateLimitedLogError(e) + failedReqCounter.incr() + Throw(e) + } + } + + /** + * Similar to `toDynMap` above, but returns an Option + */ + def toDynMapOpt( + rsp: String, + invalidRspCounter: Counter = NullStatsReceiver.NullCounter, + failedReqCounter: Counter = NullStatsReceiver.NullCounter + ): Option[DynMap] = + toDynMap( + rsp = rsp, + invalidRspCounter = invalidRspCounter, + failedReqCounter = failedReqCounter).toOption +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/BUILD b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/BUILD new file mode 100644 index 000000000..b1579c1e1 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/BUILD @@ -0,0 +1,11 @@ +scala_library( + name = "hcache", + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/google/guava", + "util/util-cache-guava/src/main/scala", + "util/util-cache/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/LocalCache.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/LocalCache.scala new file mode 100644 index 000000000..6cae67422 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/LocalCache.scala @@ -0,0 +1,34 @@ +package com.twitter.unified_user_actions.enricher.hcache + +import com.google.common.cache.Cache +import com.twitter.cache.FutureCache +import com.twitter.cache.guava.GuavaCache +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.util.Future + +/** + * A local cache implementation using GuavaCache. + * Underneath it uses a customized version of the EvictingCache to 1) deal with Futures, 2) add more stats. 
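+ * + * A construction sketch, assuming a plain Guava builder (the size is illustrative): + * {{{ + * import com.google.common.cache.CacheBuilder + * val underlying: Cache[EnrichmentKey, Future[DynMap]] = + *   CacheBuilder.newBuilder().maximumSize(10000L).build[EnrichmentKey, Future[DynMap]]() + * val cache = new LocalCache[EnrichmentKey, DynMap](underlying, statsReceiver) + * }}}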
+ */ +class LocalCache[K, V]( + underlying: Cache[K, Future[V]], + statsReceiver: StatsReceiver = NullStatsReceiver) { + + private[this] val cache = new GuavaCache(underlying) + private[this] val evictingCache: FutureCache[K, V] = + ObservedEvictingCache(underlying = cache, statsReceiver = statsReceiver) + + def getOrElseUpdate(key: K)(fn: => Future[V]): Future[V] = evictingCache.getOrElseUpdate(key)(fn) + + def get(key: K): Option[Future[V]] = evictingCache.get(key) + + def evict(key: K, value: Future[V]): Boolean = evictingCache.evict(key, value) + + def set(key: K, value: Future[V]): Unit = evictingCache.set(key, value) + + def reset(): Unit = + underlying.invalidateAll() + + def size: Int = evictingCache.size +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/ObservedEvictingCache.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/ObservedEvictingCache.scala new file mode 100644 index 000000000..8c7e60029 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache/ObservedEvictingCache.scala @@ -0,0 +1,91 @@ +package com.twitter.unified_user_actions.enricher.hcache + +import com.twitter.cache.FutureCache +import com.twitter.cache.FutureCacheProxy +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.util.Future +import scala.annotation.nowarn + +/** + * Adds stats and reuses the main logic of the EvictingCache. + */ +class ObservedEvictingCache[K, V](underlying: FutureCache[K, V], scopedStatsReceiver: StatsReceiver) + extends FutureCacheProxy[K, V](underlying) { + import ObservedEvictingCache._ + + private[this] val getsCounter = scopedStatsReceiver.counter(StatsNames.Gets) + private[this] val setsCounter = scopedStatsReceiver.counter(StatsNames.Sets) + private[this] val hitsCounter = scopedStatsReceiver.counter(StatsNames.Hits) + private[this] val missesCounter = scopedStatsReceiver.counter(StatsNames.Misses) + private[this] val evictionsCounter = scopedStatsReceiver.counter(StatsNames.Evictions) + private[this] val failedFuturesCounter = scopedStatsReceiver.counter(StatsNames.FailedFutures) + + @nowarn("cat=unused") + private[this] val cacheSizeGauge = scopedStatsReceiver.addGauge(StatsNames.Size)(underlying.size) + + private[this] def evictOnFailure(k: K, f: Future[V]): Future[V] = { + f.onFailure { _ => + failedFuturesCounter.incr() + evict(k, f) + } + f // we return the original future to make evict(k, f) easier to work with. 
+ } + + override def set(k: K, v: Future[V]): Unit = { + setsCounter.incr() + super.set(k, v) + evictOnFailure(k, v) + } + + override def getOrElseUpdate(k: K)(v: => Future[V]): Future[V] = { + getsCounter.incr() + + var computeWasEvaluated = false + def computeWithTracking: Future[V] = v.onSuccess { _ => + computeWasEvaluated = true + missesCounter.incr() + } + + evictOnFailure( + k, + super.getOrElseUpdate(k)(computeWithTracking).onSuccess { _ => + if (!computeWasEvaluated) hitsCounter.incr() + } + ).interruptible() + } + + override def get(key: K): Option[Future[V]] = { + getsCounter.incr() + val value = super.get(key) + value match { + case Some(_) => hitsCounter.incr() + case _ => missesCounter.incr() + } + value + } + + override def evict(key: K, value: Future[V]): Boolean = { + val evicted = super.evict(key, value) + if (evicted) evictionsCounter.incr() + evicted + } +} + +object ObservedEvictingCache { + object StatsNames { + val Gets = "gets" + val Hits = "hits" + val Misses = "misses" + val Sets = "sets" + val Evictions = "evictions" + val FailedFutures = "failed_futures" + val Size = "size" + } + + /** + * Wraps an underlying FutureCache, ensuring that failed Futures that are set in + * the cache are evicted later. + */ + def apply[K, V](underlying: FutureCache[K, V], statsReceiver: StatsReceiver): FutureCache[K, V] = + new ObservedEvictingCache[K, V](underlying, statsReceiver) +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/AbstractHydrator.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/AbstractHydrator.scala new file mode 100644 index 000000000..e8444e965 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/AbstractHydrator.scala @@ -0,0 +1,58 @@ +package com.twitter.unified_user_actions.enricher.hydrator +import com.google.common.util.concurrent.RateLimiter +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.unified_user_actions.enricher.FatalException +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.util.Future +import com.twitter.util.logging.Logging + +abstract class AbstractHydrator(scopedStatsReceiver: StatsReceiver) extends Hydrator with Logging { + + object StatsNames { + val Exceptions = "exceptions" + val EmptyKeys = "empty_keys" + val Hydrations = "hydrations" + } + + private val exceptionsCounter = scopedStatsReceiver.counter(StatsNames.Exceptions) + private val emptyKeysCounter = scopedStatsReceiver.counter(StatsNames.EmptyKeys) + private val hydrationsCounter = scopedStatsReceiver.counter(StatsNames.Hydrations) + + // at most 1 log message per second + private val rateLimiter = RateLimiter.create(1.0) + + private def rateLimitedLogError(e: Throwable): Unit = + if (rateLimiter.tryAcquire()) { + error(e.getMessage, e) + } + + protected def safelyHydrate( + instruction: EnrichmentInstruction, + keyOpt: EnrichmentKey, + envelop: EnrichmentEnvelop + ): Future[EnrichmentEnvelop] + + override def hydrate( + instruction: EnrichmentInstruction, + keyOpt: Option[EnrichmentKey], + envelop: EnrichmentEnvelop + ): Future[EnrichmentEnvelop] = { + keyOpt + .map(key => { + safelyHydrate(instruction, key, envelop) + .onSuccess(_ => 
hydrationsCounter.incr()) + .rescue { + case e: FatalException => Future.exception(e) + case e => + rateLimitedLogError(e) + exceptionsCounter.incr() + Future.value(envelop) + } + }).getOrElse({ + emptyKeysCounter.incr() + Future.value(envelop) + }) + } +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/BUILD b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/BUILD new file mode 100644 index 000000000..3f4bb6780 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/BUILD @@ -0,0 +1,36 @@ +scala_library( + name = "default", + sources = [ + "AbstractHydrator.scala", + "DefaultHydrator.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + ":base", + "featureswitches/dynmap/src/main/scala/com/twitter/dynmap:dynmap-core", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + ], +) + +scala_library( + name = "noop", + sources = ["NoopHydrator.scala"], + tags = ["bazel-compatible"], + dependencies = [ + ":base", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + ], +) + +scala_library( + name = "base", + sources = ["Hydrator.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + ], +) diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/DefaultHydrator.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/DefaultHydrator.scala new file mode 100644 index 000000000..ac5802070 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/DefaultHydrator.scala @@ -0,0 +1,90 @@ +package com.twitter.unified_user_actions.enricher.hydrator +import com.twitter.dynmap.DynMap +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.graphql.thriftscala.AuthHeaders +import com.twitter.graphql.thriftscala.Authentication +import com.twitter.graphql.thriftscala.Document +import com.twitter.graphql.thriftscala.GraphQlRequest +import com.twitter.graphql.thriftscala.GraphqlExecutionService +import com.twitter.graphql.thriftscala.Variables +import com.twitter.unified_user_actions.enricher.ImplementationException +import com.twitter.unified_user_actions.enricher.graphql.GraphqlRspParser +import com.twitter.unified_user_actions.enricher.hcache.LocalCache +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentIdType +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.unified_user_actions.thriftscala.AuthorInfo +import 
com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.util.Future + +class DefaultHydrator( + cache: LocalCache[EnrichmentKey, DynMap], + graphqlClient: GraphqlExecutionService.FinagledClient, + scopedStatsReceiver: StatsReceiver = NullStatsReceiver) + extends AbstractHydrator(scopedStatsReceiver) { + + private def constructGraphqlReq( + enrichmentKey: EnrichmentKey + ): GraphQlRequest = + enrichmentKey.keyType match { + case EnrichmentIdType.TweetId => + GraphQlRequest( + // see go/graphiql/M5sHxua-RDiRtTn48CAhng + document = Document.DocumentId("M5sHxua-RDiRtTn48CAhng"), + operationName = Some("TweetHydration"), + variables = Some( + Variables.JsonEncodedVariables(s"""{"rest_id": "${enrichmentKey.id}"}""") + ), + authentication = Authentication.AuthHeaders( + AuthHeaders() + ) + ) + case _ => + throw new ImplementationException( + s"Missing implementation for hydration of type ${enrichmentKey.keyType}") + } + + private def hydrateAuthorInfo(item: Item.TweetInfo, authorId: Option[Long]): Item.TweetInfo = { + item.tweetInfo.actionTweetAuthorInfo match { + case Some(_) => item + case _ => + item.copy(tweetInfo = item.tweetInfo.copy( + actionTweetAuthorInfo = Some(AuthorInfo(authorId = authorId)) + )) + } + } + + override protected def safelyHydrate( + instruction: EnrichmentInstruction, + key: EnrichmentKey, + envelop: EnrichmentEnvelop + ): Future[EnrichmentEnvelop] = { + instruction match { + case EnrichmentInstruction.TweetEnrichment => + val dynMapFuture = cache.getOrElseUpdate(key) { + graphqlClient + .graphql(constructGraphqlReq(enrichmentKey = key)) + .map { body => + body.response.flatMap { r => + GraphqlRspParser.toDynMapOpt(r) + }.get + } + } + + dynMapFuture.map(map => { + val authorIdOpt = + map.getLongOpt("data.tweet_result_by_rest_id.result.core.user.legacy.id_str") + + val hydratedEnvelop = envelop.uua.item match { + case item: Item.TweetInfo => + envelop.copy(uua = envelop.uua.copy(item = hydrateAuthorInfo(item, authorIdOpt))) + case _ => envelop + } + hydratedEnvelop + }) + case _ => Future.value(envelop) + } + } +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/Hydrator.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/Hydrator.scala new file mode 100644 index 000000000..c03bc7df6 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/Hydrator.scala @@ -0,0 +1,14 @@ +package com.twitter.unified_user_actions.enricher.hydrator + +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.util.Future + +trait Hydrator { + def hydrate( + instruction: EnrichmentInstruction, + key: Option[EnrichmentKey], + envelop: EnrichmentEnvelop + ): Future[EnrichmentEnvelop] +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/NoopHydrator.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/NoopHydrator.scala new file mode 100644 index 000000000..652b40111 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator/NoopHydrator.scala @@ -0,0 +1,27 @@ +package com.twitter.unified_user_actions.enricher.hydrator 
+import com.twitter.unified_user_actions.enricher.ImplementationException +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.util.Future + +/** + * This hydrator does nothing. If it's used by mistake for any reason, an exception will be thrown. + * Use this when you expect to have no hydration (for example, when the planner shouldn't hydrate anything + * and would only perform the partitioning function). + */ +object NoopHydrator { + val OutputTopic: Option[String] = None +} + +class NoopHydrator extends Hydrator { + override def hydrate( + instruction: EnrichmentInstruction, + key: Option[EnrichmentKey], + envelop: EnrichmentEnvelop + ): Future[EnrichmentEnvelop] = { + throw new ImplementationException( + "NoopHydrator shouldn't be invoked when configured. Check your " + + "enrichment plan.") + } +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/BUILD b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/BUILD new file mode 100644 index 000000000..7a6098f17 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/BUILD @@ -0,0 +1,18 @@ +scala_library( + name = "default", + sources = ["DefaultPartitioner.scala"], + tags = ["bazel-compatible"], + dependencies = [ + ":base", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + ], +) + +scala_library( + name = "base", + sources = ["Partitioner.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + ], +) diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/DefaultPartitioner.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/DefaultPartitioner.scala new file mode 100644 index 000000000..06e88cc08 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/DefaultPartitioner.scala @@ -0,0 +1,37 @@ +package com.twitter.unified_user_actions.enricher.partitioner +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentIdType +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction.NotificationTweetEnrichment +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction.TweetEnrichment +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.unified_user_actions.enricher.partitioner.DefaultPartitioner.NullKey +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.NotificationContent + +object DefaultPartitioner { + val NullKey: Option[EnrichmentKey] = None +} + +class DefaultPartitioner extends Partitioner { + override def repartition( + instruction: EnrichmentInstruction, + envelop: EnrichmentEnvelop + ): 
Option[EnrichmentKey] = { + (instruction, envelop.uua.item) match { + case (TweetEnrichment, Item.TweetInfo(info)) => + Some(EnrichmentKey(EnrichmentIdType.TweetId, info.actionTweetId)) + case (NotificationTweetEnrichment, Item.NotificationInfo(info)) => + info.content match { + case NotificationContent.TweetNotification(content) => + Some(EnrichmentKey(EnrichmentIdType.TweetId, content.tweetId)) + case NotificationContent.MultiTweetNotification(content) => + // we sacrifice cache performance in this case since only a small % of + // notification content will be multi-tweet types. + Some(EnrichmentKey(EnrichmentIdType.TweetId, content.tweetIds.head)) + case _ => NullKey + } + case _ => NullKey + } + } +} diff --git a/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/Partitioner.scala b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/Partitioner.scala new file mode 100644 index 000000000..0281c1a30 --- /dev/null +++ b/unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner/Partitioner.scala @@ -0,0 +1,12 @@ +package com.twitter.unified_user_actions.enricher.partitioner + +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey + +trait Partitioner { + def repartition( + instruction: EnrichmentInstruction, + envelop: EnrichmentEnvelop + ): Option[EnrichmentKey] +} diff --git a/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/BUILD b/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/BUILD new file mode 100644 index 000000000..b2eb4873f --- /dev/null +++ b/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/BUILD @@ -0,0 +1,16 @@ +create_thrift_libraries( + org = "com.twitter.unified_user_actions.enricher", + base_name = "internal", + sources = ["*.thrift"], + tags = ["bazel-compatible"], + dependency_roots = [ + "src/thrift/com/twitter/clientapp/gen:clientapp", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions", + ], + generate_languages = [ + "java", + "scala", + ], + provides_java_name = "enricher_internal-thrift-java", + provides_scala_name = "enricher_internal-thrift-scala", +) diff --git a/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_envelop.thrift b/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_envelop.thrift new file mode 100644 index 000000000..5f01dc039 --- /dev/null +++ b/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_envelop.thrift @@ -0,0 +1,26 @@ +namespace java com.twitter.unified_user_actions.enricher.internal.thriftjava +#@namespace scala com.twitter.unified_user_actions.enricher.internal.thriftscala +#@namespace strato com.twitter.unified_user_actions.enricher.internal + +include "com/twitter/unified_user_actions/unified_user_actions.thrift" +include "enrichment_plan.thrift" + +struct EnrichmentEnvelop { + /** + * An internal ID that uniquely identifies this event created during the early stages of enrichment. 
+ * It is useful for debugging, tracing & profiling the events throughout the process. + **/ + 1: required i64 envelopId + + /** + * The UUA event to be enriched / currently being enriched / has been enriched depending on the + * stages of the enrichment process. + **/ + 2: unified_user_actions.UnifiedUserAction uua + + /** + * The current enrichment plan. It keeps track of what is currently being enriched, what still + * needs to be done so that we can bring the enrichment process to completion. + **/ + 3: enrichment_plan.EnrichmentPlan plan +}(persisted='true', hasPersonalData='true') diff --git a/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_key.thrift b/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_key.thrift new file mode 100644 index 000000000..abb9ea33d --- /dev/null +++ b/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_key.thrift @@ -0,0 +1,41 @@ +namespace java com.twitter.unified_user_actions.enricher.internal.thriftjava +#@namespace scala com.twitter.unified_user_actions.enricher.internal.thriftscala +#@namespace strato com.twitter.unified_user_actions.enricher.internal + +/* + * Internal key used for controlling the UUA enrichment & caching process. It contains very minimal + * information to allow for efficient serde, fast data look-up, and to drive the partitioning logic. + * + * NOTE: Don't depend on it in your application. + * NOTE: This is used internally by UUA and may change at any time. There's no guarantee for + * backward / forward-compatibility. + * NOTE: Don't add any other metadata unless it is needed for partitioning logic. Extra enrichment + * metadata can go into the envelop. + */ +struct EnrichmentKey { + /* + * The internal type of the primary ID used for partitioning UUA data. + * + * Each type should directly correspond to an entity-level ID in UUA. + * For example, TweetInfo.actionTweetId & TweetNotification.tweetId are all tweet-entity level + * and should correspond to the same primary ID type. + **/ + 1: required EnrichmentIdType keyType + + /** + * The primary ID. This is usually a long; other incompatible data types such as strings or + * byte arrays can be converted into a long using their native hashCode() function. + **/ + 2: required i64 id +}(persisted='true', hasPersonalData='true') + +/** +* The type of the primary ID. For example, tweetId on a tweet & tweetId on a notification are +* all TweetId type. Similarly, UserID of a viewer and AuthorID of a tweet are all UserID type. +* +* The type here ensures that we will partition UUA data correctly across different entity types +* (users, tweets, notifications, etc.) 
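+* +* For example (an illustrative mapping): TweetInfo.actionTweetId, TweetNotification.tweetId and +* MultiTweetNotification.tweetIds would all partition under the TweetId type below.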
+**/ +enum EnrichmentIdType { + TweetId = 0 +} diff --git a/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_plan.thrift b/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_plan.thrift new file mode 100644 index 000000000..e64170752 --- /dev/null +++ b/unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal/enrichment_plan.thrift @@ -0,0 +1,52 @@ +namespace java com.twitter.unified_user_actions.enricher.internal.thriftjava +#@namespace scala com.twitter.unified_user_actions.enricher.internal.thriftscala +#@namespace strato com.twitter.unified_user_actions.enricher.internal + +/** +* An enrichment plan. It has multiple stages for different purposes during the enrichment process. +**/ +struct EnrichmentPlan { + 1: required list<EnrichmentStage> stages +}(persisted='true', hasPersonalData='false') + +/** +* A stage in the enrichment process with respect to the current key. Currently it is one of 2 options: +* - re-partitioning on an id of type X +* - hydrating metadata on an id of type X +* +* A stage also moves through different statuses, from initialized through processing until completion. +* Each stage contains one or more instructions. +**/ +struct EnrichmentStage { + 1: required EnrichmentStageStatus status + 2: required EnrichmentStageType stageType + 3: required list<EnrichmentInstruction> instructions + + // The output topic for this stage. This information is not available when the stage was + // first set up, and it's only available after the driver has finished working on + // this stage. + 4: optional string outputTopic +}(persisted='true', hasPersonalData='false') + +/** +* The current processing status of a stage. It should either be done (completion) or not done (initialized). +* Transient statuses such as "processing" are dangerous since we can't be sure exactly what has been done. +**/ +enum EnrichmentStageStatus { + Initialized = 0 + Completion = 20 +} + +/** +* The type of processing in this stage. For example, repartitioning the data or hydrating the data. 
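+* +* For example (a sketch of a typical plan): a tweet event runs a Repartition stage with +* TweetEnrichment, then a Hydration stage with TweetEnrichment, in that order.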
+**/ +enum EnrichmentStageType { + Repartition = 0 + Hydration = 10 +} + +enum EnrichmentInstruction { + // all enrichment based on a tweet id in UUA goes here + TweetEnrichment = 0 + NotificationTweetEnrichment = 10 +} diff --git a/unified_user_actions/enricher/src/test/resources/BUILD.bazel b/unified_user_actions/enricher/src/test/resources/BUILD.bazel new file mode 100644 index 000000000..ae9669f4f --- /dev/null +++ b/unified_user_actions/enricher/src/test/resources/BUILD.bazel @@ -0,0 +1,4 @@ +resources( + sources = ["*.*"], + tags = ["bazel-compatible"], +) diff --git a/unified_user_actions/enricher/src/test/resources/logback.xml b/unified_user_actions/enricher/src/test/resources/logback.xml new file mode 100644 index 000000000..27f50b1dc --- /dev/null +++ b/unified_user_actions/enricher/src/test/resources/logback.xml @@ -0,0 +1,45 @@ +<configuration> + <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender"> + <encoder> + <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n</pattern> + </encoder> + </appender> + <root level="INFO"> + <appender-ref ref="CONSOLE"/> + </root> +</configuration> diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/BUILD.bazel b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/BUILD.bazel new file mode 100644 index 000000000..9f6cd6248 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/BUILD.bazel @@ -0,0 +1,12 @@ +scala_library( + name = "fixture", + sources = ["EnricherFixture.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/scalatest", + "3rdparty/jvm/org/scalatestplus:junit", + "finatra/inject/inject-core/src/test/scala/com/twitter/inject", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/EnricherFixture.scala b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/EnricherFixture.scala new file mode 100644 index 000000000..7e7827ab6 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/EnricherFixture.scala @@ -0,0 +1,100 @@ +package com.twitter.unified_user_actions.enricher + +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentPlan +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStage +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageStatus +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageType +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.AuthorInfo +import com.twitter.unified_user_actions.thriftscala.EventMetadata +import com.twitter.unified_user_actions.thriftscala.Item +import com.twitter.unified_user_actions.thriftscala.MultiTweetNotification +import com.twitter.unified_user_actions.thriftscala.NotificationContent +import com.twitter.unified_user_actions.thriftscala.NotificationInfo +import com.twitter.unified_user_actions.thriftscala.ProfileInfo +import com.twitter.unified_user_actions.thriftscala.SourceLineage +import com.twitter.unified_user_actions.thriftscala.TweetInfo +import 
com.twitter.unified_user_actions.thriftscala.TweetNotification +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.unified_user_actions.thriftscala.UnknownNotification +import com.twitter.unified_user_actions.thriftscala.UserIdentifier + +trait EnricherFixture { + val partitionedTopic = "unified_user_actions_keyed_dev" + val tweetInfoEnrichmentPlan = EnrichmentPlan( + Seq( + // first stage: to repartition on tweet id -> done + EnrichmentStage( + EnrichmentStageStatus.Completion, + EnrichmentStageType.Repartition, + Seq(EnrichmentInstruction.TweetEnrichment), + Some(partitionedTopic) + ), + // next stage: to hydrate more metadata based on tweet id -> initialized + EnrichmentStage( + EnrichmentStageStatus.Initialized, + EnrichmentStageType.Hydration, + Seq(EnrichmentInstruction.TweetEnrichment) + ) + )) + + val tweetNotificationEnrichmentPlan = EnrichmentPlan( + Seq( + // first stage: to repartition on tweet id -> done + EnrichmentStage( + EnrichmentStageStatus.Completion, + EnrichmentStageType.Repartition, + Seq(EnrichmentInstruction.NotificationTweetEnrichment), + Some(partitionedTopic) + ), + // next stage: to hydrate more metadata based on tweet id -> initialized + EnrichmentStage( + EnrichmentStageStatus.Initialized, + EnrichmentStageType.Hydration, + Seq(EnrichmentInstruction.NotificationTweetEnrichment), + ) + )) + + def mkUUATweetEvent(tweetId: Long, author: Option[AuthorInfo] = None): UnifiedUserAction = { + UnifiedUserAction( + UserIdentifier(userId = Some(1L)), + item = Item.TweetInfo(TweetInfo(actionTweetId = tweetId, actionTweetAuthorInfo = author)), + actionType = ActionType.ClientTweetReport, + eventMetadata = EventMetadata(1234L, 2345L, SourceLineage.ServerTweetypieEvents) + ) + } + + def mkUUATweetNotificationEvent(tweetId: Long): UnifiedUserAction = { + mkUUATweetEvent(-1L).copy( + item = Item.NotificationInfo( + NotificationInfo( + actionNotificationId = "123456", + content = NotificationContent.TweetNotification(TweetNotification(tweetId = tweetId)))) + ) + } + + def mkUUAMultiTweetNotificationEvent(tweetIds: Long*): UnifiedUserAction = { + mkUUATweetEvent(-1L).copy( + item = Item.NotificationInfo( + NotificationInfo( + actionNotificationId = "123456", + content = NotificationContent.MultiTweetNotification( + MultiTweetNotification(tweetIds = tweetIds)))) + ) + } + + def mkUUATweetNotificationUnknownEvent(): UnifiedUserAction = { + mkUUATweetEvent(-1L).copy( + item = Item.NotificationInfo( + NotificationInfo( + actionNotificationId = "123456", + content = NotificationContent.UnknownNotification(UnknownNotification()))) + ) + } + + def mkUUAProfileEvent(userId: Long): UnifiedUserAction = { + val event = mkUUATweetEvent(1L) + event.copy(item = Item.ProfileInfo(ProfileInfo(userId))) + } +} diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/driver/BUILD.bazel b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/driver/BUILD.bazel new file mode 100644 index 000000000..a6109e868 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/driver/BUILD.bazel @@ -0,0 +1,14 @@ +junit_tests( + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/scalatest", + "3rdparty/jvm/org/scalatestplus:junit", + "finatra/inject/inject-core/src/test/scala/com/twitter/inject", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver", + 
"unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator:base", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner:base", + "unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher:fixture", + "util/util-core:scala", + ], +) diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/driver/DriverTest.scala b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/driver/DriverTest.scala new file mode 100644 index 000000000..434760d5c --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/driver/DriverTest.scala @@ -0,0 +1,284 @@ +package com.twitter.unified_user_actions.enricher.driver + +import com.twitter.inject.Test +import com.twitter.unified_user_actions.enricher.EnricherFixture +import com.twitter.unified_user_actions.enricher.hydrator.Hydrator +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentIdType +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentPlan +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStage +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageStatus +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageType +import com.twitter.unified_user_actions.enricher.partitioner.Partitioner +import com.twitter.util.Await +import com.twitter.util.Future +import org.scalatest.BeforeAndAfter +import org.scalatest.matchers.should.Matchers +import scala.collection.mutable + +class DriverTest extends Test with Matchers with BeforeAndAfter { + object ExecutionContext { + var executionCount = 0 + } + + before { + ExecutionContext.executionCount = 0 + } + + trait Fixtures extends EnricherFixture { + val repartitionTweet = mkStage() + val repartitionNotiTweet = + mkStage(instructions = Seq(EnrichmentInstruction.NotificationTweetEnrichment)) + val hydrateTweet = mkStage(stageType = EnrichmentStageType.Hydration) + val hydrateTweetMultiInstructions = mkStage( + stageType = EnrichmentStageType.Hydration, + instructions = Seq( + EnrichmentInstruction.NotificationTweetEnrichment, + EnrichmentInstruction.TweetEnrichment, + EnrichmentInstruction.NotificationTweetEnrichment, + EnrichmentInstruction.TweetEnrichment + ) + ) + val hydrateNotiTweet = mkStage( + stageType = EnrichmentStageType.Hydration, + instructions = Seq(EnrichmentInstruction.NotificationTweetEnrichment)) + val key1 = EnrichmentKey(EnrichmentIdType.TweetId, 123L) + val tweet1 = mkUUATweetEvent(981L) + val hydrator = new MockHydrator + val partitioner = new MockPartitioner + val outputTopic = "output" + val partitionTopic = "partition" + + def complete( + enrichmentStage: EnrichmentStage, + outputTopic: Option[String] = None + ): EnrichmentStage = { + enrichmentStage.copy(status = EnrichmentStageStatus.Completion, outputTopic = outputTopic) + } + + def mkPlan(enrichmentStages: EnrichmentStage*): EnrichmentPlan = { + EnrichmentPlan(enrichmentStages) + } + + def mkStage( + status: EnrichmentStageStatus = EnrichmentStageStatus.Initialized, + stageType: EnrichmentStageType = 
EnrichmentStageType.Repartition, + instructions: Seq[EnrichmentInstruction] = Seq(EnrichmentInstruction.TweetEnrichment) + ): EnrichmentStage = { + EnrichmentStage(status, stageType, instructions) + } + + trait ExecutionCount { + val callMap: mutable.Map[Int, (EnrichmentInstruction, EnrichmentEnvelop)] = + mutable.Map[Int, (EnrichmentInstruction, EnrichmentEnvelop)]() + + def recordExecution(instruction: EnrichmentInstruction, envelop: EnrichmentEnvelop): Unit = { + ExecutionContext.executionCount = ExecutionContext.executionCount + 1 + callMap.put(ExecutionContext.executionCount, (instruction, envelop)) + } + } + + class MockHydrator extends Hydrator with ExecutionCount { + def hydrate( + instruction: EnrichmentInstruction, + key: Option[EnrichmentKey], + envelop: EnrichmentEnvelop + ): Future[EnrichmentEnvelop] = { + recordExecution(instruction, envelop) + Future(envelop.copy(envelopId = ExecutionContext.executionCount)) + } + } + + class MockPartitioner extends Partitioner with ExecutionCount { + def repartition( + instruction: EnrichmentInstruction, + envelop: EnrichmentEnvelop + ): Option[EnrichmentKey] = { + recordExecution(instruction, envelop) + Some(EnrichmentKey(EnrichmentIdType.TweetId, ExecutionContext.executionCount)) + } + } + } + + test("single partitioning plan works") { + new Fixtures { + val driver = new EnrichmentDriver(Some(outputTopic), partitionTopic, hydrator, partitioner) + // given a simple plan that only repartitions the input and nothing else + val plan = mkPlan(repartitionTweet) + + (1L to 10).foreach(id => { + val envelop = EnrichmentEnvelop(id, tweet1, plan) + + // when + val actual = Await.result(driver.execute(Some(key1), Future(envelop))) + + val expectedKey = Some(key1.copy(id = id)) + val expectedValue = + envelop.copy(plan = mkPlan(complete(repartitionTweet, Some(partitionTopic)))) + + // then the result should have a new partitioned key, with the envelop unchanged except the plan is complete + // however, the output topic is the partitionTopic (since this is only a partitioning stage) + assert((expectedKey, expectedValue) == actual) + }) + } + } + + test("multi-stage partitioning plan works") { + new Fixtures { + val driver = new EnrichmentDriver(Some(outputTopic), partitionTopic, hydrator, partitioner) + // given a plan that chains multiple repartition stages together + val plan = mkPlan(repartitionTweet, repartitionNotiTweet) + val envelop1 = EnrichmentEnvelop(1L, tweet1, plan) + + // when 1st partitioning trip + val actual1 = Await.result(driver.execute(Some(key1), Future(envelop1))) + + // then the result should have a new partitioned key, with the envelop unchanged except the + // 1st stage of the plan is complete + val expectedKey1 = key1.copy(id = 1L) + val expectedValue1 = + envelop1.copy(plan = + mkPlan(complete(repartitionTweet, Some(partitionTopic)), repartitionNotiTweet)) + + assert((Some(expectedKey1), expectedValue1) == actual1) + + // then, we reuse the last result to exercise the logic on the driver again for the 2nd trip + val actual2 = Await.result(driver.execute(Some(expectedKey1), Future(expectedValue1))) + val expectedKey2 = key1.copy(id = 2L) + val expectedValue2 = + envelop1.copy(plan = mkPlan( + complete(repartitionTweet, Some(partitionTopic)), + complete(repartitionNotiTweet, Some(partitionTopic)))) + + assert((Some(expectedKey2), expectedValue2) == actual2) + } + } + + test("single hydration plan works") { + new Fixtures { + val driver = new EnrichmentDriver(Some(outputTopic), partitionTopic, hydrator, partitioner) + // given a
simple plan that only hydrates the input and nothing else + val plan = mkPlan(hydrateTweet) + + (1L to 10).foreach(id => { + val envelop = EnrichmentEnvelop(id, tweet1, plan) + + // when + val actual = Await.result(driver.execute(Some(key1), Future(envelop))) + + val expectedValue = + envelop.copy(envelopId = id, plan = mkPlan(complete(hydrateTweet, Some(outputTopic)))) + + // then the result should have the same key, with the envelop hydrated & the plan is complete + // the output topic should be the final topic since this is a hydration stage and the plan is complete + assert((Some(key1), expectedValue) == actual) + }) + } + } + + test("single hydration with multiple instructions plan works") { + new Fixtures { + val driver = new EnrichmentDriver(Some(outputTopic), partitionTopic, hydrator, partitioner) + // given a simple plan that only hydrates the input and nothing else + val plan = mkPlan(hydrateTweetMultiInstructions) + val envelop = EnrichmentEnvelop(0L, tweet1, plan) + + // when + val actual = Await.result(driver.execute(Some(key1), Future(envelop))) + val expectedValue = envelop.copy( + envelopId = 4L, // hydrate is called 4 times for 4 instructions in 1 stage + plan = mkPlan(complete(hydrateTweetMultiInstructions, Some(outputTopic)))) + + // then the result should have the same key, with the envelop hydrated & the plan is complete + // the output topic should be the final topic since this is a hydration stage and the plan is complete + assert((Some(key1), expectedValue) == actual) + } + } + + test("multi-stage hydration plan works") { + new Fixtures { + val driver = new EnrichmentDriver(Some(outputTopic), partitionTopic, hydrator, partitioner) + // given a plan that hydrates twice + val plan = mkPlan(hydrateTweet, hydrateNotiTweet) + val envelop = EnrichmentEnvelop(1L, tweet1, plan) + + // when + val actual = Await.result(driver.execute(Some(key1), Future(envelop))) + + // then the result should have the same key, with the envelop hydrated.
since there are no + // partitioning stages, the driver will just recurse until all the hydration is done, + // then output to the final topic + val expectedValue = + envelop.copy( + envelopId = 2L, + plan = mkPlan( + complete(hydrateTweet), + complete( + hydrateNotiTweet, + Some(outputTopic) + ) // only the last stage has the output topic + )) + + assert((Some(key1), expectedValue) == actual) + } + } + + test("multi-stage partition+hydration plan works") { + new Fixtures { + val driver = new EnrichmentDriver(Some(outputTopic), partitionTopic, hydrator, partitioner) + + // given a plan that repartitions then hydrates, twice + val plan = mkPlan(repartitionTweet, hydrateTweet, repartitionNotiTweet, hydrateNotiTweet) + var curEnvelop = EnrichmentEnvelop(1L, tweet1, plan) + var curKey = key1 + + // stage 1, partitioning on tweet should be correct + var actual = Await.result(driver.execute(Some(curKey), Future(curEnvelop))) + var expectedKey = curKey.copy(id = 1L) + var expectedValue = curEnvelop.copy( + plan = mkPlan( + complete(repartitionTweet, Some(partitionTopic)), + hydrateTweet, + repartitionNotiTweet, + hydrateNotiTweet)) + + assert((Some(expectedKey), expectedValue) == actual) + curEnvelop = actual._2 + curKey = actual._1.get + + // stage 2-3, hydrating on tweet should be correct + // and since the next stage after hydration is a repartition, it will do so correctly + actual = Await.result(driver.execute(Some(curKey), Future(curEnvelop))) + expectedKey = curKey.copy(id = 3) // repartition is done in stage 3 + expectedValue = curEnvelop.copy( + envelopId = 2L, // hydration is done in stage 2 + plan = mkPlan( + complete(repartitionTweet, Some(partitionTopic)), + complete(hydrateTweet), + complete(repartitionNotiTweet, Some(partitionTopic)), + hydrateNotiTweet) + ) + + assert((Some(expectedKey), expectedValue) == actual) + curEnvelop = actual._2 + curKey = actual._1.get + + // then finally, stage 4 would output to the final topic + actual = Await.result(driver.execute(Some(curKey), Future(curEnvelop))) + expectedKey = curKey // nothing's changed in the key + expectedValue = curEnvelop.copy( + envelopId = 4L, + plan = mkPlan( + complete(repartitionTweet, Some(partitionTopic)), + complete(hydrateTweet), + complete(repartitionNotiTweet, Some(partitionTopic)), + complete(hydrateNotiTweet, Some(outputTopic)) + ) + ) + + assert((Some(expectedKey), expectedValue) == actual) + } + } +} diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/graphql/BUILD.bazel b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/graphql/BUILD.bazel new file mode 100644 index 000000000..39ba06b0d --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/graphql/BUILD.bazel @@ -0,0 +1,14 @@ +junit_tests( + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/scalatest", + "3rdparty/jvm/org/scalatestplus:junit", + "featureswitches/dynmap/src/main/scala/com/twitter/dynmap:dynmap-core", + "featureswitches/dynmap/src/main/scala/com/twitter/dynmap/json:dynmap-json", + "finatra/inject/inject-core/src/test/scala/com/twitter/inject", + "graphql/thrift/src/main/thrift/com/twitter/graphql:graphql-scala", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql", + "util/util-core:scala", + ], +) diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/graphql/GraphqlSpecs.scala
b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/graphql/GraphqlSpecs.scala new file mode 100644 index 000000000..e7ebd27e6 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/graphql/GraphqlSpecs.scala @@ -0,0 +1,71 @@ +package com.twitter.unified_user_actions.enricher.graphql + +import com.twitter.dynmap.DynMap +import com.twitter.inject.Test +import com.twitter.util.Return +import com.twitter.util.Throw +import com.twitter.util.Try +import org.scalatest.matchers.should.Matchers + +class GraphqlSpecs extends Test with Matchers { + trait Fixtures { + val sampleError = """ + |{ + | "errors": [ + | { + | "message": "Some err msg!", + | "code": 366, + | "kind": "Validation", + | "name": "QueryViolationError", + | "source": "Client", + | "tracing": { + | "trace_id": "1234567890" + | } + | } + | ] + |}""".stripMargin + + val sampleValidRsp = + """ + |{ + | "data": { + | "tweet_result_by_rest_id": { + | "result": { + | "core": { + | "user": { + | "legacy": { + | "id_str": "12" + | } + | } + | } + | } + | } + | } + |} + |""".stripMargin + + val sampleValidRspExpected = Return( + Set(("data.tweet_result_by_rest_id.result.core.user.legacy.id_str", "12"))) + val sampleErrorExpected = Throw( + GraphqlRspErrors( + DynMap.from( + "errors" -> List( + Map( + "message" -> "Some err msg!", + "code" -> 366, + "kind" -> "Validation", + "name" -> "QueryViolationError", + "source" -> "Client", + "tracing" -> Map("trace_id" -> "1234567890") + ))))) + def toFlattened(testStr: String): Try[Set[(String, Any)]] = + GraphqlRspParser.toDynMap(testStr).map { dm => dm.valuesFlattened.toSet } + } + + test("Graphql Response Parser") { + new Fixtures { + toFlattened(sampleValidRsp) shouldBe sampleValidRspExpected + toFlattened(sampleError) shouldBe sampleErrorExpected + } + } +} diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hcache/BUILD.bazel b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hcache/BUILD.bazel new file mode 100644 index 000000000..607524d25 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hcache/BUILD.bazel @@ -0,0 +1,13 @@ +junit_tests( + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/google/guava", + "3rdparty/jvm/org/scalatest", + "3rdparty/jvm/org/scalatestplus:junit", + "finatra/inject/inject-core/src/test/scala:test-deps", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache", + "util/util-cache-guava/src/main/scala", + "util/util-cache/src/main/scala", + ], +) diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hcache/LocalCacheTest.scala b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hcache/LocalCacheTest.scala new file mode 100644 index 000000000..bcf3d5fb6 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hcache/LocalCacheTest.scala @@ -0,0 +1,153 @@ +package com.twitter.unified_user_actions.enricher.hcache + +import com.google.common.cache.Cache +import com.google.common.cache.CacheBuilder +import com.twitter.finagle.stats.InMemoryStatsReceiver +import com.twitter.inject.Test +import com.twitter.util.Await +import com.twitter.util.Future +import com.twitter.util.Time +import java.util.concurrent.TimeUnit +import 
java.lang.{Integer => JInt} + +class LocalCacheTest extends Test { + + trait Fixture { + val time = Time.fromMilliseconds(123456L) + val ttl = 5 + val maxSize = 10 + + val underlying: Cache[JInt, Future[JInt]] = CacheBuilder + .newBuilder() + .expireAfterWrite(ttl, TimeUnit.SECONDS) + .maximumSize(maxSize) + .build[JInt, Future[JInt]]() + + val stats = new InMemoryStatsReceiver + + val cache = new LocalCache[JInt, JInt]( + underlying = underlying, + statsReceiver = stats + ) + + def getCounts(counterName: String*): Long = stats.counter(counterName: _*)() + } + + test("simple local cache works") { + new Fixture { + Time.withTimeAt(time) { _ => + assert(cache.size === 0) + + (1 to maxSize + 1).foreach { id => + cache.getOrElseUpdate(id)(Future.value(id)) + + val actual = Await.result(cache.get(id).get) + assert(actual === id) + } + assert(cache.size === maxSize) + + assert(getCounts("gets") === 2 * (maxSize + 1)) + assert(getCounts("hits") === maxSize + 1) + assert(getCounts("misses") === maxSize + 1) + assert(getCounts("sets", "evictions", "failed_futures") === 0) + + cache.reset() + assert(cache.size === 0) + } + } + } + + test("getOrElseUpdate successful futures") { + new Fixture { + Time.withTimeAt(time) { _ => + assert(cache.size === 0) + + (1 to maxSize + 1).foreach { _ => + cache.getOrElseUpdate(1) { + Future.value(1) + } + } + assert(cache.size === 1) + + assert(getCounts("gets") === maxSize + 1) + assert(getCounts("hits") === maxSize) + assert(getCounts("misses") === 1) + assert(getCounts("sets", "evictions", "failed_futures") === 0) + + cache.reset() + assert(cache.size === 0) + } + } + } + + test("getOrElseUpdate Failed Futures") { + new Fixture { + Time.withTimeAt(time) { _ => + assert(cache.size === 0) + + (1 to maxSize + 1).foreach { id => + cache.getOrElseUpdate(id)(Future.exception(new IllegalArgumentException(""))) + assert(cache.get(id).map { + Await.result(_) + } === None) + } + assert(cache.size === 0) + + assert(getCounts("gets") === 2 * (maxSize + 1)) + assert(getCounts("hits", "misses", "sets") === 0) + assert(getCounts("evictions") === maxSize + 1) + assert(getCounts("failed_futures") === maxSize + 1) + } + } + } + + test("Set successful Future") { + new Fixture { + Time.withTimeAt(time) { _ => + assert(cache.size === 0) + + cache.set(1, Future.value(2)) + assert(Await.result(cache.get(1).get) === 2) + assert(getCounts("gets") === 1) + assert(getCounts("hits") === 1) + assert(getCounts("misses") === 0) + assert(getCounts("sets") === 1) + assert(getCounts("evictions", "failed_futures") === 0) + } + } + } + + test("Evict") { + new Fixture { + Time.withTimeAt(time) { _ => + assert(cache.size === 0) + + // need to use reference here!!! 
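+ // LocalCache#evict appears to remove an entry only when the given Future is the same + // reference as the cached one, hence the boxed references below: evicting with f1 (never + // cached) is a no-op, while evicting with f2 removes the entry, so exactly one eviction + // is recorded.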
+ val f1 = Future.value(int2Integer(1)) + val f2 = Future.value(int2Integer(2)) + cache.set(1, f2) + cache.evict(1, f1) + cache.evict(1, f2) + assert(getCounts("gets", "hits", "misses") === 0) + assert(getCounts("sets") === 1) + assert(getCounts("evictions") === 1) // not 2 + assert(getCounts("failed_futures") === 0) + } + } + } + + test("Set Failed Futures") { + new Fixture { + Time.withTimeAt(time) { _ => + assert(cache.size === 0) + + cache.set(1, Future.exception(new IllegalArgumentException(""))) + assert(cache.size === 0) + + assert(getCounts("gets", "hits", "misses", "sets") === 0) + assert(getCounts("evictions") === 1) + assert(getCounts("failed_futures") === 1) + } + } + } +} diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/BUILD.bazel b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/BUILD.bazel new file mode 100644 index 000000000..1ff01e4c5 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/BUILD.bazel @@ -0,0 +1,19 @@ +junit_tests( + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/mockito:mockito-core", + "3rdparty/jvm/org/mockito:mockito-scala", + "3rdparty/jvm/org/scalatest", + "3rdparty/jvm/org/scalatestplus:junit", + "featureswitches/dynmap/src/main/scala/com/twitter/dynmap:dynmap-core", + "finatra/inject/inject-core/src/test/scala/com/twitter/inject", + "graphql/thrift/src/main/thrift/com/twitter/graphql:graphql-scala", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator:default", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator:noop", + "unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher:fixture", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/DefaultHydratorTest.scala b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/DefaultHydratorTest.scala new file mode 100644 index 000000000..1e4477318 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/DefaultHydratorTest.scala @@ -0,0 +1,118 @@ +package com.twitter.unified_user_actions.enricher.hydrator + +import com.google.common.cache.CacheBuilder +import com.twitter.dynmap.DynMap +import com.twitter.graphql.thriftscala.GraphQlRequest +import com.twitter.graphql.thriftscala.GraphQlResponse +import com.twitter.graphql.thriftscala.GraphqlExecutionService +import com.twitter.inject.Test +import com.twitter.unified_user_actions.enricher.EnricherFixture +import com.twitter.unified_user_actions.enricher.FatalException +import com.twitter.unified_user_actions.enricher.hcache.LocalCache +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentIdType +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import 
com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.unified_user_actions.thriftscala.AuthorInfo +import com.twitter.util.Await +import com.twitter.util.Future +import org.mockito.ArgumentMatchers +import org.mockito.MockitoSugar + +class DefaultHydratorTest extends Test with MockitoSugar { + + trait Fixtures extends EnricherFixture { + val cache = new LocalCache[EnrichmentKey, DynMap]( + underlying = CacheBuilder + .newBuilder() + .maximumSize(10) + .build[EnrichmentKey, Future[DynMap]]()) + + val client = mock[GraphqlExecutionService.FinagledClient] + val key = EnrichmentKey(EnrichmentIdType.TweetId, 1L) + val envelop = EnrichmentEnvelop(123L, mkUUATweetEvent(1L), tweetInfoEnrichmentPlan) + + def mkGraphQLResponse(authorId: Long): GraphQlResponse = + GraphQlResponse( + Some( + s""" + |{ + | "data": { + | "tweet_result_by_rest_id": { + | "result": { + | "core": { + | "user": { + | "legacy": { + | "id_str": "$authorId" + | } + | } + | } + | } + | } + | } + |} + |""".stripMargin + )) + } + + test("non-fatal errors should proceed as normal") { + new Fixtures { + val hydrator = new DefaultHydrator(cache, client) + + // when the graphql client encounters any exception + when(client.graphql(ArgumentMatchers.any[GraphQlRequest])) + .thenReturn(Future.exception(new IllegalStateException("any exception"))) + + val actual = + Await.result(hydrator.hydrate(EnrichmentInstruction.TweetEnrichment, Some(key), envelop)) + + // then the original envelop is expected + assert(envelop == actual) + } + } + + test("fatal errors should return a future exception") { + new Fixtures { + val hydrator = new DefaultHydrator(cache, client) + + // when the graphql client encounters a fatal exception + when(client.graphql(ArgumentMatchers.any[GraphQlRequest])) + .thenReturn(Future.exception(new FatalException("fatal exception") {})) + + val actual = hydrator.hydrate(EnrichmentInstruction.TweetEnrichment, Some(key), envelop) + + // then a failed future is expected + assertFailedFuture[FatalException](actual) + } + } + + test("author_id should be hydrated from the graphql response") { + new Fixtures { + val hydrator = new DefaultHydrator(cache, client) + + when(client.graphql(ArgumentMatchers.any[GraphQlRequest])) + .thenReturn(Future.value(mkGraphQLResponse(888L))) + + val actual = hydrator.hydrate(EnrichmentInstruction.TweetEnrichment, Some(key), envelop) + + assertFutureValue( + actual, + envelop.copy(uua = mkUUATweetEvent(1L, Some(AuthorInfo(Some(888L)))))) + } + } + + test("when AuthorInfo is populated, there should be no hydration") { + new Fixtures { + val hydrator = new DefaultHydrator(cache, client) + + when(client.graphql(ArgumentMatchers.any[GraphQlRequest])) + .thenReturn(Future.value(mkGraphQLResponse(333L))) + + val expected = envelop.copy(uua = + mkUUATweetEvent(tweetId = 3L, author = Some(AuthorInfo(authorId = Some(222))))) + val actual = hydrator.hydrate(EnrichmentInstruction.TweetEnrichment, Some(key), expected) + + assertFutureValue(actual, expected) + } + } +} diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/NoopHydratorTest.scala b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/NoopHydratorTest.scala new file mode 100644 index 000000000..79c7af790 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/hydrator/NoopHydratorTest.scala @@ -0,0 +1,12 @@ +package com.twitter.unified_user_actions.enricher.hydrator +
+import com.twitter.inject.Test +import com.twitter.unified_user_actions.enricher.ImplementationException + +class NoopHydratorTest extends Test { + test("noop hydrator should throw an error when used") { + assertThrows[ImplementationException] { + new NoopHydrator().hydrate(null, null, null) + } + } +} diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/partitioner/BUILD.bazel b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/partitioner/BUILD.bazel new file mode 100644 index 000000000..ab9678af4 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/partitioner/BUILD.bazel @@ -0,0 +1,13 @@ +junit_tests( + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/scalatest", + "3rdparty/jvm/org/scalatestplus:junit", + "finatra/inject/inject-core/src/test/scala/com/twitter/inject", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner:default", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + "unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher:fixture", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) diff --git a/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/partitioner/DefaultPartitionerTest.scala b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/partitioner/DefaultPartitionerTest.scala new file mode 100644 index 000000000..7b8f59cb4 --- /dev/null +++ b/unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher/partitioner/DefaultPartitionerTest.scala @@ -0,0 +1,83 @@ +package com.twitter.unified_user_actions.enricher.partitioner + +import com.twitter.inject.Test +import com.twitter.unified_user_actions.enricher.EnricherFixture +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentIdType +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction.NotificationTweetEnrichment +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction.TweetEnrichment +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.unified_user_actions.enricher.partitioner.DefaultPartitioner.NullKey +import org.scalatest.prop.TableDrivenPropertyChecks + +class DefaultPartitionerTest extends Test with TableDrivenPropertyChecks { + test("default partitioner should work") { + new EnricherFixture { + val partitioner = new DefaultPartitioner + + val instructions = Table( + ("instruction", "envelop", "expected"), + // tweet info + ( + TweetEnrichment, + EnrichmentEnvelop(1L, mkUUATweetEvent(123L), tweetInfoEnrichmentPlan), + Some(EnrichmentKey(EnrichmentIdType.TweetId, 123L))), + // notification tweet info + ( + NotificationTweetEnrichment, + EnrichmentEnvelop(2L, mkUUATweetNotificationEvent(234L), tweetNotificationEnrichmentPlan), + Some(EnrichmentKey(EnrichmentIdType.TweetId, 234L))), + // notification with multiple tweet info + ( + NotificationTweetEnrichment, + EnrichmentEnvelop( + 3L, + 
mkUUAMultiTweetNotificationEvent(22L, 33L), + tweetNotificationEnrichmentPlan), + Some(EnrichmentKey(EnrichmentIdType.TweetId, 22L)) + ) // only the first tweet id is partitioned + ) + + forEvery(instructions) { + ( + instruction: EnrichmentInstruction, + envelop: EnrichmentEnvelop, + expected: Some[EnrichmentKey] + ) => + val actual = partitioner.repartition(instruction, envelop) + assert(expected === actual) + } + } + } + + test("unsupported events shouldn't be partitioned") { + new EnricherFixture { + val partitioner = new DefaultPartitioner + + val instructions = Table( + ("instruction", "envelop", "expected"), + // profile uua event + ( + TweetEnrichment, + EnrichmentEnvelop(1L, mkUUAProfileEvent(111L), tweetInfoEnrichmentPlan), + NullKey), + // unknown notification (not a tweet) + ( + NotificationTweetEnrichment, + EnrichmentEnvelop(1L, mkUUATweetNotificationUnknownEvent(), tweetInfoEnrichmentPlan), + NullKey), + ) + + forEvery(instructions) { + ( + instruction: EnrichmentInstruction, + envelop: EnrichmentEnvelop, + expected: Option[EnrichmentKey] + ) => + val actual = partitioner.repartition(instruction, envelop) + assert(expected === actual) + } + } + } +} diff --git a/unified_user_actions/graphql/README.md b/unified_user_actions/graphql/README.md new file mode 100644 index 000000000..fbbf8006f --- /dev/null +++ b/unified_user_actions/graphql/README.md @@ -0,0 +1,17 @@ +Documents +========= + +TweetHydration +-------------- + +Hydrates the Tweet author's `id_str` for a given Tweet `rest_id`; see `TweetHydration.graphql` below. + +Upload +------ + +``` +$ graphql stored_document put unified_user_actions/graphql/TweetHydration.graphql +``` + +DocumentId: `M5sHxua-RDiRtTn48CAhng` +Test: https://graphql.twitter.com/snaptest/tests/1580340324727017472/ diff --git a/unified_user_actions/graphql/TweetHydration.graphql b/unified_user_actions/graphql/TweetHydration.graphql new file mode 100644 index 000000000..604019d69 --- /dev/null +++ b/unified_user_actions/graphql/TweetHydration.graphql @@ -0,0 +1,15 @@ +query TweetHydration($rest_id: NumericString!) { + tweet_result_by_rest_id(rest_id: $rest_id, safety_level: ForDevelopmentOnly) { + result { + ...
on Tweet { + core { + user { + legacy { + id_str + } + } + } + } + } + } +} diff --git a/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/BUILD b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/BUILD new file mode 100644 index 000000000..1698385eb --- /dev/null +++ b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/BUILD @@ -0,0 +1,22 @@ +scala_library( + sources = ["**/*.scala"], + compiler_option_sets = ["fatal_warnings"], + strict_deps = True, + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/thrift:libthrift", + "kafka/finagle-kafka/finatra-kafka/src/main/java", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/headers", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "scrooge/scrooge-core/src/main/scala", + "scrooge/scrooge-serializer/src/main/scala", + "util/util-app/src/main/scala", + "util/util-core:util-core-util", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala/com/twitter/util/logging", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + "util/util-thrift", + ], +) diff --git a/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/ClientConfigs.scala b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/ClientConfigs.scala new file mode 100644 index 000000000..ffbe8f126 --- /dev/null +++ b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/ClientConfigs.scala @@ -0,0 +1,211 @@ +package com.twitter.unified_user_actions.kafka + +import com.twitter.conversions.DurationOps._ +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import org.apache.kafka.common.record.CompressionType + +object ClientConfigs { + final val kafkaBootstrapServerConfig = "kafka.bootstrap.servers" + final val kafkaBootstrapServerHelp: String = + """Kafka servers list. It is usually a WilyNs name at Twitter + """.stripMargin + + final val kafkaBootstrapServerRemoteDestConfig = "kafka.bootstrap.servers.remote.dest" + final val kafkaBootstrapServerRemoteDestHelp: String = + """Destination Kafka servers, if the sink cluster is different from the source cluster, + |i.e., read from one cluster and output to another cluster + """.stripMargin + + final val kafkaApplicationIdConfig = "kafka.application.id" + final val kafkaApplicationIdHelp: String = + """An identifier for the Kafka application. Must be unique within the Kafka cluster + """.stripMargin + + // Processor in general + final val enableTrustStore = "kafka.trust.store.enable" + final val enableTrustStoreDefault = true + final val enableTrustStoreHelp = "Whether to enable trust store location" + + final val trustStoreLocationConfig = "kafka.trust.store.location" + final val trustStoreLocationDefault = "/etc/tw_truststore/messaging/kafka/client.truststore.jks" + final val trustStoreLocationHelp = "trust store location" + + final val kafkaMaxPendingRequestsConfig = "kafka.max.pending.requests" + final val kafkaMaxPendingRequestsHelp = "the maximum number of concurrent pending requests." 
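+ + // Note: the *Config constants in this object are command-line flag names; a job binary + // would typically bind them as app flags, e.g. -kafka.bootstrap.servers=/s/kafka/main + // (path illustrative).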
+ + final val kafkaWorkerThreadsConfig = "kafka.worker.threads" + final val kafkaWorkerThreadsHelp = + """This has meaning that is dependent on the value of {@link usePerPartitionThreadPool} - + | if that is false, this is the number of parallel worker threads that will execute the processor function. + | if that is true, this is the number of parallel worker threads for each partition. So the total number of + | threads will be {@link workerThreads} * number_of_partitions. + |""".stripMargin + + final val retriesConfig = "kafka.retries" + final val retriesDefault = 300 + final val retriesHelp: String = + """Setting a value greater than zero will cause the client to resend any request that fails + |with a potentially transient error + """.stripMargin + + final val retryBackoffConfig = "kafka.retry.backoff" + final val retryBackoffDefault: Duration = 1.seconds + final val retryBackoffHelp: String = + """The amount of time to wait before attempting to retry a failed request to a given topic + |partition. This avoids repeatedly sending requests in a tight loop under some failure + |scenarios + """.stripMargin + + // Kafka Producer + final val producerClientIdConfig = "kafka.producer.client.id" + final val producerClientIdHelp: String = + """The client id of the Kafka producer, required for producers. + """.stripMargin + + final val producerIdempotenceConfig = "kafka.producer.idempotence" + final val producerIdempotenceDefault: Boolean = false + final val producerIdempotenceHelp: String = + """When idempotence is disabled, producer retries due to broker failures, etc., may write duplicates of the + retried message in the stream. Note that enabling idempotence requires + MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION to be less than or equal to 5, + RETRIES_CONFIG to be greater than 0 and ACKS_CONFIG + must be 'all'. If these values are not explicitly set by the user, suitable values will be + chosen. If incompatible values are set, a ConfigException will be thrown. + """.stripMargin + + final val producerBatchSizeConfig = "kafka.producer.batch.size" + final val producerBatchSizeDefault: StorageUnit = 512.kilobytes + final val producerBatchSizeHelp: String = + """The producer will attempt to batch records together into fewer requests whenever multiple + |records are being sent to the same partition. This helps performance on both the client and + |the server. This configuration controls the default batch size in bytes. + |No attempt will be made to batch records larger than this size. + |Requests sent to brokers will contain multiple batches, one for each partition with data + |available to be sent. A small batch size will make batching less common and may reduce + |throughput (a batch size of zero will disable batching entirely). + |A very large batch size may use memory a bit more wastefully as we will always allocate a + |buffer of the specified batch size in anticipation of additional records. + """.stripMargin + + final val producerBufferMemConfig = "kafka.producer.buffer.mem" + final val producerBufferMemDefault: StorageUnit = 256.megabytes + final val producerBufferMemHelp: String = + """The total bytes of memory the producer can use to buffer records waiting to be sent to the + |server. If records are sent faster than they can be delivered to the server the producer + |will block for MAX_BLOCK_MS_CONFIG after which it will throw an exception. + |This setting should correspond roughly to the total memory the producer will use, but is not + |a hard bound since not all memory the producer uses is used for buffering.
+ |Some additional memory will be used for compression (if compression is enabled) as well as + |for maintaining in-flight requests. + """.stripMargin + + final val producerLingerConfig = "kafka.producer.linger" + final val producerLingerDefault: Duration = 100.milliseconds + final val producerLingerHelp: String = + """The producer groups together any records that arrive in between request transmissions into + |a single batched request. Normally this occurs only under load when records arrive faster + |than they can be sent out. However in some circumstances the client may want to reduce the + |number of requests even under moderate load. This setting accomplishes this by adding a + |small amount of artificial delay—that is, rather than immediately sending out a record + |the producer will wait for up to the given delay to allow other records to be sent so that + |the sends can be batched together. This can be thought of as analogous to Nagle's algorithm + |in TCP. This setting gives the upper bound on the delay for batching: once we get + |BATCH_SIZE_CONFIG worth of records for a partition it will be sent immediately regardless + |of this setting, however if we have fewer than this many bytes accumulated for this + |partition we will 'linger' for the specified time waiting for more records to show up. + |Kafka's default for this setting is 0 (i.e. no delay); the flag default here is 100ms. + |Setting LINGER_MS_CONFIG=5, for example, would have the effect of reducing the number of + |requests sent but would add up to 5ms of latency to records sent in the absence of load. + """.stripMargin + + final val producerRequestTimeoutConfig = "kafka.producer.request.timeout" + final val producerRequestTimeoutDefault: Duration = 30.seconds + final val producerRequestTimeoutHelp: String = + """The configuration controls the maximum amount of time the client will wait + |for the response of a request. If the response is not received before the timeout + |elapses the client will resend the request if necessary or fail the request if + |retries are exhausted. + """.stripMargin + + final val compressionConfig = "kafka.producer.compression.type" + final val compressionDefault: CompressionTypeFlag = CompressionTypeFlag(CompressionType.NONE) + final val compressionHelp = "Producer compression type" + + // Kafka Consumer + final val kafkaGroupIdConfig = "kafka.group.id" + final val kafkaGroupIdHelp: String = + """The group identifier for the Kafka consumer + """.stripMargin + + final val kafkaCommitIntervalConfig = "kafka.commit.interval" + final val kafkaCommitIntervalDefault: Duration = 10.seconds + final val kafkaCommitIntervalHelp: String = + """The frequency with which to save the position of the processor. + """.stripMargin + + final val consumerMaxPollRecordsConfig = "kafka.max.poll.records" + final val consumerMaxPollRecordsDefault: Int = 1000 + final val consumerMaxPollRecordsHelp: String = + """The maximum number of records returned in a single call to poll() + """.stripMargin + + final val consumerMaxPollIntervalConfig = "kafka.max.poll.interval" + final val consumerMaxPollIntervalDefault: Duration = 5.minutes + final val consumerMaxPollIntervalHelp: String = + """The maximum delay between invocations of poll() when using consumer group management. + This places an upper bound on the amount of time that the consumer can be idle before fetching more records.
+ If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group + will rebalance in order to reassign the partitions to another member. + """.stripMargin + + final val consumerSessionTimeoutConfig = "kafka.session.timeout" + final val consumerSessionTimeoutDefault: Duration = 1.minute + final val consumerSessionTimeoutHelp: String = + """The timeout used to detect client failures when using Kafka's group management facility. + The client sends periodic heartbeats to indicate its liveness to the broker. + If no heartbeats are received by the broker before the expiration of this session timeout, then the broker + will remove this client from the group and initiate a rebalance. Note that the value must be in the allowable + range as configured in the broker configuration by group.min.session.timeout.ms and group.max.session.timeout.ms. + """.stripMargin + + final val consumerFetchMinConfig = "kafka.consumer.fetch.min" + final val consumerFetchMinDefault: StorageUnit = 1.kilobyte + final val consumerFetchMinHelp: String = + """The minimum amount of data the server should return for a fetch request. If insufficient + |data is available the request will wait for that much data to accumulate before answering + |the request. The default setting of 1 byte means that fetch requests are answered as soon + |as a single byte of data is available or the fetch request times out waiting for data to + |arrive. Setting this to something greater than 1 will cause the server to wait for larger + |amounts of data to accumulate which can improve server throughput a bit at the cost of + |some additional latency. + """.stripMargin + + final val consumerFetchMaxConfig = "kafka.consumer.fetch.max" + final val consumerFetchMaxDefault: StorageUnit = 1.megabytes + final val consumerFetchMaxHelp: String = + """The maximum amount of data the server should return for a fetch request. Records are + |fetched in batches by the consumer, and if the first record batch in the first non-empty + |partition of the fetch is larger than this value, the record batch will still be returned + |to ensure that the consumer can make progress. As such, this is not an absolute maximum. + |The maximum record batch size accepted by the broker is defined via message.max.bytes + |(broker config) or max.message.bytes (topic config). + |Note that the consumer performs multiple fetches in parallel. + """.stripMargin + + final val consumerReceiveBufferSizeConfig = "kafka.consumer.receive.buffer.size" + final val consumerReceiveBufferSizeDefault: StorageUnit = 1.megabytes + final val consumerReceiveBufferSizeHelp: String = + """The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. + |If the value is -1, the OS default will be used. + """.stripMargin + + final val consumerApiTimeoutConfig = "kafka.consumer.api.timeout" + final val consumerApiTimeoutDefault: Duration = 120.seconds + final val consumerApiTimeoutHelp: String = + """Specifies the timeout (in milliseconds) for consumer APIs that could block.
+ |This configuration is used as the default timeout for all consumer operations that do + |not explicitly accept a timeout parameter. + """.stripMargin +} diff --git a/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/ClientProviders.scala b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/ClientProviders.scala new file mode 100644 index 000000000..b7fcac3c7 --- /dev/null +++ b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/ClientProviders.scala @@ -0,0 +1,141 @@ +package com.twitter.unified_user_actions.kafka + +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.finatra.kafka.consumers.FinagleKafkaConsumerBuilder +import com.twitter.finatra.kafka.domain.AckMode +import com.twitter.finatra.kafka.domain.KafkaGroupId +import com.twitter.finatra.kafka.producers.BlockingFinagleKafkaProducer +import com.twitter.finatra.kafka.producers.FinagleKafkaProducerBuilder +import com.twitter.kafka.client.processor.ThreadSafeKafkaConsumerClient +import com.twitter.util.logging.Logging +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import org.apache.kafka.clients.CommonClientConfigs +import org.apache.kafka.clients.producer.ProducerConfig +import org.apache.kafka.common.config.SaslConfigs +import org.apache.kafka.common.config.SslConfigs +import org.apache.kafka.common.record.CompressionType +import org.apache.kafka.common.security.auth.SecurityProtocol +import org.apache.kafka.common.serialization.Deserializer +import org.apache.kafka.common.serialization.Serializer + +/** + * A utility object that provides raw Kafka producer/consumer support + */ +object ClientProviders extends Logging { + + /** + * Provide a thread-safe, Finagle-compatible Kafka consumer.
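+ * + * A minimal usage sketch; the bootstrap server path, serdes, and group id below are illustrative only: + * {{{ + * import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes + * import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + * + * val consumer = ClientProviders.mkConsumer( + * bootstrapServer = "/s/kafka/main", + * keySerde = NullableScalaSerdes.Long.deserializer, + * valueSerde = NullableScalaSerdes.Thrift[UnifiedUserAction]().deserializer, + * groupId = "uua-enricher") + * }}}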
+ * For the params and their significance, please see [[ClientConfigs]] + */ + def mkConsumer[CK, CV]( + bootstrapServer: String, + keySerde: Deserializer[CK], + valueSerde: Deserializer[CV], + groupId: String, + autoCommit: Boolean = false, + maxPollRecords: Int = ClientConfigs.consumerMaxPollRecordsDefault, + maxPollInterval: Duration = ClientConfigs.consumerMaxPollIntervalDefault, + autoCommitInterval: Duration = ClientConfigs.kafkaCommitIntervalDefault, + sessionTimeout: Duration = ClientConfigs.consumerSessionTimeoutDefault, + fetchMax: StorageUnit = ClientConfigs.consumerFetchMaxDefault, + fetchMin: StorageUnit = ClientConfigs.consumerFetchMinDefault, + receiveBuffer: StorageUnit = ClientConfigs.consumerReceiveBufferSizeDefault, + trustStoreLocationOpt: Option[String] = Some(ClientConfigs.trustStoreLocationDefault) + ): ThreadSafeKafkaConsumerClient[CK, CV] = { + val baseBuilder = + FinagleKafkaConsumerBuilder[CK, CV]() + .keyDeserializer(keySerde) + .valueDeserializer(valueSerde) + .dest(bootstrapServer) + .groupId(KafkaGroupId(groupId)) + .enableAutoCommit(autoCommit) + .maxPollRecords(maxPollRecords) + .maxPollInterval(maxPollInterval) + .autoCommitInterval(autoCommitInterval) + .receiveBuffer(receiveBuffer) + .sessionTimeout(sessionTimeout) + .fetchMax(fetchMax) + .fetchMin(fetchMin) + .withConfig( + CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, + SecurityProtocol.PLAINTEXT.toString) + + trustStoreLocationOpt + .map { trustStoreLocation => + new ThreadSafeKafkaConsumerClient[CK, CV]( + baseBuilder + .withConfig( + CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, + SecurityProtocol.SASL_SSL.toString) + .withConfig(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, trustStoreLocation) + .withConfig(SaslConfigs.SASL_MECHANISM, SaslConfigs.GSSAPI_MECHANISM) + .withConfig(SaslConfigs.SASL_KERBEROS_SERVICE_NAME, "kafka") + .withConfig(SaslConfigs.SASL_KERBEROS_SERVER_NAME, "kafka") + .config) + }.getOrElse { + new ThreadSafeKafkaConsumerClient[CK, CV]( + baseBuilder + .withConfig( + CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, + SecurityProtocol.PLAINTEXT.toString) + .config) + } + } + + /** + * Provide a Finagle-compatible Kafka producer. 
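+ * + * A minimal usage sketch; the bootstrap server path, serdes, and client id below are illustrative only: + * {{{ + * import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes + * import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + * + * val producer = ClientProviders.mkProducer( + * bootstrapServer = "/s/kafka/main", + * keySerde = NullableScalaSerdes.Long.serializer, + * valueSerde = NullableScalaSerdes.Thrift[UnifiedUserAction]().serializer, + * clientId = "uua-enricher-producer") + * }}}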
+ * For the params and their significance, please see [[ClientConfigs]] + */ + def mkProducer[PK, PV]( + bootstrapServer: String, + keySerde: Serializer[PK], + valueSerde: Serializer[PV], + clientId: String, + idempotence: Boolean = ClientConfigs.producerIdempotenceDefault, + batchSize: StorageUnit = ClientConfigs.producerBatchSizeDefault, + linger: Duration = ClientConfigs.producerLingerDefault, + bufferMem: StorageUnit = ClientConfigs.producerBufferMemDefault, + compressionType: CompressionType = ClientConfigs.compressionDefault.compressionType, + retries: Int = ClientConfigs.retriesDefault, + retryBackoff: Duration = ClientConfigs.retryBackoffDefault, + requestTimeout: Duration = ClientConfigs.producerRequestTimeoutDefault, + trustStoreLocationOpt: Option[String] = Some(ClientConfigs.trustStoreLocationDefault) + ): BlockingFinagleKafkaProducer[PK, PV] = { + val baseBuilder = FinagleKafkaProducerBuilder[PK, PV]() + .keySerializer(keySerde) + .valueSerializer(valueSerde) + .dest(bootstrapServer) + .clientId(clientId) + .batchSize(batchSize) + .linger(linger) + .bufferMemorySize(bufferMem) + .maxRequestSize(4.megabytes) + .compressionType(compressionType) + .enableIdempotence(idempotence) + .ackMode(AckMode.ALL) + .maxInFlightRequestsPerConnection(5) + .retries(retries) + .retryBackoff(retryBackoff) + .requestTimeout(requestTimeout) + .withConfig(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, requestTimeout + linger) + trustStoreLocationOpt + .map { trustStoreLocation => + baseBuilder + .withConfig( + CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, + SecurityProtocol.SASL_SSL.toString) + .withConfig(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, trustStoreLocation) + .withConfig(SaslConfigs.SASL_MECHANISM, SaslConfigs.GSSAPI_MECHANISM) + .withConfig(SaslConfigs.SASL_KERBEROS_SERVICE_NAME, "kafka") + .withConfig(SaslConfigs.SASL_KERBEROS_SERVER_NAME, "kafka") + .build() + }.getOrElse { + baseBuilder + .withConfig( + CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, + SecurityProtocol.PLAINTEXT.toString) + .build() + } + } +} diff --git a/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/CompressionTypeFlag.scala b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/CompressionTypeFlag.scala new file mode 100644 index 000000000..43f2bdb57 --- /dev/null +++ b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/CompressionTypeFlag.scala @@ -0,0 +1,20 @@ +package com.twitter.unified_user_actions.kafka + +import com.twitter.app.Flaggable +import org.apache.kafka.common.record.CompressionType + +case class CompressionTypeFlag(compressionType: CompressionType) + +object CompressionTypeFlag { + + def fromString(s: String): CompressionType = s.toLowerCase match { + case "lz4" => CompressionType.LZ4 + case "snappy" => CompressionType.SNAPPY + case "gzip" => CompressionType.GZIP + case "zstd" => CompressionType.ZSTD + case _ => CompressionType.NONE + } + + implicit val flaggable: Flaggable[CompressionTypeFlag] = + Flaggable.mandatory(s => CompressionTypeFlag(fromString(s))) +} diff --git a/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/serde/NullableScalaSerdes.scala b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/serde/NullableScalaSerdes.scala new file mode 100644 index 000000000..511a30386 --- /dev/null +++ b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/serde/NullableScalaSerdes.scala @@ -0,0 +1,52 @@ +/* + * Copyright 
(c) 2016 Fred Cecilia, Valentin Kasas, Olivier Girardot + * + * Permission is hereby granted, free of charge, to any person obtaining a copy of + * this software and associated documentation files (the "Software"), to deal in + * the Software without restriction, including without limitation the rights to + * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of + * the Software, and to permit persons to whom the Software is furnished to do so, + * subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS + * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR + * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER + * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + */ + +//Derived from: https://github.com/aseigneurin/kafka-streams-scala +package com.twitter.unified_user_actions.kafka.serde + +import com.twitter.finagle.stats.Counter +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finatra.kafka.serde.internal._ + +import com.twitter.unified_user_actions.kafka.serde.internal._ +import com.twitter.scrooge.ThriftStruct + +/** + * NullableScalaSerdes is pretty much the same as com.twitter.finatra.kafka.serde.ScalaSerdes + * The only difference is that for the deserializer it returns null instead of throwing exceptions. + * The caller can also provide a counter so that the number of corrupt/bad records can be counted. + */ +object NullableScalaSerdes { + + def Thrift[T <: ThriftStruct: Manifest]( + nullCounter: Counter = NullStatsReceiver.NullCounter + ): ThriftSerDe[T] = new ThriftSerDe[T](nullCounter = nullCounter) + + def CompactThrift[T <: ThriftStruct: Manifest]( + nullCounter: Counter = NullStatsReceiver.NullCounter + ): CompactThriftSerDe[T] = new CompactThriftSerDe[T](nullCounter = nullCounter) + + val Int = IntSerde + + val Long = LongSerde + + val Double = DoubleSerde +} diff --git a/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/serde/internal/thrift.scala b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/serde/internal/thrift.scala new file mode 100644 index 000000000..167b2c8f0 --- /dev/null +++ b/unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka/serde/internal/thrift.scala @@ -0,0 +1,121 @@ +/** + * Copyright 2021 Twitter, Inc. 
+ * SPDX-License-Identifier: Apache-2.0 + */ +package com.twitter.unified_user_actions.kafka.serde.internal + +import com.google.common.util.concurrent.RateLimiter +import com.twitter.finagle.stats.Counter +import com.twitter.finagle.stats.NullStatsReceiver +import java.util +import com.twitter.scrooge.CompactThriftSerializer +import com.twitter.scrooge.ThriftStruct +import com.twitter.scrooge.ThriftStructCodec +import com.twitter.scrooge.ThriftStructSerializer +import org.apache.kafka.common.serialization.Deserializer +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.common.serialization.Serializer +import com.twitter.util.logging.Logging +import org.apache.thrift.protocol.TBinaryProtocol + +abstract class AbstractScroogeSerDe[T <: ThriftStruct: Manifest](nullCounter: Counter) + extends Serde[T] + with Logging { + + private val rateLimiter = RateLimiter.create(1.0) // at most 1 log message per second + + private def rateLimitedLogError(e: Exception): Unit = + if (rateLimiter.tryAcquire()) { + logger.error(e.getMessage, e) + } + + private[kafka] val thriftStructSerializer: ThriftStructSerializer[T] = { + val clazz = manifest.runtimeClass.asInstanceOf[Class[T]] + val codec = ThriftStructCodec.forStructClass(clazz) + + constructThriftStructSerializer(clazz, codec) + } + + private val _deserializer = new Deserializer[T] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def close(): Unit = {} + + override def deserialize(topic: String, data: Array[Byte]): T = { + if (data == null) { + null.asInstanceOf[T] + } else { + try { + thriftStructSerializer.fromBytes(data) + } catch { + case e: Exception => + nullCounter.incr() + rateLimitedLogError(e) + null.asInstanceOf[T] + } + } + } + } + + private val _serializer = new Serializer[T] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def serialize(topic: String, data: T): Array[Byte] = { + if (data == null) { + null + } else { + thriftStructSerializer.toBytes(data) + } + } + + override def close(): Unit = {} + } + + /* Public */ + + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def close(): Unit = {} + + override def deserializer: Deserializer[T] = { + _deserializer + } + + override def serializer: Serializer[T] = { + _serializer + } + + /** + * Subclasses should implement this method and provide a concrete ThriftStructSerializer + */ + protected[this] def constructThriftStructSerializer( + thriftStructClass: Class[T], + thriftStructCodec: ThriftStructCodec[T] + ): ThriftStructSerializer[T] +} + +class ThriftSerDe[T <: ThriftStruct: Manifest](nullCounter: Counter = NullStatsReceiver.NullCounter) + extends AbstractScroogeSerDe[T](nullCounter = nullCounter) { + protected[this] override def constructThriftStructSerializer( + thriftStructClass: Class[T], + thriftStructCodec: ThriftStructCodec[T] + ): ThriftStructSerializer[T] = { + new ThriftStructSerializer[T] { + override val protocolFactory = new TBinaryProtocol.Factory + override def codec: ThriftStructCodec[T] = thriftStructCodec + } + } +} + +class CompactThriftSerDe[T <: ThriftStruct: Manifest]( + nullCounter: Counter = NullStatsReceiver.NullCounter) + extends AbstractScroogeSerDe[T](nullCounter = nullCounter) { + override protected[this] def constructThriftStructSerializer( + thriftStructClass: Class[T], + thriftStructCodec: ThriftStructCodec[T] + ): ThriftStructSerializer[T] = { + new CompactThriftSerializer[T] { + override def 
codec: ThriftStructCodec[T] = thriftStructCodec
+    }
+  }
+}
diff --git a/unified_user_actions/kafka/src/test/resources/BUILD.bazel b/unified_user_actions/kafka/src/test/resources/BUILD.bazel
new file mode 100644
index 000000000..515a45887
--- /dev/null
+++ b/unified_user_actions/kafka/src/test/resources/BUILD.bazel
@@ -0,0 +1,4 @@
+resources(
+    sources = ["*.xml"],
+    tags = ["bazel-compatible"],
+)
diff --git a/unified_user_actions/kafka/src/test/resources/logback-test.xml b/unified_user_actions/kafka/src/test/resources/logback-test.xml
new file mode 100644
index 000000000..3544a0909
--- /dev/null
+++ b/unified_user_actions/kafka/src/test/resources/logback-test.xml
@@ -0,0 +1,29 @@
+<configuration>
+
+  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
+    <encoder>
+      <pattern>
+        %d{HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n
+      </pattern>
+    </encoder>
+  </appender>
+
+  <appender name="TEST" class="com.twitter.unified_user_actions.kafka.serde.TestLogAppender">
+  </appender>
+
+  <root level="INFO">
+    <appender-ref ref="STDOUT"/>
+    <appender-ref ref="TEST"/>
+  </root>
+
+</configuration>
diff --git a/unified_user_actions/kafka/src/test/scala/BUILD.bazel b/unified_user_actions/kafka/src/test/scala/BUILD.bazel
new file mode 100644
index 000000000..3ae26e5a9
--- /dev/null
+++ b/unified_user_actions/kafka/src/test/scala/BUILD.bazel
@@ -0,0 +1,15 @@
+junit_tests(
+    sources = ["**/*.scala"],
+    tags = ["bazel-compatible"],
+    dependencies = [
+        "3rdparty/jvm/ch/qos/logback:logback-classic",
+        "3rdparty/jvm/junit",
+        "3rdparty/jvm/org/scalatest",
+        "3rdparty/jvm/org/scalatestplus:junit",
+        "finatra/inject/inject-core/src/test/scala:test-deps",
+        "kafka/finagle-kafka/finatra-kafka/src/test/scala:test-deps",
+        "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka",
+        "unified_user_actions/kafka/src/test/resources",
+        "unified_user_actions/thrift/src/test/thrift/com/twitter/unified_user_actions:unified_user_actions_spec-scala",
+    ],
+)
diff --git a/unified_user_actions/kafka/src/test/scala/com/twitter/unified_user_actions/kafka/serde/NullableScalaSerdesSpec.scala b/unified_user_actions/kafka/src/test/scala/com/twitter/unified_user_actions/kafka/serde/NullableScalaSerdesSpec.scala
new file mode 100644
index 000000000..3c8d9d793
--- /dev/null
+++ b/unified_user_actions/kafka/src/test/scala/com/twitter/unified_user_actions/kafka/serde/NullableScalaSerdesSpec.scala
@@ -0,0 +1,32 @@
+package com.twitter.unified_user_actions.kafka.serde
+
+import com.twitter.finagle.stats.InMemoryStatsReceiver
+import com.twitter.inject.Test
+import com.twitter.unified_user_actions.thriftscala._
+
+class NullableScalaSerdesSpec extends Test {
+  val counter = (new InMemoryStatsReceiver).counter("nullCounts")
+  val nullableDeserializer = NullableScalaSerdes.Thrift[UnifiedUserActionSpec](counter).deserializer
+  val serializer = NullableScalaSerdes.Thrift[UnifiedUserActionSpec]().serializer
+  val uua = UnifiedUserActionSpec(
+    userId = 1L,
+    payload = Some("test"),
+  )
+
+  test("serde") {
+    nullableDeserializer.deserialize("", serializer.serialize("", uua)) should be(uua)
+    nullableDeserializer.deserialize("", "Whatever".getBytes) should be(
+      null.asInstanceOf[UnifiedUserActionSpec])
+    counter.apply() should equal(1)
+  }
+
+  test("rate limited logger when there's an exception") {
+    for (_ <- 1 to 10) {
+      nullableDeserializer.deserialize("", "Whatever".getBytes) should be(
+        null.asInstanceOf[UnifiedUserActionSpec])
+    }
+
+    TestLogAppender.events.size should (be(1) or be(2))
+    counter.apply() should equal(11)
+  }
+}
diff --git a/unified_user_actions/kafka/src/test/scala/com/twitter/unified_user_actions/kafka/serde/TestLogAppender.scala b/unified_user_actions/kafka/src/test/scala/com/twitter/unified_user_actions/kafka/serde/TestLogAppender.scala
new file mode 100644
index 000000000..454c6c14e
--- /dev/null +++ b/unified_user_actions/kafka/src/test/scala/com/twitter/unified_user_actions/kafka/serde/TestLogAppender.scala @@ -0,0 +1,19 @@ +package com.twitter.unified_user_actions.kafka.serde + +import ch.qos.logback.classic.spi.ILoggingEvent +import ch.qos.logback.core.AppenderBase +import scala.collection.mutable.ArrayBuffer + +class TestLogAppender extends AppenderBase[ILoggingEvent] { + import TestLogAppender._ + + override def append(eventObject: ILoggingEvent): Unit = + recordLog(eventObject) +} + +object TestLogAppender { + val events: ArrayBuffer[ILoggingEvent] = ArrayBuffer() + + def recordLog(event: ILoggingEvent): Unit = + events += event +} diff --git a/unified_user_actions/scripts/kill_staging.sh b/unified_user_actions/scripts/kill_staging.sh new file mode 100755 index 000000000..d1376fefc --- /dev/null +++ b/unified_user_actions/scripts/kill_staging.sh @@ -0,0 +1,13 @@ +#!/bin/bash + +set -ex + +service_account="discode" +env="staging" +dcs=("pdxa") +services=("uua-tls-favs" "uua-client-event" "uua-bce" "uua-tweetypie-event" "uua-social-graph" "uua-email-notification-event" "uua-user-modification" "uua-ads-callback-engagements" "uua-favorite-archival-events" "uua-retweet-archival-events" "rekey-uua" "rekey-uua-iesource") +for dc in "${dcs[@]}"; do + for service in "${services[@]}"; do + aurora job killall --no-batch "$dc/$service_account/$env/$service" + done +done diff --git a/unified_user_actions/service/deploy/kill-staging-services.workflow b/unified_user_actions/service/deploy/kill-staging-services.workflow new file mode 100644 index 000000000..389db1bf3 --- /dev/null +++ b/unified_user_actions/service/deploy/kill-staging-services.workflow @@ -0,0 +1,46 @@ +{ + "role": "discode", + "name": "uua-kill-staging-services", + "config-files": [], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 1" + }, + "dependencies": [], + "steps": [] + }, + "targets": [ + { + "type": "script", + "name": "uua-kill-staging-services", + "keytab": "/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab", + "repository": "source", + "command": "bash unified_user_actions/scripts/kill_staging.sh", + "dependencies": [{ + "version": "latest", + "role": "aurora", + "name": "aurora" + }], + "timeout": "10.minutes" + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "unified_user_actions_dev" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "unified_user_actions_dev" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/rekey-uua-iesource-prod.workflow b/unified_user_actions/service/deploy/rekey-uua-iesource-prod.workflow new file mode 100644 index 000000000..71961c118 --- /dev/null +++ b/unified_user_actions/service/deploy/rekey-uua-iesource-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "rekey-uua-iesource-prod", + "config-files": [ + "rekey-uua-iesource.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:rekey-uua-iesource" + }, + { + "type": "packer", + "name": "rekey-uua-iesource", + "artifact": "./dist/rekey-uua-iesource.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "rekey-uua-iesource-prod-atla", + "key": 
"atla/discode/prod/rekey-uua-iesource" + }, + { + "name": "rekey-uua-iesource-prod-pdxa", + "key": "pdxa/discode/prod/rekey-uua-iesource" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/rekey-uua-iesource-staging.workflow b/unified_user_actions/service/deploy/rekey-uua-iesource-staging.workflow new file mode 100644 index 000000000..8de5cc73a --- /dev/null +++ b/unified_user_actions/service/deploy/rekey-uua-iesource-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "rekey-uua-iesource-staging", + "config-files": [ + "rekey-uua-iesource.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:rekey-uua-iesource" + }, + { + "type": "packer", + "name": "rekey-uua-iesource-staging", + "artifact": "./dist/rekey-uua-iesource.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "rekey-uua-iesource-staging-pdxa", + "key": "pdxa/discode/staging/rekey-uua-iesource" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/rekey-uua-iesource.aurora b/unified_user_actions/service/deploy/rekey-uua-iesource.aurora new file mode 100644 index 000000000..fcfd4cfd6 --- /dev/null +++ b/unified_user_actions/service/deploy/rekey-uua-iesource.aurora @@ -0,0 +1,204 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'rekey-uua-iesource' + +CPU_NUM = 3 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk. 
+DISK_SIZE = HEAP_SIZE + 2 * GB + +class Profile(Struct): + package = Default(String, SERVICE_NAME) + cmdline_flags = Default(String, '') + log_level = Default(String, 'INFO') + instances = Default(Integer, 250) + kafka_bootstrap_servers = Default(String, '/s/kafka/cdm-1:kafka-tls') + kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls') + source_topic = Default(String, 'interaction_events') + sink_topics = Default(String, 'uua_keyed') + decider_overlay = Default(String, '') + +resources = Resources( + cpu = CPU_NUM, + ram = RAM_SIZE, + disk = DISK_SIZE +) + +install = Packer.install( + name = '{{profile.package}}', + version = Workflows.package_version() +) + +async_profiler_install = Packer.install( + name = 'async-profiler', + role = 'csl-perf', + version = 'latest' +) + +setup_jaas_config = Process( + name = 'setup_jaas_config', + cmdline = ''' + mkdir -p jaas_config + echo "KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + principal=\\"discode@TWITTER.BIZ\\" + useKeyTab=true + storeKey=true + keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\" + doNotPrompt=true; + };" >> jaas_config/jaas.conf + ''' +) + +main = JVMProcess( + name = SERVICE_NAME, + jvm = Java11( + heap = HEAP_SIZE, + extra_jvm_flags = + '-Djava.net.preferIPv4Stack=true' + + ' -XX:+UseNUMA' + ' -XX:+AggressiveOpts' + ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html + + ' -Dlog_level={{profile.log_level}}' + ' -Dlog.access.output=access.log' + ' -Dlog.service.output={{name}}.log' + ' -Djava.security.auth.login.config=jaas_config/jaas.conf' + ), + arguments = + '-jar {{name}}-bin.jar' + ' -admin.port=:{{thermos.ports[health]}}' + ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}' + ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}' + ' -kafka.group.id={{name}}-{{environment}}-{{cluster}}' + ' -kafka.producer.client.id={{name}}-{{environment}}' + ' -kafka.max.pending.requests=10000' + ' -kafka.consumer.fetch.max=1.megabytes' + ' -kafka.producer.batch.size=16.kilobytes' + ' -kafka.producer.buffer.mem=128.megabytes' + ' -kafka.producer.linger=50.milliseconds' + ' -kafka.producer.request.timeout=30.seconds' + ' -kafka.producer.compression.type=lz4' + ' -kafka.worker.threads=5' + ' -kafka.source.topic={{profile.source_topic}}' + ' -kafka.sink.topics={{profile.sink_topics}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' -cluster={{cluster}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 500, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # 
go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/srv#/devel/local/kafka/ingestion-1:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'DEBUG', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) + +### pdxa right now doesn't have InteractionEvents topic +PRODUCTION_PDXA = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml', + kafka_bootstrap_servers = '/srv#/prod/atla/kafka/cdm-1:kafka-tls' +) + +STAGING_PDXA = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers = '/srv#/prod/atla/kafka/cdm-1:kafka-tls', + kafka_bootstrap_servers_remote_dest = '/srv#/devel/local/kafka/ingestion-1:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL_PDXA = STAGING( + log_level = 'DEBUG', + kafka_bootstrap_servers = '/srv#/prod/atla/kafka/cdm-1:kafka-tls' +) + +prod_job_pdxa = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION_PDXA) + +staging_job_pdxa = job_template( + environment = 'staging' +).bind(profile = STAGING_PDXA) + +devel_job_pdxa = job_template( + environment = 'devel' +).bind(profile = DEVEL_PDXA) + +jobs.append(prod_job_pdxa(cluster = 'pdxa')) +jobs.append(staging_job_pdxa(cluster = 'pdxa')) +jobs.append(devel_job_pdxa(cluster = 'pdxa')) diff --git a/unified_user_actions/service/deploy/rekey-uua-prod.workflow b/unified_user_actions/service/deploy/rekey-uua-prod.workflow new file mode 100644 index 000000000..b0a881e68 --- /dev/null +++ b/unified_user_actions/service/deploy/rekey-uua-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "rekey-uua-prod", + "config-files": [ + "rekey-uua.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:rekey-uua" + }, + { + "type": "packer", + "name": "rekey-uua", + "artifact": "./dist/rekey-uua.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "rekey-uua-prod-atla", + "key": "atla/discode/prod/rekey-uua" + }, + { + "name": "rekey-uua-prod-pdxa", + "key": "pdxa/discode/prod/rekey-uua" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/rekey-uua-staging.workflow 
b/unified_user_actions/service/deploy/rekey-uua-staging.workflow new file mode 100644 index 000000000..70fe64489 --- /dev/null +++ b/unified_user_actions/service/deploy/rekey-uua-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "rekey-uua-staging", + "config-files": [ + "rekey-uua.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:rekey-uua" + }, + { + "type": "packer", + "name": "rekey-uua-staging", + "artifact": "./dist/rekey-uua.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "rekey-uua-staging-pdxa", + "key": "pdxa/discode/staging/rekey-uua" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/rekey-uua.aurora b/unified_user_actions/service/deploy/rekey-uua.aurora new file mode 100644 index 000000000..bd64be350 --- /dev/null +++ b/unified_user_actions/service/deploy/rekey-uua.aurora @@ -0,0 +1,167 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'rekey-uua' + +CPU_NUM = 3 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk. +DISK_SIZE = HEAP_SIZE + 2 * GB + +class Profile(Struct): + package = Default(String, SERVICE_NAME) + cmdline_flags = Default(String, '') + log_level = Default(String, 'INFO') + instances = Default(Integer, 100) + kafka_bootstrap_servers = Default(String, '/s/kafka/bluebird-1:kafka-tls') + kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls') + source_topic = Default(String, 'unified_user_actions') + sink_topics = Default(String, 'uua_keyed') + decider_overlay = Default(String, '') + +resources = Resources( + cpu = CPU_NUM, + ram = RAM_SIZE, + disk = DISK_SIZE +) + +install = Packer.install( + name = '{{profile.package}}', + version = Workflows.package_version() +) + +async_profiler_install = Packer.install( + name = 'async-profiler', + role = 'csl-perf', + version = 'latest' +) + +setup_jaas_config = Process( + name = 'setup_jaas_config', + cmdline = ''' + mkdir -p jaas_config + echo "KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + principal=\\"discode@TWITTER.BIZ\\" + useKeyTab=true + storeKey=true + keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\" + doNotPrompt=true; + };" >> jaas_config/jaas.conf + ''' +) + +main = JVMProcess( + name = SERVICE_NAME, + jvm = Java11( + heap = HEAP_SIZE, + extra_jvm_flags = + '-Djava.net.preferIPv4Stack=true' + + ' -XX:+UseNUMA' + ' -XX:+AggressiveOpts' + ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html + + ' -Dlog_level={{profile.log_level}}' + ' -Dlog.access.output=access.log' + ' -Dlog.service.output={{name}}.log' + ' -Djava.security.auth.login.config=jaas_config/jaas.conf' + ), + arguments = + '-jar {{name}}-bin.jar' + ' -admin.port=:{{thermos.ports[health]}}' + ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}' + ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}' + ' -kafka.group.id={{name}}-{{environment}}' + ' -kafka.producer.client.id={{name}}-{{environment}}' + ' -kafka.max.pending.requests=10000' + ' -kafka.consumer.fetch.max=1.megabytes' + ' -kafka.producer.batch.size=16.kilobytes' + ' -kafka.producer.buffer.mem=128.megabytes' + ' 
-kafka.producer.linger=50.milliseconds' + ' -kafka.producer.request.timeout=30.seconds' + ' -kafka.producer.compression.type=lz4' + ' -kafka.worker.threads=5' + ' -kafka.source.topic={{profile.source_topic}}' + ' -kafka.sink.topics={{profile.sink_topics}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' -cluster={{cluster}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 100, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/srv#/devel/local/kafka/ingestion-1:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'DEBUG', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-ads-callback-engagements-prod.workflow b/unified_user_actions/service/deploy/uua-ads-callback-engagements-prod.workflow new file mode 100644 index 000000000..857e42e6e --- /dev/null +++ b/unified_user_actions/service/deploy/uua-ads-callback-engagements-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-ads-callback-engagements-prod", + "config-files": [ + "uua-ads-callback-engagements.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-ads-callback-engagements" + }, + { + "type": "packer", + "name": "uua-ads-callback-engagements", + "artifact": "./dist/uua-ads-callback-engagements.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "uua-ads-callback-engagements-prod-atla", + "key": "atla/discode/prod/uua-ads-callback-engagements" + }, + { + "name": "uua-ads-callback-engagements-prod-pdxa", + "key": 
"pdxa/discode/prod/uua-ads-callback-engagements" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-ads-callback-engagements-staging.workflow b/unified_user_actions/service/deploy/uua-ads-callback-engagements-staging.workflow new file mode 100644 index 000000000..3f7949bf5 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-ads-callback-engagements-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-ads-callback-engagements-staging", + "config-files": [ + "uua-ads-callback-engagements.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-ads-callback-engagements" + }, + { + "type": "packer", + "name": "uua-ads-callback-engagements-staging", + "artifact": "./dist/uua-ads-callback-engagements.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "uua-ads-callback-engagements-staging-pdxa", + "key": "pdxa/discode/staging/uua-ads-callback-engagements" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-ads-callback-engagements.aurora b/unified_user_actions/service/deploy/uua-ads-callback-engagements.aurora new file mode 100644 index 000000000..5b11177c3 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-ads-callback-engagements.aurora @@ -0,0 +1,167 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'uua-ads-callback-engagements' + +CPU_NUM = 3 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk. 
+DISK_SIZE = HEAP_SIZE + 2 * GB + +class Profile(Struct): + package = Default(String, SERVICE_NAME) + cmdline_flags = Default(String, '') + log_level = Default(String, 'INFO') + instances = Default(Integer, 50) + kafka_bootstrap_servers = Default(String, '/s/kafka/ads-callback-1:kafka-tls') + kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls') + source_topic = Default(String, 'ads_spend_prod') + sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements') + decider_overlay = Default(String, '') + +resources = Resources( + cpu = CPU_NUM, + ram = RAM_SIZE, + disk = DISK_SIZE +) + +install = Packer.install( + name = '{{profile.package}}', + version = Workflows.package_version() +) + +async_profiler_install = Packer.install( + name = 'async-profiler', + role = 'csl-perf', + version = 'latest' +) + +setup_jaas_config = Process( + name = 'setup_jaas_config', + cmdline = ''' + mkdir -p jaas_config + echo "KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + principal=\\"discode@TWITTER.BIZ\\" + useKeyTab=true + storeKey=true + keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\" + doNotPrompt=true; + };" >> jaas_config/jaas.conf + ''' +) + +main = JVMProcess( + name = SERVICE_NAME, + jvm = Java11( + heap = HEAP_SIZE, + extra_jvm_flags = + '-Djava.net.preferIPv4Stack=true' + + ' -XX:+UseNUMA' + ' -XX:+AggressiveOpts' + ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html + + ' -Dlog_level={{profile.log_level}}' + ' -Dlog.access.output=access.log' + ' -Dlog.service.output={{name}}.log' + ' -Djava.security.auth.login.config=jaas_config/jaas.conf' + ), + arguments = + '-jar {{name}}-bin.jar' + ' -admin.port=:{{thermos.ports[health]}}' + ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}' + ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}' + ' -kafka.group.id={{name}}-{{environment}}' + ' -kafka.producer.client.id={{name}}-{{environment}}' + ' -kafka.max.pending.requests=10000' + ' -kafka.consumer.fetch.max=1.megabytes' + ' -kafka.producer.batch.size=16.kilobytes' + ' -kafka.producer.buffer.mem=128.megabytes' + ' -kafka.producer.linger=50.milliseconds' + ' -kafka.producer.request.timeout=30.seconds' + ' -kafka.producer.compression.type=lz4' + ' -kafka.worker.threads=5' + ' -kafka.source.topic={{profile.source_topic}}' + ' -kafka.sink.topics={{profile.sink_topics}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' -cluster={{cluster}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 50, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + 
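+# (Sketch of how the pieces below connect, as a reading of this file rather than new behavior:
+# job_template leaves every {{profile.*}} placeholder unbound; PRODUCTION/STAGING/DEVEL are
+# Profile structs, and job_template(...).bind(profile = PRODUCTION) substitutes their fields
+# into the flags above, e.g. {{profile.source_topic}} -> 'ads_spend_prod'.)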
+PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'DEBUG', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-client-event-prod.workflow b/unified_user_actions/service/deploy/uua-client-event-prod.workflow new file mode 100644 index 000000000..33a7a3983 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-client-event-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-client-event-prod", + "config-files": [ + "uua-client-event.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-client-event" + }, + { + "type": "packer", + "name": "uua-client-event", + "artifact": "./dist/uua-client-event.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "uua-client-event-prod-atla", + "key": "atla/discode/prod/uua-client-event" + }, + { + "name": "uua-client-event-prod-pdxa", + "key": "pdxa/discode/prod/uua-client-event" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-client-event-staging.workflow b/unified_user_actions/service/deploy/uua-client-event-staging.workflow new file mode 100644 index 000000000..375c10341 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-client-event-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-client-event-staging", + "config-files": [ + "uua-client-event.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-client-event" + }, + { + "type": "packer", + "name": "uua-client-event-staging", + "artifact": "./dist/uua-client-event.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "uua-client-event-staging-pdxa", + "key": "pdxa/discode/staging/uua-client-event" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-client-event.aurora b/unified_user_actions/service/deploy/uua-client-event.aurora new file mode 100644 index 000000000..50bc06d48 --- /dev/null +++ 
b/unified_user_actions/service/deploy/uua-client-event.aurora
@@ -0,0 +1,174 @@
+import os
+import itertools
+import subprocess
+import math
+
+SERVICE_NAME = 'uua-client-event'
+
+CPU_NUM = 3
+HEAP_SIZE = 3 * GB
+RAM_SIZE = HEAP_SIZE + 1 * GB
+# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk.
+DISK_SIZE = HEAP_SIZE + 2 * GB
+
+class Profile(Struct):
+  package = Default(String, SERVICE_NAME)
+  cmdline_flags = Default(String, '')
+  log_level = Default(String, 'INFO')
+  instances = Default(Integer, 1000)
+  kafka_bootstrap_servers = Default(String, '/s/kafka/client-events:kafka-tls')
+  kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls')
+  source_topic = Default(String, 'client_event')
+  sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements')
+  decider_overlay = Default(String, '')
+
+resources = Resources(
+  cpu = CPU_NUM,
+  ram = RAM_SIZE,
+  disk = DISK_SIZE
+)
+
+install = Packer.install(
+  name = '{{profile.package}}',
+  version = Workflows.package_version()
+)
+
+async_profiler_install = Packer.install(
+  name = 'async-profiler',
+  role = 'csl-perf',
+  version = 'latest'
+)
+
+setup_jaas_config = Process(
+  name = 'setup_jaas_config',
+  cmdline = '''
+  mkdir -p jaas_config
+  echo "KafkaClient {
+    com.sun.security.auth.module.Krb5LoginModule required
+    principal=\\"discode@TWITTER.BIZ\\"
+    useKeyTab=true
+    storeKey=true
+    keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\"
+    doNotPrompt=true;
+  };" >> jaas_config/jaas.conf
+  '''
+)
+
+main = JVMProcess(
+  name = SERVICE_NAME,
+  jvm = Java11(
+    heap = HEAP_SIZE,
+    extra_jvm_flags =
+      '-Djava.net.preferIPv4Stack=true'
+
+      ' -XX:MaxMetaspaceSize=536870912'
+      ' -XX:+UseNUMA'
+      ' -XX:+AggressiveOpts'
+      ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html
+
+      ' -Dlog_level={{profile.log_level}}'
+      ' -Dlog.access.output=access.log'
+      ' -Dlog.service.output={{name}}.log'
+      ' -Djava.security.auth.login.config=jaas_config/jaas.conf'
+  ),
+  arguments =
+    '-jar {{name}}-bin.jar'
+    ' -admin.port=:{{thermos.ports[health]}}'
+    ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}'
+    ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}'
+    ' -kafka.group.id={{name}}-{{environment}}'
+    ' -kafka.producer.client.id={{name}}-{{environment}}'
+    ' -kafka.max.pending.requests=10000'
+    # CE events are about 0.4-0.6 KB per message on the consumer side. A fetch size of 6~18 MB gets
+    # us about 10k~20k messages per batch. This fits the size of our pending-requests queue and
+    # stays within the limit on max poll records.
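+    # (Illustrative check: at ~0.4-0.6 KB per message, the 9 MB fetch.max below works out to
+    # roughly 15k-23k messages per fetch, the same order as the 10k~20k estimate above.)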
+ ' -kafka.consumer.fetch.max=9.megabytes' + ' -kafka.consumer.fetch.min=3.megabytes' + ' -kafka.max.poll.records=40000' + ' -kafka.commit.interval=20.seconds' + ' -kafka.producer.batch.size=4.megabytes' + ' -kafka.producer.buffer.mem=64.megabytes' + ' -kafka.producer.linger=100.millisecond' + ' -kafka.producer.request.timeout=30.seconds' + ' -kafka.producer.compression.type=lz4' + ' -kafka.worker.threads=4' + ' -kafka.source.topic={{profile.source_topic}}' + ' -kafka.sink.topics={{profile.sink_topics}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' -cluster={{cluster}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 1000, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'INFO', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-email-notification-event-prod.workflow b/unified_user_actions/service/deploy/uua-email-notification-event-prod.workflow new file mode 100644 index 000000000..c25d8b0df --- /dev/null +++ b/unified_user_actions/service/deploy/uua-email-notification-event-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-email-notification-event-prod", + "config-files": [ + "uua-email-notification-event.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-email-notification-event" + }, + { + "type": "packer", + "name": "uua-email-notification-event", + "artifact": "./dist/uua-email-notification-event.zip" + } + ] + }, + "targets": [ + { + "type": "group", + 
"name": "prod", + "targets": [ + { + "name": "uua-email-notification-event-prod-atla", + "key": "atla/discode/prod/uua-email-notification-event" + }, + { + "name": "uua-email-notification-event-prod-pdxa", + "key": "pdxa/discode/prod/uua-email-notification-event" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-email-notification-event-staging.workflow b/unified_user_actions/service/deploy/uua-email-notification-event-staging.workflow new file mode 100644 index 000000000..73e62dd3a --- /dev/null +++ b/unified_user_actions/service/deploy/uua-email-notification-event-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-email-notification-event-staging", + "config-files": [ + "uua-email-notification-event.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-email-notification-event" + }, + { + "type": "packer", + "name": "uua-email-notification-event-staging", + "artifact": "./dist/uua-email-notification-event.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "uua-email-notification-event-staging-pdxa", + "key": "pdxa/discode/staging/uua-email-notification-event" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-email-notification-event.aurora b/unified_user_actions/service/deploy/uua-email-notification-event.aurora new file mode 100644 index 000000000..83dcced60 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-email-notification-event.aurora @@ -0,0 +1,169 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'uua-email-notification-event' + +CPU_NUM = 3 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk. 
+DISK_SIZE = HEAP_SIZE + 2 * GB
+
+class Profile(Struct):
+  package = Default(String, SERVICE_NAME)
+  cmdline_flags = Default(String, '')
+  log_level = Default(String, 'INFO')
+  instances = Default(Integer, 20)
+  kafka_bootstrap_servers = Default(String, '/s/kafka/main-2:kafka-tls')
+  kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls')
+  source_topic = Default(String, 'notifications')
+  sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements')
+  decider_overlay = Default(String, '')
+
+resources = Resources(
+  cpu = CPU_NUM,
+  ram = RAM_SIZE,
+  disk = DISK_SIZE
+)
+
+install = Packer.install(
+  name = '{{profile.package}}',
+  version = Workflows.package_version()
+)
+
+async_profiler_install = Packer.install(
+  name = 'async-profiler',
+  role = 'csl-perf',
+  version = 'latest'
+)
+
+setup_jaas_config = Process(
+  name = 'setup_jaas_config',
+  cmdline = '''
+  mkdir -p jaas_config
+  echo "KafkaClient {
+    com.sun.security.auth.module.Krb5LoginModule required
+    principal=\\"discode@TWITTER.BIZ\\"
+    useKeyTab=true
+    storeKey=true
+    keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\"
+    doNotPrompt=true;
+  };" >> jaas_config/jaas.conf
+  '''
+)
+
+main = JVMProcess(
+  name = SERVICE_NAME,
+  jvm = Java11(
+    heap = HEAP_SIZE,
+    extra_jvm_flags =
+      '-Djava.net.preferIPv4Stack=true'
+
+      ' -XX:+UseNUMA'
+      ' -XX:+AggressiveOpts'
+      ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html
+
+      ' -Dlog_level={{profile.log_level}}'
+      ' -Dlog.access.output=access.log'
+      ' -Dlog.service.output={{name}}.log'
+      ' -Djava.security.auth.login.config=jaas_config/jaas.conf'
+  ),
+  arguments =
+    '-jar {{name}}-bin.jar'
+    ' -admin.port=:{{thermos.ports[health]}}'
+    ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}'
+    ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}'
+    ' -kafka.group.id={{name}}-{{environment}}'
+    ' -kafka.producer.client.id={{name}}-{{environment}}'
+    ' -kafka.max.pending.requests=10000'
+    ' -kafka.consumer.fetch.max=1.megabytes'
+    ' -kafka.max.poll.records=20000'
+    ' -kafka.commit.interval=10.seconds'
+    ' -kafka.producer.batch.size=16.kilobytes'
+    ' -kafka.producer.buffer.mem=64.megabytes'
+    ' -kafka.producer.linger=0.milliseconds'
+    ' -kafka.producer.request.timeout=30.seconds'
+    ' -kafka.producer.compression.type=lz4'
+    ' -kafka.worker.threads=5'
+    ' -kafka.source.topic={{profile.source_topic}}'
+    ' -kafka.sink.topics={{profile.sink_topics}}'
+    ' -decider.base=decider.yml'
+    ' -decider.overlay={{profile.decider_overlay}}'
+    ' -cluster={{cluster}}'
+    ' {{profile.cmdline_flags}}',
+  resources = resources
+)
+
+stats = Stats(
+  library = 'metrics',
+  port = 'admin'
+)
+
+job_template = Service(
+  name = SERVICE_NAME,
+  role = 'discode',
+  instances = '{{profile.instances}}',
+  contact = 'disco-data-eng@twitter.com',
+  constraints = {'rack': 'limit:1', 'host': 'limit:1'},
+  announce = Announcer(
+    primary_port = 'health',
+    portmap = {'aurora': 'health', 'admin': 'health'}
+  ),
+  task = Task(
+    resources = resources,
+    name = SERVICE_NAME,
+    processes = [async_profiler_install, install, setup_jaas_config, main, stats],
+    constraints = order(async_profiler_install, install, setup_jaas_config, main)
+  ),
+  health_check_config = HealthCheckConfig(
+    initial_interval_secs = 100,
+    interval_secs = 60,
+    timeout_secs = 60,
+    max_consecutive_failures = 4
+  ),
+  update_config = UpdateConfig(
+    batch_size = 50,
+    watch_secs = 90,
+    max_per_shard_failures = 3,
max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'INFO', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-enricher-staging.workflow b/unified_user_actions/service/deploy/uua-enricher-staging.workflow new file mode 100644 index 000000000..814708b47 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-enricher-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-enricher-staging", + "config-files": [ + "uua-enricher.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-enricher" + }, + { + "type": "packer", + "name": "uua-enricher-staging", + "artifact": "./dist/uua-enricher.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "uua-enricher-staging-pdxa", + "key": "pdxa/discode/staging/uua-enricher" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-enricher.aurora b/unified_user_actions/service/deploy/uua-enricher.aurora new file mode 100644 index 000000000..e962f6885 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-enricher.aurora @@ -0,0 +1,151 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'uua-enricher' + +CPU_NUM = 3 +HEAP_SIZE = 6 * GB +RAM_SIZE = 8 * GB +DISK_SIZE = 3 * GB + +class Profile(Struct): + package = Default(String, SERVICE_NAME) + cmdline_flags = Default(String, '') + log_level = Default(String, 'INFO') + instances = Default(Integer, 10) + kafka_bootstrap_servers = Default(String, '/s/kafka/bluebird-1:kafka-tls') + +resources = Resources( + cpu = CPU_NUM, + ram = RAM_SIZE, + disk = DISK_SIZE +) + +install = Packer.install( + name = '{{profile.package}}', + version = Workflows.package_version() +) + +async_profiler_install = Packer.install( + name = 'async-profiler', + role = 'csl-perf', + version = 'latest' +) + +setup_jaas_config = Process( + name = 'setup_jaas_config', + cmdline = ''' + mkdir -p jaas_config + echo "KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + principal=\\"discode@TWITTER.BIZ\\" + useKeyTab=true + storeKey=true + keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\" + doNotPrompt=true; + };" >> jaas_config/jaas.conf + ''' +) + +main = JVMProcess( + name = SERVICE_NAME, + jvm = Java11( + heap = HEAP_SIZE, + extra_jvm_flags = + '-Djava.net.preferIPv4Stack=true' + + ' -XX:+UseNUMA' + ' -XX:+AggressiveOpts' + ' 
-XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html + + ' -Dlog_level={{profile.log_level}}' + ' -Dlog.access.output=access.log' + ' -Dlog.service.output={{name}}.log' + ' -Djava.security.auth.login.config=jaas_config/jaas.conf' + ), + arguments = + '-jar {{name}}-bin.jar' + ' -admin.port=:{{thermos.ports[health]}}' + ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}' + ' -kafka.application.id={{name}}.{{environment}}' + ' -kafka.application.num.instances={{instances}}' # Used for static partitioning + ' -kafka.application.server={{mesos.instance}}.{{name}}.{{environment}}.{{role}}.service.{{cluster}}.twitter.com:80' + ' -com.twitter.finatra.kafkastreams.config.principal={{role}}' + ' -thrift.client.id={{name}}.{{environment}}' + ' -service.identifier="{{role}}:{{name}}:{{environment}}:{{cluster}}"' + ' -local.cache.ttl.seconds=86400' + ' -local.cache.max.size=400000000' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 50, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers = '/s/kafka/custdevel:kafka-tls' +) + +DEVEL = STAGING( + log_level = 'DEBUG', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-enrichment-planner-staging.workflow b/unified_user_actions/service/deploy/uua-enrichment-planner-staging.workflow new file mode 100644 index 000000000..c3ae6bcab --- /dev/null +++ b/unified_user_actions/service/deploy/uua-enrichment-planner-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-enrichment-planner-staging", + "config-files": [ + "uua-enrichment-planner.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-enrichment-planner" + }, + { + "type": "packer", + "name": "uua-enrichment-planner-staging", + "artifact": "./dist/uua-enrichment-planner.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": 
"uua-enricher-enrichment-planner-pdxa", + "key": "pdxa/discode/staging/uua-enrichment-planner" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-enrichment-planner.aurora b/unified_user_actions/service/deploy/uua-enrichment-planner.aurora new file mode 100644 index 000000000..c93d6f344 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-enrichment-planner.aurora @@ -0,0 +1,156 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'uua-enrichment-planner' + +CPU_NUM = 3 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +DISK_SIZE = HEAP_SIZE + 2 * GB + +class Profile(Struct): + package = Default(String, SERVICE_NAME) + cmdline_flags = Default(String, '') + log_level = Default(String, 'INFO') + instances = Default(Integer, 50) + kafka_bootstrap_servers = Default(String, '/s/kafka/bluebird-1:kafka-tls') + kafka_output_server = Default(String, '/s/kafka/bluebird-1:kafka-tls') + decider_overlay = Default(String, '') + +resources = Resources( + cpu = CPU_NUM, + ram = RAM_SIZE, + disk = DISK_SIZE +) + +install = Packer.install( + name = '{{profile.package}}', + version = Workflows.package_version(default_version='live') +) + +async_profiler_install = Packer.install( + name = 'async-profiler', + role = 'csl-perf', + version = 'latest' +) + +setup_jaas_config = Process( + name = 'setup_jaas_config', + cmdline = ''' + mkdir -p jaas_config + echo "KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + principal=\\"discode@TWITTER.BIZ\\" + useKeyTab=true + storeKey=true + keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\" + doNotPrompt=true; + };" >> jaas_config/jaas.conf + ''' +) + +main = JVMProcess( + name = SERVICE_NAME, + jvm = Java11( + heap = HEAP_SIZE, + extra_jvm_flags = + '-Djava.net.preferIPv4Stack=true' + + ' -XX:+UseNUMA' + ' -XX:+AggressiveOpts' + ' -XX:+PerfDisableSharedMem' + + ' -Dlog_level={{profile.log_level}}' + ' -Dlog.access.output=access.log' + ' -Dlog.service.output={{name}}.log' + ' -Djava.security.auth.login.config=jaas_config/jaas.conf' + ), + arguments = + '-jar {{name}}-bin.jar' + ' -admin.port=:{{thermos.ports[health]}}' + ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}' + ' -kafka.output.server={{profile.kafka_output_server}}' + ' -kafka.application.id=uua-enrichment-planner' + ' -com.twitter.finatra.kafkastreams.config.principal={{role}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 50, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = 
'/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_output_server = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'DEBUG', + instances = 2, + kafka_output_server = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-favorite-archival-events-prod.workflow b/unified_user_actions/service/deploy/uua-favorite-archival-events-prod.workflow new file mode 100644 index 000000000..75484576d --- /dev/null +++ b/unified_user_actions/service/deploy/uua-favorite-archival-events-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-favorite-archival-events-prod", + "config-files": [ + "uua-favorite-archival-events.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-favorite-archival-events" + }, + { + "type": "packer", + "name": "uua-favorite-archival-events", + "artifact": "./dist/uua-favorite-archival-events.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "uua-favorite-archival-events-prod-atla", + "key": "atla/discode/prod/uua-favorite-archival-events" + }, + { + "name": "uua-favorite-archival-events-prod-pdxa", + "key": "pdxa/discode/prod/uua-favorite-archival-events" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-favorite-archival-events-staging.workflow b/unified_user_actions/service/deploy/uua-favorite-archival-events-staging.workflow new file mode 100644 index 000000000..5954dd152 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-favorite-archival-events-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-favorite-archival-events-staging", + "config-files": [ + "uua-favorite-archival-events.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-favorite-archival-events" + }, + { + "type": "packer", + "name": "uua-favorite-archival-events-staging", + "artifact": "./dist/uua-favorite-archival-events.zip" + } + ] + }, + "targets": [ + { + 
"type": "group", + "name": "staging", + "targets": [ + { + "name": "uua-favorite-archival-events-staging-pdxa", + "key": "pdxa/discode/staging/uua-favorite-archival-events" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-favorite-archival-events.aurora b/unified_user_actions/service/deploy/uua-favorite-archival-events.aurora new file mode 100644 index 000000000..f37ad3d89 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-favorite-archival-events.aurora @@ -0,0 +1,167 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'uua-favorite-archival-events' + +CPU_NUM = 3 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk. +DISK_SIZE = HEAP_SIZE + 2 * GB + +class Profile(Struct): + package = Default(String, SERVICE_NAME) + cmdline_flags = Default(String, '') + log_level = Default(String, 'INFO') + instances = Default(Integer, 10) + kafka_bootstrap_servers = Default(String, '/s/kafka/main-2:kafka-tls') + kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls') + source_topic = Default(String, 'favorite_archival_events') + sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements') + decider_overlay = Default(String, '') + +resources = Resources( + cpu = CPU_NUM, + ram = RAM_SIZE, + disk = RAM_SIZE +) + +install = Packer.install( + name = '{{profile.package}}', + version = Workflows.package_version() +) + +async_profiler_install = Packer.install( + name = 'async-profiler', + role = 'csl-perf', + version = 'latest' +) + +setup_jaas_config = Process( + name = 'setup_jaas_config', + cmdline = ''' + mkdir -p jaas_config + echo "KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + principal=\\"discode@TWITTER.BIZ\\" + useKeyTab=true + storeKey=true + keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\" + doNotPrompt=true; + };" >> jaas_config/jaas.conf + ''' +) + +main = JVMProcess( + name = SERVICE_NAME, + jvm = Java11( + heap = HEAP_SIZE, + extra_jvm_flags = + '-Djava.net.preferIPv4Stack=true' + + ' -XX:+UseNUMA' + ' -XX:+AggressiveOpts' + ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html + + ' -Dlog_level={{profile.log_level}}' + ' -Dlog.access.output=access.log' + ' -Dlog.service.output={{name}}.log' + ' -Djava.security.auth.login.config=jaas_config/jaas.conf' + ), + arguments = + '-jar {{name}}-bin.jar' + ' -admin.port=:{{thermos.ports[health]}}' + ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}' + ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}' + ' -kafka.group.id={{name}}-{{environment}}' + ' -kafka.producer.client.id={{name}}-{{environment}}' + ' -kafka.max.pending.requests=10000' + ' -kafka.consumer.fetch.max=1.megabytes' + ' -kafka.producer.batch.size=16.kilobytes' + ' -kafka.producer.buffer.mem=128.megabytes' + ' -kafka.producer.linger=0.milliseconds' + ' -kafka.producer.request.timeout=30.seconds' + ' -kafka.producer.compression.type=lz4' + ' -kafka.worker.threads=5' + ' -kafka.source.topic={{profile.source_topic}}' + ' -kafka.sink.topics={{profile.sink_topics}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' -cluster={{cluster}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + 
instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 50, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'INFO', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-retweet-archival-events-prod.workflow b/unified_user_actions/service/deploy/uua-retweet-archival-events-prod.workflow new file mode 100644 index 000000000..519b8c958 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-retweet-archival-events-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-retweet-archival-events-prod", + "config-files": [ + "uua-retweet-archival-events.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-retweet-archival-events" + }, + { + "type": "packer", + "name": "uua-retweet-archival-events", + "artifact": "./dist/uua-retweet-archival-events.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "uua-retweet-archival-events-prod-atla", + "key": "atla/discode/prod/uua-retweet-archival-events" + }, + { + "name": "uua-retweet-archival-events-prod-pdxa", + "key": "pdxa/discode/prod/uua-retweet-archival-events" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-retweet-archival-events-staging.workflow b/unified_user_actions/service/deploy/uua-retweet-archival-events-staging.workflow new file mode 100644 index 000000000..2cece5161 --- /dev/null +++ 
b/unified_user_actions/service/deploy/uua-retweet-archival-events-staging.workflow
@@ -0,0 +1,41 @@
+{
+ "role": "discode",
+ "name": "uua-retweet-archival-events-staging",
+ "config-files": [
+ "uua-retweet-archival-events.aurora"
+ ],
+ "build": {
+ "play": true,
+ "dependencies": [
+ {
+ "role": "packer",
+ "name": "packer-client-no-pex",
+ "version": "latest"
+ }
+ ],
+ "steps": [
+ {
+ "type": "bazel-bundle",
+ "name": "bundle",
+ "target": "unified_user_actions/service/src/main/scala:uua-retweet-archival-events"
+ },
+ {
+ "type": "packer",
+ "name": "uua-retweet-archival-events-staging",
+ "artifact": "./dist/uua-retweet-archival-events.zip"
+ }
+ ]
+ },
+ "targets": [
+ {
+ "type": "group",
+ "name": "staging",
+ "targets": [
+ {
+ "name": "uua-retweet-archival-events-staging-pdxa",
+ "key": "pdxa/discode/staging/uua-retweet-archival-events"
+ }
+ ]
+ }
+ ]
+}
diff --git a/unified_user_actions/service/deploy/uua-retweet-archival-events.aurora b/unified_user_actions/service/deploy/uua-retweet-archival-events.aurora
new file mode 100644
index 000000000..12c4dedae
--- /dev/null
+++ b/unified_user_actions/service/deploy/uua-retweet-archival-events.aurora
@@ -0,0 +1,167 @@
+import os
+import itertools
+import subprocess
+import math
+
+SERVICE_NAME = 'uua-retweet-archival-events'
+
+CPU_NUM = 3
+HEAP_SIZE = 3 * GB
+RAM_SIZE = HEAP_SIZE + 1 * GB
+# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk.
+DISK_SIZE = HEAP_SIZE + 2 * GB
+
+class Profile(Struct):
+ package = Default(String, SERVICE_NAME)
+ cmdline_flags = Default(String, '')
+ log_level = Default(String, 'INFO')
+ instances = Default(Integer, 10)
+ kafka_bootstrap_servers = Default(String, '/s/kafka/main-2:kafka-tls')
+ kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls')
+ source_topic = Default(String, 'retweet_archival_events')
+ sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements')
+ decider_overlay = Default(String, '')
+
+resources = Resources(
+ cpu = CPU_NUM,
+ ram = RAM_SIZE,
+ disk = DISK_SIZE
+)
+
+install = Packer.install(
+ name = '{{profile.package}}',
+ version = Workflows.package_version()
+)
+
+async_profiler_install = Packer.install(
+ name = 'async-profiler',
+ role = 'csl-perf',
+ version = 'latest'
+)
+
+setup_jaas_config = Process(
+ name = 'setup_jaas_config',
+ cmdline = '''
+ mkdir -p jaas_config
+ echo "KafkaClient {
+ com.sun.security.auth.module.Krb5LoginModule required
+ principal=\\"discode@TWITTER.BIZ\\"
+ useKeyTab=true
+ storeKey=true
+ keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\"
+ doNotPrompt=true;
+ };" >> jaas_config/jaas.conf
+ '''
+)
+
+main = JVMProcess(
+ name = SERVICE_NAME,
+ jvm = Java11(
+ heap = HEAP_SIZE,
+ extra_jvm_flags =
+ '-Djava.net.preferIPv4Stack=true'
+
+ ' -XX:+UseNUMA'
+ ' -XX:+AggressiveOpts'
+ ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html
+
+ ' -Dlog_level={{profile.log_level}}'
+ ' -Dlog.access.output=access.log'
+ ' -Dlog.service.output={{name}}.log'
+ ' -Djava.security.auth.login.config=jaas_config/jaas.conf'
+ ),
+ arguments =
+ '-jar {{name}}-bin.jar'
+ ' -admin.port=:{{thermos.ports[health]}}'
+ ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}'
+ ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}'
+ ' -kafka.group.id={{name}}-{{environment}}'
+ ' -kafka.producer.client.id={{name}}-{{environment}}'
+ ' -kafka.max.pending.requests=10000'
+ ' 
-kafka.consumer.fetch.max=1.megabytes' + ' -kafka.producer.batch.size=16.kilobytes' + ' -kafka.producer.buffer.mem=128.megabytes' + ' -kafka.producer.linger=0.milliseconds' + ' -kafka.producer.request.timeout=30.seconds' + ' -kafka.producer.compression.type=lz4' + ' -kafka.worker.threads=5' + ' -kafka.source.topic={{profile.source_topic}}' + ' -kafka.sink.topics={{profile.sink_topics}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' -cluster={{cluster}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 50, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'INFO', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-social-graph-prod.workflow b/unified_user_actions/service/deploy/uua-social-graph-prod.workflow new file mode 100644 index 000000000..bc9debfc5 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-social-graph-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-social-graph-prod", + "config-files": [ + "uua-social-graph.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-social-graph" + }, + { + "type": "packer", + "name": "uua-social-graph", + "artifact": "./dist/uua-social-graph.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "uua-social-graph-prod-atla", + "key": "atla/discode/prod/uua-social-graph" + }, + { + "name": "uua-social-graph-prod-pdxa", + "key": "pdxa/discode/prod/uua-social-graph" + } + 
] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-social-graph-staging.workflow b/unified_user_actions/service/deploy/uua-social-graph-staging.workflow new file mode 100644 index 000000000..9d022b4eb --- /dev/null +++ b/unified_user_actions/service/deploy/uua-social-graph-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-social-graph-staging", + "config-files": [ + "uua-social-graph.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-social-graph" + }, + { + "type": "packer", + "name": "uua-social-graph-staging", + "artifact": "./dist/uua-social-graph.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "uua-social-graph-staging-pdxa", + "key": "pdxa/discode/staging/uua-social-graph" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-social-graph.aurora b/unified_user_actions/service/deploy/uua-social-graph.aurora new file mode 100644 index 000000000..79dbb4262 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-social-graph.aurora @@ -0,0 +1,167 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'uua-social-graph' + +CPU_NUM = 3 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk. 
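+# Illustrative arithmetic for the comment above: with HEAP_SIZE = 3 GB,
+# RAM_SIZE works out to 4 GB and DISK_SIZE to 5 GB, leaving roughly 2 GB of
+# headroom beyond a full ~3 GB .hprof heap dump.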
+DISK_SIZE = HEAP_SIZE + 2 * GB
+
+class Profile(Struct):
+ package = Default(String, SERVICE_NAME)
+ cmdline_flags = Default(String, '')
+ log_level = Default(String, 'INFO')
+ instances = Default(Integer, 20)
+ kafka_bootstrap_servers = Default(String, '/s/kafka/bluebird-1:kafka-tls')
+ kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls')
+ source_topic = Default(String, 'social_write_event')
+ sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements')
+ decider_overlay = Default(String, '')
+
+resources = Resources(
+ cpu = CPU_NUM,
+ ram = RAM_SIZE,
+ disk = DISK_SIZE
+)
+
+install = Packer.install(
+ name = '{{profile.package}}',
+ version = Workflows.package_version()
+)
+
+async_profiler_install = Packer.install(
+ name = 'async-profiler',
+ role = 'csl-perf',
+ version = 'latest'
+)
+
+setup_jaas_config = Process(
+ name = 'setup_jaas_config',
+ cmdline = '''
+ mkdir -p jaas_config
+ echo "KafkaClient {
+ com.sun.security.auth.module.Krb5LoginModule required
+ principal=\\"discode@TWITTER.BIZ\\"
+ useKeyTab=true
+ storeKey=true
+ keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\"
+ doNotPrompt=true;
+ };" >> jaas_config/jaas.conf
+ '''
+)
+
+main = JVMProcess(
+ name = SERVICE_NAME,
+ jvm = Java11(
+ heap = HEAP_SIZE,
+ extra_jvm_flags =
+ '-Djava.net.preferIPv4Stack=true'
+
+ ' -XX:+UseNUMA'
+ ' -XX:+AggressiveOpts'
+ ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html
+
+ ' -Dlog_level={{profile.log_level}}'
+ ' -Dlog.access.output=access.log'
+ ' -Dlog.service.output={{name}}.log'
+ ' -Djava.security.auth.login.config=jaas_config/jaas.conf'
+ ),
+ arguments =
+ '-jar {{name}}-bin.jar'
+ ' -admin.port=:{{thermos.ports[health]}}'
+ ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}'
+ ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}'
+ ' -kafka.group.id={{name}}-{{environment}}'
+ ' -kafka.producer.client.id={{name}}-{{environment}}'
+ ' -kafka.max.pending.requests=10000'
+ ' -kafka.consumer.fetch.max=1.megabytes'
+ ' -kafka.producer.batch.size=16.kilobytes'
+ ' -kafka.producer.buffer.mem=128.megabytes'
+ ' -kafka.producer.linger=0.second'
+ ' -kafka.producer.request.timeout=30.seconds'
+ ' -kafka.producer.compression.type=lz4'
+ ' -kafka.worker.threads=5'
+ ' -kafka.source.topic={{profile.source_topic}}'
+ ' -kafka.sink.topics={{profile.sink_topics}}'
+ ' -decider.base=decider.yml'
+ ' -decider.overlay={{profile.decider_overlay}}'
+ ' -cluster={{cluster}}'
+ ' {{profile.cmdline_flags}}',
+ resources = resources
+)
+
+stats = Stats(
+ library = 'metrics',
+ port = 'admin'
+)
+
+job_template = Service(
+ name = SERVICE_NAME,
+ role = 'discode',
+ instances = '{{profile.instances}}',
+ contact = 'disco-data-eng@twitter.com',
+ constraints = {'rack': 'limit:1', 'host': 'limit:1'},
+ announce = Announcer(
+ primary_port = 'health',
+ portmap = {'aurora': 'health', 'admin': 'health'}
+ ),
+ task = Task(
+ resources = resources,
+ name = SERVICE_NAME,
+ processes = [async_profiler_install, install, setup_jaas_config, main, stats],
+ constraints = order(async_profiler_install, install, setup_jaas_config, main)
+ ),
+ health_check_config = HealthCheckConfig(
+ initial_interval_secs = 100,
+ interval_secs = 60,
+ timeout_secs = 60,
+ max_consecutive_failures = 4
+ ),
+ update_config = UpdateConfig(
+ batch_size = 50,
+ watch_secs = 90,
+ max_per_shard_failures = 3,
+ max_total_failures = 0,
+ rollback_on_failure = False
+ )
+)
+
+PRODUCTION 
= Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'INFO', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-tls-favs-prod.workflow b/unified_user_actions/service/deploy/uua-tls-favs-prod.workflow new file mode 100644 index 000000000..1ca30b3dc --- /dev/null +++ b/unified_user_actions/service/deploy/uua-tls-favs-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-tls-favs-prod", + "config-files": [ + "uua-tls-favs.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-tls-favs" + }, + { + "type": "packer", + "name": "uua-tls-favs", + "artifact": "./dist/uua-tls-favs.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "uua-tls-favs-prod-atla", + "key": "atla/discode/prod/uua-tls-favs" + }, + { + "name": "uua-tls-favs-prod-pdxa", + "key": "pdxa/discode/prod/uua-tls-favs" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-tls-favs-staging.workflow b/unified_user_actions/service/deploy/uua-tls-favs-staging.workflow new file mode 100644 index 000000000..a2be55c29 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-tls-favs-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-tls-favs-staging", + "config-files": [ + "uua-tls-favs.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-tls-favs" + }, + { + "type": "packer", + "name": "uua-tls-favs-staging", + "artifact": "./dist/uua-tls-favs.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "uua-tls-favs-staging-pdxa", + "key": "pdxa/discode/staging/uua-tls-favs" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-tls-favs.aurora b/unified_user_actions/service/deploy/uua-tls-favs.aurora new file mode 100644 index 000000000..4f3c2a720 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-tls-favs.aurora @@ -0,0 +1,167 @@ +import os +import itertools +import 
subprocess
+import math
+
+SERVICE_NAME = 'uua-tls-favs'
+
+CPU_NUM = 3
+HEAP_SIZE = 3 * GB
+RAM_SIZE = HEAP_SIZE + 1 * GB
+# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk.
+DISK_SIZE = HEAP_SIZE + 2 * GB
+
+class Profile(Struct):
+ package = Default(String, SERVICE_NAME)
+ cmdline_flags = Default(String, '')
+ log_level = Default(String, 'INFO')
+ instances = Default(Integer, 20)
+ kafka_bootstrap_servers = Default(String, '/s/kafka/main-1:kafka-tls')
+ kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls')
+ source_topic = Default(String, 'timeline_service_favorites')
+ sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements')
+ decider_overlay = Default(String, '')
+
+resources = Resources(
+ cpu = CPU_NUM,
+ ram = RAM_SIZE,
+ disk = DISK_SIZE
+)
+
+install = Packer.install(
+ name = '{{profile.package}}',
+ version = Workflows.package_version()
+)
+
+async_profiler_install = Packer.install(
+ name = 'async-profiler',
+ role = 'csl-perf',
+ version = 'latest'
+)
+
+setup_jaas_config = Process(
+ name = 'setup_jaas_config',
+ cmdline = '''
+ mkdir -p jaas_config
+ echo "KafkaClient {
+ com.sun.security.auth.module.Krb5LoginModule required
+ principal=\\"discode@TWITTER.BIZ\\"
+ useKeyTab=true
+ storeKey=true
+ keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\"
+ doNotPrompt=true;
+ };" >> jaas_config/jaas.conf
+ '''
+)
+
+main = JVMProcess(
+ name = SERVICE_NAME,
+ jvm = Java11(
+ heap = HEAP_SIZE,
+ extra_jvm_flags =
+ '-Djava.net.preferIPv4Stack=true'
+
+ ' -XX:+UseNUMA'
+ ' -XX:+AggressiveOpts'
+ ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html
+
+ ' -Dlog_level={{profile.log_level}}'
+ ' -Dlog.access.output=access.log'
+ ' -Dlog.service.output={{name}}.log'
+ ' -Djava.security.auth.login.config=jaas_config/jaas.conf'
+ ),
+ arguments =
+ '-jar {{name}}-bin.jar'
+ ' -admin.port=:{{thermos.ports[health]}}'
+ ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}'
+ ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}'
+ ' -kafka.group.id={{name}}-{{environment}}'
+ ' -kafka.producer.client.id={{name}}-{{environment}}'
+ ' -kafka.max.pending.requests=10000'
+ ' -kafka.consumer.fetch.max=1.megabytes'
+ ' -kafka.producer.batch.size=16.kilobytes'
+ ' -kafka.producer.buffer.mem=128.megabytes'
+ ' -kafka.producer.linger=50.milliseconds'
+ ' -kafka.producer.request.timeout=30.seconds'
+ ' -kafka.producer.compression.type=lz4'
+ ' -kafka.worker.threads=5'
+ ' -kafka.source.topic={{profile.source_topic}}'
+ ' -kafka.sink.topics={{profile.sink_topics}}'
+ ' -decider.base=decider.yml'
+ ' -decider.overlay={{profile.decider_overlay}}'
+ ' -cluster={{cluster}}'
+ ' {{profile.cmdline_flags}}',
+ resources = resources
+)
+
+stats = Stats(
+ library = 'metrics',
+ port = 'admin'
+)
+
+job_template = Service(
+ name = SERVICE_NAME,
+ role = 'discode',
+ instances = '{{profile.instances}}',
+ contact = 'disco-data-eng@twitter.com',
+ constraints = {'rack': 'limit:1', 'host': 'limit:1'},
+ announce = Announcer(
+ primary_port = 'health',
+ portmap = {'aurora': 'health', 'admin': 'health'}
+ ),
+ task = Task(
+ resources = resources,
+ name = SERVICE_NAME,
+ processes = [async_profiler_install, install, setup_jaas_config, main, stats],
+ constraints = order(async_profiler_install, install, setup_jaas_config, main)
+ ),
+ health_check_config = HealthCheckConfig(
+ initial_interval_secs = 100,
+ interval_secs = 
60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 50, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'INFO', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-tweetypie-event-prod.workflow b/unified_user_actions/service/deploy/uua-tweetypie-event-prod.workflow new file mode 100644 index 000000000..ee1cfede2 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-tweetypie-event-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-tweetypie-event-prod", + "config-files": [ + "uua-tweetypie-event.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-tweetypie-event" + }, + { + "type": "packer", + "name": "uua-tweetypie-event", + "artifact": "./dist/uua-tweetypie-event.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "uua-tweetypie-event-prod-atla", + "key": "atla/discode/prod/uua-tweetypie-event" + }, + { + "name": "uua-tweetypie-event-prod-pdxa", + "key": "pdxa/discode/prod/uua-tweetypie-event" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-tweetypie-event-staging.workflow b/unified_user_actions/service/deploy/uua-tweetypie-event-staging.workflow new file mode 100644 index 000000000..be41907d6 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-tweetypie-event-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-tweetypie-event-staging", + "config-files": [ + "uua-tweetypie-event.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-tweetypie-event" + }, + { + "type": "packer", + "name": "uua-tweetypie-event-staging", + "artifact": "./dist/uua-tweetypie-event.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": 
"uua-tweetypie-event-staging-pdxa", + "key": "pdxa/discode/staging/uua-tweetypie-event" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-tweetypie-event.aurora b/unified_user_actions/service/deploy/uua-tweetypie-event.aurora new file mode 100644 index 000000000..6adf59351 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-tweetypie-event.aurora @@ -0,0 +1,167 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'uua-tweetypie-event' + +CPU_NUM = 2 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk. +DISK_SIZE = HEAP_SIZE + 2 * GB + +class Profile(Struct): + package = Default(String, SERVICE_NAME) + cmdline_flags = Default(String, '') + log_level = Default(String, 'INFO') + instances = Default(Integer, 20) + kafka_bootstrap_servers = Default(String, '/s/kafka/tweet-events:kafka-tls') + kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls') + source_topic = Default(String, 'tweet_events') + sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements') + decider_overlay = Default(String, '') + +resources = Resources( + cpu = CPU_NUM, + ram = RAM_SIZE, + disk = DISK_SIZE +) + +install = Packer.install( + name = '{{profile.package}}', + version = Workflows.package_version() +) + +async_profiler_install = Packer.install( + name = 'async-profiler', + role = 'csl-perf', + version = 'latest' +) + +setup_jaas_config = Process( + name = 'setup_jaas_config', + cmdline = ''' + mkdir -p jaas_config + echo "KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + principal=\\"discode@TWITTER.BIZ\\" + useKeyTab=true + storeKey=true + keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\" + doNotPrompt=true; + };" >> jaas_config/jaas.conf + ''' +) + +main = JVMProcess( + name = SERVICE_NAME, + jvm = Java11( + heap = HEAP_SIZE, + extra_jvm_flags = + '-Djava.net.preferIPv4Stack=true' + + ' -XX:+UseNUMA' + ' -XX:+AggressiveOpts' + ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html + + ' -Dlog_level={{profile.log_level}}' + ' -Dlog.access.output=access.log' + ' -Dlog.service.output={{name}}.log' + ' -Djava.security.auth.login.config=jaas_config/jaas.conf' + ), + arguments = + '-jar {{name}}-bin.jar' + ' -admin.port=:{{thermos.ports[health]}}' + ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}' + ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}' + ' -kafka.group.id={{name}}-{{environment}}' + ' -kafka.producer.client.id={{name}}-{{environment}}' + ' -kafka.max.pending.requests=10000' + ' -kafka.consumer.fetch.max=1.megabytes' + ' -kafka.producer.batch.size=16.kilobytes' + ' -kafka.producer.buffer.mem=64.megabytes' + ' -kafka.producer.linger=0.milliseconds' + ' -kafka.producer.request.timeout=30.seconds' + ' -kafka.producer.compression.type=lz4' + ' -kafka.worker.threads=5' + ' -kafka.source.topic={{profile.source_topic}}' + ' -kafka.sink.topics={{profile.sink_topics}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' -cluster={{cluster}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, 
+ announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 50, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'INFO', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/deploy/uua-user-modification-prod.workflow b/unified_user_actions/service/deploy/uua-user-modification-prod.workflow new file mode 100644 index 000000000..abb6397de --- /dev/null +++ b/unified_user_actions/service/deploy/uua-user-modification-prod.workflow @@ -0,0 +1,66 @@ +{ + "role": "discode", + "name": "uua-user-modification-prod", + "config-files": [ + "uua-user-modification.aurora" + ], + "build": { + "play": true, + "trigger": { + "cron-schedule": "0 17 * * 2" + }, + "dependencies": [ + { + "role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-user-modification" + }, + { + "type": "packer", + "name": "uua-user-modification", + "artifact": "./dist/uua-user-modification.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "prod", + "targets": [ + { + "name": "uua-user-modification-prod-atla", + "key": "atla/discode/prod/uua-user-modification" + }, + { + "name": "uua-user-modification-prod-pdxa", + "key": "pdxa/discode/prod/uua-user-modification" + } + ] + } + ], + "subscriptions": [ + { + "type": "SLACK", + "recipients": [ + { + "to": "discode-oncall" + } + ], + "events": ["WORKFLOW_SUCCESS"] + }, + { + "type": "SLACK", + "recipients": [{ + "to": "discode-oncall" + }], + "events": ["*FAILED"] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-user-modification-staging.workflow b/unified_user_actions/service/deploy/uua-user-modification-staging.workflow new file mode 100644 index 000000000..55f8f4ef7 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-user-modification-staging.workflow @@ -0,0 +1,41 @@ +{ + "role": "discode", + "name": "uua-user-modification-staging", + "config-files": [ + "uua-user-modification.aurora" + ], + "build": { + "play": true, + "dependencies": [ + { + 
"role": "packer", + "name": "packer-client-no-pex", + "version": "latest" + } + ], + "steps": [ + { + "type": "bazel-bundle", + "name": "bundle", + "target": "unified_user_actions/service/src/main/scala:uua-user-modification" + }, + { + "type": "packer", + "name": "uua-user-modification-staging", + "artifact": "./dist/uua-user-modification.zip" + } + ] + }, + "targets": [ + { + "type": "group", + "name": "staging", + "targets": [ + { + "name": "uua-user-modification-staging-pdxa", + "key": "pdxa/discode/staging/uua-user-modification" + } + ] + } + ] +} diff --git a/unified_user_actions/service/deploy/uua-user-modification.aurora b/unified_user_actions/service/deploy/uua-user-modification.aurora new file mode 100644 index 000000000..82abd0483 --- /dev/null +++ b/unified_user_actions/service/deploy/uua-user-modification.aurora @@ -0,0 +1,167 @@ +import os +import itertools +import subprocess +import math + +SERVICE_NAME = 'uua-user-modification' + +CPU_NUM = 3 +HEAP_SIZE = 3 * GB +RAM_SIZE = HEAP_SIZE + 1 * GB +# We make disk size larger than HEAP so that if we ever need to do a heap dump, it will fit on disk. +DISK_SIZE = HEAP_SIZE + 2 * GB + +class Profile(Struct): + package = Default(String, SERVICE_NAME) + cmdline_flags = Default(String, '') + log_level = Default(String, 'INFO') + instances = Default(Integer, 10) + kafka_bootstrap_servers = Default(String, '/s/kafka/main-1:kafka-tls') + kafka_bootstrap_servers_remote_dest = Default(String, '/s/kafka/bluebird-1:kafka-tls') + source_topic = Default(String, 'user_modifications') + sink_topics = Default(String, 'unified_user_actions,unified_user_actions_engagements') + decider_overlay = Default(String, '') + +resources = Resources( + cpu = CPU_NUM, + ram = RAM_SIZE, + disk = DISK_SIZE +) + +install = Packer.install( + name = '{{profile.package}}', + version = Workflows.package_version() +) + +async_profiler_install = Packer.install( + name = 'async-profiler', + role = 'csl-perf', + version = 'latest' +) + +setup_jaas_config = Process( + name = 'setup_jaas_config', + cmdline = ''' + mkdir -p jaas_config + echo "KafkaClient { + com.sun.security.auth.module.Krb5LoginModule required + principal=\\"discode@TWITTER.BIZ\\" + useKeyTab=true + storeKey=true + keyTab=\\"/var/lib/tss/keys/fluffy/keytabs/client/discode.keytab\\" + doNotPrompt=true; + };" >> jaas_config/jaas.conf + ''' +) + +main = JVMProcess( + name = SERVICE_NAME, + jvm = Java11( + heap = HEAP_SIZE, + extra_jvm_flags = + '-Djava.net.preferIPv4Stack=true' + + ' -XX:+UseNUMA' + ' -XX:+AggressiveOpts' + ' -XX:+PerfDisableSharedMem' # http://www.evanjones.ca/jvm-mmap-pause.html + + ' -Dlog_level={{profile.log_level}}' + ' -Dlog.access.output=access.log' + ' -Dlog.service.output={{name}}.log' + ' -Djava.security.auth.login.config=jaas_config/jaas.conf' + ), + arguments = + '-jar {{name}}-bin.jar' + ' -admin.port=:{{thermos.ports[health]}}' + ' -kafka.bootstrap.servers={{profile.kafka_bootstrap_servers}}' + ' -kafka.bootstrap.servers.remote.dest={{profile.kafka_bootstrap_servers_remote_dest}}' + ' -kafka.group.id={{name}}-{{environment}}' + ' -kafka.producer.client.id={{name}}-{{environment}}' + ' -kafka.max.pending.requests=10000' + ' -kafka.consumer.fetch.max=1.megabytes' + ' -kafka.producer.batch.size=16.kilobytes' + ' -kafka.producer.buffer.mem=128.megabytes' + ' -kafka.producer.linger=50.milliseconds' + ' -kafka.producer.request.timeout=30.seconds' + ' -kafka.producer.compression.type=lz4' + ' -kafka.worker.threads=5' + ' -kafka.source.topic={{profile.source_topic}}' + ' 
-kafka.sink.topics={{profile.sink_topics}}' + ' -decider.base=decider.yml' + ' -decider.overlay={{profile.decider_overlay}}' + ' -cluster={{cluster}}' + ' {{profile.cmdline_flags}}', + resources = resources +) + +stats = Stats( + library = 'metrics', + port = 'admin' +) + +job_template = Service( + name = SERVICE_NAME, + role = 'discode', + instances = '{{profile.instances}}', + contact = 'disco-data-eng@twitter.com', + constraints = {'rack': 'limit:1', 'host': 'limit:1'}, + announce = Announcer( + primary_port = 'health', + portmap = {'aurora': 'health', 'admin': 'health'} + ), + task = Task( + resources = resources, + name = SERVICE_NAME, + processes = [async_profiler_install, install, setup_jaas_config, main, stats], + constraints = order(async_profiler_install, install, setup_jaas_config, main) + ), + health_check_config = HealthCheckConfig( + initial_interval_secs = 100, + interval_secs = 60, + timeout_secs = 60, + max_consecutive_failures = 4 + ), + update_config = UpdateConfig( + batch_size = 50, + watch_secs = 90, + max_per_shard_failures = 3, + max_total_failures = 0, + rollback_on_failure = False + ) +) + +PRODUCTION = Profile( + # go/uua-decider + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/prod/{{cluster}}/decider_overlay.yml' +) + +STAGING = Profile( + package = SERVICE_NAME+'-staging', + cmdline_flags = '', + kafka_bootstrap_servers_remote_dest = '/s/kafka/custdevel:kafka-tls', + decider_overlay = '/usr/local/config/overlays/discode-default/UnifiedUserActions/staging/{{cluster}}/decider_overlay.yml' # go/uua-decider +) + +DEVEL = STAGING( + log_level = 'DEBUG', +) + + +prod_job = job_template( + tier = 'preferred', + environment = 'prod', +).bind(profile = PRODUCTION) + +staging_job = job_template( + environment = 'staging' +).bind(profile = STAGING) + +devel_job = job_template( + environment = 'devel' +).bind(profile = DEVEL) + +jobs = [] +for cluster in ['atla', 'pdxa']: + jobs.append(prod_job(cluster = cluster)) + jobs.append(staging_job(cluster = cluster)) + jobs.append(devel_job(cluster = cluster)) diff --git a/unified_user_actions/service/src/main/resources/BUILD b/unified_user_actions/service/src/main/resources/BUILD new file mode 100644 index 000000000..90cacb56c --- /dev/null +++ b/unified_user_actions/service/src/main/resources/BUILD @@ -0,0 +1,13 @@ +resources( + sources = ["*.*"], + tags = ["bazel-compatible"], +) + +files( + name = "files", + sources = [ + "!BUILD", + "**/*", + ], + tags = ["bazel-compatible"], +) diff --git a/unified_user_actions/service/src/main/resources/decider.yml b/unified_user_actions/service/src/main/resources/decider.yml new file mode 100644 index 000000000..23aa40bc3 --- /dev/null +++ b/unified_user_actions/service/src/main/resources/decider.yml @@ -0,0 +1,324 @@ +# Naming convention: +# For publishing action types, use [Publish][ActionTypeInThrift]. 
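For example, a server-side favorite is published under PublishServerTweetFav below, and its client-side counterpart under PublishClientTweetFav.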
Please see the Thrift definition at unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/action_info.thrift + +PublishServerTweetFav: + default_availability: 0 +PublishServerTweetUnfav: + default_availability: 0 +PublishServerTweetCreate: + default_availability: 0 +PublishServerTweetReply: + default_availability: 0 +PublishServerTweetQuote: + default_availability: 0 +PublishServerTweetRetweet: + default_availability: 0 +PublishServerTweetDelete: + default_availability: 0 +PublishServerTweetUnreply: + default_availability: 0 +PublishServerTweetUnquote: + default_availability: 0 +PublishServerTweetUnretweet: + default_availability: 0 +PublishServerTweetEdit: + default_availability: 0 +PublishServerTweetReport: + default_availability: 0 +PublishServerProfileFollow: + default_availability: 0 +PublishServerProfileUnfollow: + default_availability: 0 +PublishServerProfileBlock: + default_availability: 0 +PublishServerProfileUnblock: + default_availability: 0 +PublishServerProfileMute: + default_availability: 0 +PublishServerProfileUnmute: + default_availability: 0 +PublishServerProfileReport: + default_availability: 0 +PublishClientTweetFav: + default_availability: 0 +PublishClientTweetUnfav: + default_availability: 0 +PublishClientTweetLingerImpression: + default_availability: 0 +PublishClientTweetRenderImpression: + default_availability: 0 +PublishClientTweetReply: + default_availability: 0 +PublishClientTweetQuote: + default_availability: 0 +PublishClientTweetRetweet: + default_availability: 0 +PublishClientTweetClickReply: + default_availability: 0 +PublishClientTweetClickQuote: + default_availability: 0 +PublishClientTweetVideoPlayback25: + default_availability: 0 +PublishClientTweetVideoPlayback50: + default_availability: 0 +PublishClientTweetVideoPlayback75: + default_availability: 0 +PublishClientTweetVideoPlayback95: + default_availability: 0 +PublishClientTweetVideoPlayFromTap: + default_availability: 0 +PublishClientTweetVideoQualityView: + default_availability: 0 +PublishClientTweetVideoView: + default_availability: 0 +PublishClientTweetVideoMrcView: + default_availability: 0 +PublishClientTweetVideoViewThreshold: + default_availability: 0 +PublishClientTweetVideoCtaUrlClick: + default_availability: 0 +PublishClientTweetVideoCtaWatchClick: + default_availability: 0 +PublishClientTweetUnretweet: + default_availability: 0 +PublishClientTweetClickCaret: + default_availability: 0 +PublishClientTweetPhotoExpand: + default_availability: 0 +PublishClientTweetClickMentionScreenName: + default_availability: 0 +PublishClientCardClick: + default_availability: 0 +PublishClientCardOpenApp: + default_availability: 0 +PublishClientCardAppInstallAttempt: + default_availability: 0 +PublishClientPollCardVote: + default_availability: 0 +PublishClientTweetProfileMentionClick: + default_availability: 0 +PublishClientTweetClick: + default_availability: 0 +PublishClientTopicFollow: + default_availability: 0 +PublishClientTopicUnfollow: + default_availability: 0 +PublishClientTopicNotInterestedIn: + default_availability: 0 +PublishClientTopicUndoNotInterestedIn: + default_availability: 0 +PublishClientTweetNotHelpful: + default_availability: 0 +PublishClientTweetUndoNotHelpful: + default_availability: 0 +PublishClientTweetReport: + default_availability: 0 +PublishClientTweetNotInterestedIn: + default_availability: 0 +PublishClientTweetUndoNotInterestedIn: + default_availability: 0 +PublishClientTweetNotAboutTopic: + default_availability: 0 +PublishClientTweetUndoNotAboutTopic: + 
default_availability: 0 +PublishClientTweetNotRecent: + default_availability: 0 +PublishClientTweetUndoNotRecent: + default_availability: 0 +PublishClientTweetSeeFewer: + default_availability: 0 +PublishClientTweetUndoSeeFewer: + default_availability: 0 +PublishClientTweetNotRelevant: + default_availability: 0 +PublishClientTweetUndoNotRelevant: + default_availability: 0 +PublishClientProfileFollowAttempt: + default_availability: 0 +PublishClientTweetFavoriteAttempt: + default_availability: 0 +PublishClientTweetRetweetAttempt: + default_availability: 0 +PublishClientTweetReplyAttempt: + default_availability: 0 +PublishClientCTALoginClick: + default_availability: 0 +PublishClientCTALoginStart: + default_availability: 0 +PublishClientCTALoginSuccess: + default_availability: 0 +PublishClientCTASignupClick: + default_availability: 0 +PublishClientCTASignupSuccess: + default_availability: 0 +PublishClientProfileBlock: + default_availability: 0 +PublishClientProfileUnblock: + default_availability: 0 +PublishClientProfileMute: + default_availability: 0 +PublishClientProfileReport: + default_availability: 0 +PublishClientProfileFollow: + default_availability: 0 +PublishClientProfileClick: + default_availability: 0 +PublishClientTweetFollowAuthor: + default_availability: 0 +PublishClientTweetUnfollowAuthor: + default_availability: 0 +PublishClientTweetBlockAuthor: + default_availability: 0 +PublishClientTweetUnblockAuthor: + default_availability: 0 +PublishClientTweetMuteAuthor: + default_availability: 0 +PublishClientNotificationOpen: + default_availability: 0 +PublishClientNotificationClick: + default_availability: 0 +PublishClientNotificationSeeLessOften: + default_availability: 0 +PublishClientNotificationDismiss: + default_availability: 0 +PublishClientTypeaheadClick: + default_availability: 0 +PublishClientFeedbackPromptSubmit: + default_availability: 0 +PublishClientProfileShow: + default_availability: 0 +PublishClientTweetV2Impression: + default_availability: 0 +PublishClientTweetVideoFullscreenV2Impression: + default_availability: 0 +PublishClientTweetImageFullscreenV2Impression: + default_availability: 0 +PublishClientProfileV2Impression: + default_availability: 0 +PublishClientTweetClickProfile: + default_availability: 0 +PublishClientTweetClickShare: + default_availability: 0 +PublishClientTweetShareViaCopyLink: + default_availability: 0 +PublishClientTweetClickSendViaDirectMessage: + default_availability: 0 +PublishClientTweetShareViaBookmark: + default_availability: 0 +PublishClientTweetUnbookmark: + default_availability: 0 +PublishClientTweetClickHashtag: + default_availability: 0 +PublishClientTweetBookmark: + default_availability: 0 +PublishClientTweetOpenLink: + default_availability: 0 +PublishClientTweetTakeScreenshot: + default_availability: 0 +PublishClientTweetVideoPlaybackStart: + default_availability: 0 +PublishClientTweetVideoPlaybackComplete: + default_availability: 0 +PublishClientTweetEmailClick: + default_availability: 0 +PublishClientAppExit: + default_availability: 0 +PublishClientTweetGalleryImpression: + default_availability: 0 +PublishClientTweetDetailsImpression: + default_availability: 0 +PublishClientTweetMomentImpression: + default_availability: 0 +PublishServerUserCreate: + default_availability: 0 +PublishServerUserUpdate: + default_availability: 0 +PublishServerPromotedTweetFav: + default_availability: 0 +PublishServerPromotedTweetUnfav: + default_availability: 0 +PublishServerPromotedTweetReply: + default_availability: 0 +PublishServerPromotedTweetRetweet: + 
default_availability: 0 +PublishServerPromotedTweetComposeTweet: + default_availability: 0 +PublishServerPromotedTweetBlockAuthor: + default_availability: 0 +PublishServerPromotedTweetUnblockAuthor: + default_availability: 0 +PublishServerPromotedTweetClick: + default_availability: 0 +PublishServerPromotedTweetReport: + default_availability: 0 +PublishServerPromotedProfileFollow: + default_availability: 0 +PublishServerPromotedProfileUnfollow: + default_availability: 0 +PublishServerPromotedTweetMuteAuthor: + default_availability: 0 +PublishServerPromotedTweetClickProfile: + default_availability: 0 +PublishServerPromotedTweetClickHashtag: + default_availability: 0 +PublishServerPromotedTweetOpenLink: + default_availability: 0 +PublishServerPromotedTweetCarouselSwipeNext: + default_availability: 0 +PublishServerPromotedTweetCarouselSwipePrevious: + default_availability: 0 +PublishServerPromotedTweetLingerImpressionShort: + default_availability: 0 +PublishServerPromotedTweetLingerImpressionMedium: + default_availability: 0 +PublishServerPromotedTweetLingerImpressionLong: + default_availability: 0 +PublishServerPromotedTweetClickSpotlight: + default_availability: 0 +PublishServerPromotedTweetViewSpotlight: + default_availability: 0 +PublishServerPromotedTrendView: + default_availability: 0 +PublishServerPromotedTrendClick: + default_availability: 0 +PublishServerPromotedTweetVideoPlayback25: + default_availability: 0 +PublishServerPromotedTweetVideoPlayback50: + default_availability: 0 +PublishServerPromotedTweetVideoPlayback75: + default_availability: 0 +PublishServerPromotedTweetVideoAdPlayback25: + default_availability: 0 +PublishServerPromotedTweetVideoAdPlayback50: + default_availability: 0 +PublishServerPromotedTweetVideoAdPlayback75: + default_availability: 0 +PublishServerTweetVideoAdPlayback25: + default_availability: 0 +PublishServerTweetVideoAdPlayback50: + default_availability: 0 +PublishServerTweetVideoAdPlayback75: + default_availability: 0 +PublishServerPromotedTweetDismissWithoutReason: + default_availability: 0 +PublishServerPromotedTweetDismissUninteresting: + default_availability: 0 +PublishServerPromotedTweetDismissRepetitive: + default_availability: 0 +PublishServerPromotedTweetDismissSpam: + default_availability: 0 +PublishServerTweetArchiveFavorite: + default_availability: 0 +PublishServerTweetUnarchiveFavorite: + default_availability: 0 +PublishServerTweetArchiveRetweet: + default_availability: 0 +PublishServerTweetUnarchiveRetweet: + default_availability: 0 +RekeyUUAClientTweetRenderImpression: + default_availability: 0 +RekeyUUAIesourceClientTweetRenderImpression: + default_availability: 0 +EnrichmentPlannerSampling: + default_availability: 0 + diff --git a/unified_user_actions/service/src/main/resources/logback.xml b/unified_user_actions/service/src/main/resources/logback.xml new file mode 100644 index 000000000..c23b0d6b6 --- /dev/null +++ b/unified_user_actions/service/src/main/resources/logback.xml @@ -0,0 +1,85 @@ + + + + + + + + + true + + + + + + + + + + + + + + ${log.service.output} + + ${log.service.output}.%i + 1 + 10 + + + 50MB + + + %date %.-3level %logger ${DEFAULT_SERVICE_PATTERN}%n + + + + + + false + ${log.lens.index} + ${log.lens.tag}/service + + ${DEFAULT_SERVICE_PATTERN} + + + + + + + + + + + + + + WARN + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/unified_user_actions/service/src/main/scala/BUILD b/unified_user_actions/service/src/main/scala/BUILD new file mode 100644 index 000000000..fe9ce7063 --- /dev/null +++ 
b/unified_user_actions/service/src/main/scala/BUILD @@ -0,0 +1,390 @@ +jvm_binary( + name = "uua-tls-favs-bin", + basename = "uua-tls-favs-bin", + main = "com.twitter.unified_user_actions.service.TlsFavsServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:tls-favs", + ], +) + +jvm_app( + name = "uua-tls-favs", + archive = "zip", + binary = ":uua-tls-favs-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-client-event-bin", + basename = "uua-client-event-bin", + main = "com.twitter.unified_user_actions.service.ClientEventServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:client-event", + ], +) + +jvm_app( + name = "uua-client-event", + archive = "zip", + binary = ":uua-client-event-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + + +jvm_binary( + name = "uua-tweetypie-event-bin", + basename = "uua-tweetypie-event-bin", + main = "com.twitter.unified_user_actions.service.TweetypieEventServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:tweetypie-event", + ], +) + +jvm_app( + name = "uua-tweetypie-event", + archive = "zip", + binary = ":uua-tweetypie-event-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-social-graph-bin", + basename = "uua-social-graph-bin", + main = "com.twitter.unified_user_actions.service.SocialGraphServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:social-graph-event", + ], +) + +jvm_app( + name = 
"uua-social-graph", + archive = "zip", + binary = ":uua-social-graph-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-email-notification-event-bin", + basename = "uua-email-notification-event-bin", + main = "com.twitter.unified_user_actions.service.EmailNotificationEventServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:email-notification-event", + ], +) + +jvm_app( + name = "uua-email-notification-event", + archive = "zip", + binary = ":uua-email-notification-event-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-user-modification-bin", + basename = "uua-user-modification-bin", + main = "com.twitter.unified_user_actions.service.UserModificationServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:user-modification-event", + ], +) + +jvm_app( + name = "uua-user-modification", + archive = "zip", + binary = ":uua-user-modification-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-ads-callback-engagements-bin", + basename = "uua-ads-callback-engagements-bin", + main = "com.twitter.unified_user_actions.service.AdsCallbackEngagementsServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:ads-callback-engagements", + ], +) + +jvm_app( + name = "uua-ads-callback-engagements", + archive = "zip", + binary = ":uua-ads-callback-engagements-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-favorite-archival-events-bin", + basename = "uua-favorite-archival-events-bin", + main = "com.twitter.unified_user_actions.service.FavoriteArchivalEventsServiceMain", + runtime_platform = "java11", + 
tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:favorite-archival-events", + ], +) + +jvm_app( + name = "uua-favorite-archival-events", + archive = "zip", + binary = ":uua-favorite-archival-events-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-retweet-archival-events-bin", + basename = "uua-retweet-archival-events-bin", + main = "com.twitter.unified_user_actions.service.RetweetArchivalEventsServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:retweet-archival-events", + ], +) + +jvm_app( + name = "uua-retweet-archival-events", + archive = "zip", + binary = ":uua-retweet-archival-events-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "rekey-uua-bin", + basename = "rekey-uua-bin", + main = "com.twitter.unified_user_actions.service.RekeyUuaServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:rekey-uua", + ], +) + +jvm_app( + name = "rekey-uua", + archive = "zip", + binary = ":rekey-uua-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "rekey-uua-iesource-bin", + basename = "rekey-uua-iesource-bin", + main = "com.twitter.unified_user_actions.service.RekeyUuaIesourceServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:rekey-uua-iesource", + ], +) + +jvm_app( + name = "rekey-uua-iesource", + archive = "zip", + binary = ":rekey-uua-iesource-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = 
"unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-enrichment-planner-bin", + basename = "uua-enrichment-planner-bin", + main = "com.twitter.unified_user_actions.service.EnrichmentPlannerServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:enrichment-planner", + ], +) + +jvm_app( + name = "uua-enrichment-planner", + archive = "zip", + binary = ":uua-enrichment-planner-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) + +jvm_binary( + name = "uua-enricher-bin", + basename = "uua-enricher-bin", + main = "com.twitter.unified_user_actions.service.EnricherServiceMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "twitter-server-internal/src/main/scala", + "twitter-server/logback-classic/src/main/scala", + "unified_user_actions/service/src/main/resources", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:enricher", + ], +) + +jvm_app( + name = "uua-enricher", + archive = "zip", + binary = ":uua-enricher-bin", + bundles = [ + bundle( + fileset = ["**/*"], + owning_target = "unified_user_actions/service/src/main/resources:files", + rel_path = "unified_user_actions/service/src/main/resources", + ), + ], + tags = ["bazel-compatible"], +) diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/AdsCallbackEngagementsService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/AdsCallbackEngagementsService.scala new file mode 100644 index 000000000..9e0a23aac --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/AdsCallbackEngagementsService.scala @@ -0,0 +1,25 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.ads.spendserver.thriftscala.SpendServerEvent +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.service.module.KafkaProcessorAdsCallbackEngagementsModule + +object AdsCallbackEngagementsServiceMain extends AdsCallbackEngagementsService + +class AdsCallbackEngagementsService extends TwitterServer { + override val modules = Seq( + KafkaProcessorAdsCallbackEngagementsModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, SpendServerEvent]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/BUILD 
b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/BUILD new file mode 100644 index 000000000..2936e039d --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/BUILD @@ -0,0 +1,270 @@ +scala_library( + name = "tls-favs", + sources = ["TlsFavsService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "src/thrift/com/twitter/timelineservice/server/internal:thrift-scala", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:tls-favs", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "client-event", + sources = ["ClientEventService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twadoop_config/configuration/log_categories/group/scribelib:client_event-scala", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:client-event", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "tweetypie-event", + sources = ["TweetypieEventService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twadoop_config/configuration/log_categories/group/scribelib:client_event-scala", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:tweetypie-event", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "social-graph-event", + sources = ["SocialGraphService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + 
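+        # Each per-source service library follows the same template: the shared
+        # Kafka processor plumbing above, plus the one source-specific module
+        # target below, which in turn pulls in the matching event adapter.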
"unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:social-graph-event", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "email-notification-event", + sources = ["EmailNotificationEventService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:email-notification-event", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "user-modification-event", + sources = ["UserModificationService.scala"], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:user-modification-event", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "ads-callback-engagements", + sources = ["AdsCallbackEngagementsService.scala"], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:ads-callback-engagements", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "favorite-archival-events", + sources = ["FavoriteArchivalEventsService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + 
"unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:favorite-archival-events", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "retweet-archival-events", + sources = ["RetweetArchivalEventsService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:retweet-archival-events", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "rekey-uua", + sources = ["RekeyUuaService.scala"], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:rekey-uua", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "rekey-uua-iesource", + sources = ["RekeyUuaIesourceService.scala"], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:rekey-uua-iesource", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "enrichment-planner", + sources = ["EnrichmentPlannerService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "decider/src/main/scala", + "finatra-internal/decider/src/main/scala", + "finatra-internal/kafka-streams/kafka-streams/src/main/scala", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/producers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra-internal/mtls/src/main/scala", + "kafka/finagle-kafka/finatra-kafka-streams/kafka-streams-static-partitioning/src/main/scala", + "kafka/finagle-kafka/finatra-kafka-streams/kafka-streams/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", 
+ "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator:noop", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner:default", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) + +scala_library( + name = "enricher", + sources = ["EnricherService.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "finatra-internal/kafka-streams/kafka-streams/src/main/scala", + "finatra-internal/mtls/src/main/scala", + "finatra/inject/inject-server/src/main/scala/com/twitter/inject/server", + "graphql/thrift/src/main/thrift/com/twitter/graphql:graphql-scala", + "kafka/finagle-kafka/finatra-kafka-streams/kafka-streams-static-partitioning/src/main/scala", + "kafka/finagle-kafka/finatra-kafka-streams/kafka-streams/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "twitter-server/server/src/main/scala", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/driver", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/graphql", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hydrator:default", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/partitioner:default", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:cache", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module:graphql-client", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-app/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/ClientEventService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/ClientEventService.scala new file mode 100644 index 000000000..17584a2dc --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/ClientEventService.scala @@ -0,0 +1,23 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.service.module.KafkaProcessorClientEventModule + +object ClientEventServiceMain extends ClientEventService + +class ClientEventService extends TwitterServer { + + override val modules = Seq(KafkaProcessorClientEventModule, DeciderModule) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, LogEvent]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EmailNotificationEventService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EmailNotificationEventService.scala 
new file mode 100644 index 000000000..d5f2b6d9a --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EmailNotificationEventService.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.ibis.thriftscala.NotificationScribe +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.service.module.KafkaProcessorEmailNotificationEventModule + +object EmailNotificationEventServiceMain extends EmailNotificationEventService + +class EmailNotificationEventService extends TwitterServer { + + override val modules = Seq( + KafkaProcessorEmailNotificationEventModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, NotificationScribe]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EnricherService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EnricherService.scala new file mode 100644 index 000000000..9459871ed --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EnricherService.scala @@ -0,0 +1,105 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.conversions.DurationOps._ +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.dynmap.DynMap +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.domain.AckMode +import com.twitter.finatra.kafka.domain.KafkaGroupId +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.config.KafkaStreamsConfig +import com.twitter.finatra.kafkastreams.config.SecureKafkaStreamsConfig +import com.twitter.finatra.kafkastreams.partitioning.StaticPartitioning +import com.twitter.finatra.mtls.modules.ServiceIdentifierModule +import com.twitter.finatra.kafkastreams.dsl.FinatraDslFlatMapAsync +import com.twitter.graphql.thriftscala.GraphqlExecutionService +import com.twitter.logging.Logging +import com.twitter.unified_user_actions.enricher.driver.EnrichmentDriver +import com.twitter.unified_user_actions.enricher.hcache.LocalCache +import com.twitter.unified_user_actions.enricher.hydrator.DefaultHydrator +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.unified_user_actions.enricher.partitioner.DefaultPartitioner +import com.twitter.unified_user_actions.service.module.CacheModule +import com.twitter.unified_user_actions.service.module.ClientIdModule +import com.twitter.unified_user_actions.service.module.GraphqlClientProviderModule +import com.twitter.util.Future +import org.apache.kafka.common.record.CompressionType +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.processor.RecordContext +import org.apache.kafka.streams.processor.TopicNameExtractor +import org.apache.kafka.streams.scala.kstream.Consumed +import org.apache.kafka.streams.scala.kstream.Produced +import com.twitter.unified_user_actions.enricher.driver.EnrichmentPlanUtils._ + +object EnricherServiceMain extends EnricherService + 
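+// Data flow, as implemented below: consume the keyed topic written by the
+// planner, hydrate each EnrichmentEnvelop asynchronously via flatMapAsync
+// (GraphQL lookups memoized in a LocalCache), then route each envelop to the
+// output topic recorded in the last completed stage of its EnrichmentPlan.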
+class EnricherService + extends FinatraDslFlatMapAsync + with StaticPartitioning + with SecureKafkaStreamsConfig + with Logging { + val InputTopic = "unified_user_actions_keyed_dev" + val OutputTopic = "unified_user_actions_enriched" + + override val modules = Seq( + CacheModule, + ClientIdModule, + GraphqlClientProviderModule, + ServiceIdentifierModule + ) + + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + val graphqlClient = injector.instance[GraphqlExecutionService.FinagledClient] + val localCache = injector.instance[LocalCache[EnrichmentKey, DynMap]] + val statsReceiver = injector.instance[StatsReceiver] + val driver = new EnrichmentDriver( + finalOutputTopic = Some(OutputTopic), + partitionedTopic = InputTopic, + hydrator = new DefaultHydrator( + cache = localCache, + graphqlClient = graphqlClient, + scopedStatsReceiver = statsReceiver.scope("DefaultHydrator")), + partitioner = new DefaultPartitioner + ) + + val kstream = builder.asScala + .stream(InputTopic)( + Consumed.`with`(ScalaSerdes.Thrift[EnrichmentKey], ScalaSerdes.Thrift[EnrichmentEnvelop])) + .flatMapAsync[EnrichmentKey, EnrichmentEnvelop]( + commitInterval = 5.seconds, + numWorkers = 10000 + ) { (enrichmentKey: EnrichmentKey, enrichmentEnvelop: EnrichmentEnvelop) => + driver + .execute(Some(enrichmentKey), Future.value(enrichmentEnvelop)) + .map(tuple => tuple._1.map(key => (key, tuple._2)).seq) + } + + val topicExtractor: TopicNameExtractor[EnrichmentKey, EnrichmentEnvelop] = + (_: EnrichmentKey, envelop: EnrichmentEnvelop, _: RecordContext) => + envelop.plan.getLastCompletedStage.outputTopic.getOrElse( + throw new IllegalStateException("Missing output topic in the last completed stage")) + + kstream.to(topicExtractor)( + Produced.`with`(ScalaSerdes.Thrift[EnrichmentKey], ScalaSerdes.Thrift[EnrichmentEnvelop])) + } + + override def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = + super + .streamsProperties(config) + .consumer.groupId(KafkaGroupId(applicationId())) + .consumer.clientId(s"${applicationId()}-consumer") + .consumer.requestTimeout(30.seconds) + .consumer.sessionTimeout(30.seconds) + .consumer.fetchMin(1.megabyte) + .consumer.fetchMax(5.megabytes) + .consumer.receiveBuffer(32.megabytes) + .consumer.maxPollInterval(1.minute) + .consumer.maxPollRecords(50000) + .producer.clientId(s"${applicationId()}-producer") + .producer.batchSize(16.kilobytes) + .producer.bufferMemorySize(256.megabyte) + .producer.requestTimeout(30.seconds) + .producer.compressionType(CompressionType.LZ4) + .producer.ackMode(AckMode.ALL) +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EnrichmentPlannerService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EnrichmentPlannerService.scala new file mode 100644 index 000000000..fc8e8dbef --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/EnrichmentPlannerService.scala @@ -0,0 +1,187 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.app.Flag +import com.twitter.conversions.DurationOps._ +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.decider.Decider +import com.twitter.decider.SimpleRecipient +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.domain.AckMode +import com.twitter.finatra.kafka.domain.KafkaGroupId +import com.twitter.finatra.kafka.domain.KafkaTopic +import 
com.twitter.finatra.kafka.producers.FinagleKafkaProducerConfig
+import com.twitter.finatra.kafka.producers.KafkaProducerConfig
+import com.twitter.finatra.kafka.producers.TwitterKafkaProducerConfig
+import com.twitter.finatra.kafka.serde.ScalaSerdes
+import com.twitter.finatra.kafka.serde.UnKeyed
+import com.twitter.finatra.kafka.serde.UnKeyedSerde
+import com.twitter.finatra.kafkastreams.config.KafkaStreamsConfig
+import com.twitter.finatra.kafkastreams.config.SecureKafkaStreamsConfig
+import com.twitter.finatra.kafkastreams.dsl.FinatraDslToCluster
+import com.twitter.inject.TwitterModule
+import com.twitter.unified_user_actions.enricher.driver.EnrichmentDriver
+import com.twitter.unified_user_actions.enricher.hydrator.NoopHydrator
+import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop
+import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction.NotificationTweetEnrichment
+import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentInstruction.TweetEnrichment
+import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey
+import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentPlan
+import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStage
+import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageStatus
+import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentStageType
+import com.twitter.unified_user_actions.enricher.partitioner.DefaultPartitioner
+import com.twitter.unified_user_actions.enricher.partitioner.DefaultPartitioner.NullKey
+import com.twitter.unified_user_actions.thriftscala.Item
+import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
+import com.twitter.util.Await
+import com.twitter.util.Future
+import org.apache.kafka.common.record.CompressionType
+import org.apache.kafka.streams.StreamsBuilder
+import org.apache.kafka.streams.scala.kstream.Consumed
+import org.apache.kafka.streams.scala.kstream.KStream
+import org.apache.kafka.streams.scala.kstream.Produced
+object EnrichmentPlannerServiceMain extends EnrichmentPlannerService {
+  val ApplicationId = "uua-enrichment-planner"
+  val InputTopic = "unified_user_actions"
+  val OutputPartitionedTopic = "unified_user_actions_keyed_dev"
+  val SamplingDecider = "EnrichmentPlannerSampling"
+}
+
+/**
+ * This service is the first step (the planner) of the UUA enrichment process. It:
+ * 1. Reads the prod UUA topic unified_user_actions from the prod cluster and writes to either
+ *    the prod cluster (prod) or the dev cluster (dev/staging); see the output flags below.
+ * 2. Optionally downsamples events at publish time, controlled by a Decider.
+ * 3. Keys its output for the first repartitioning step, typically with an EnrichmentKey of the
+ *    Tweet type.
+ */
+class EnrichmentPlannerService extends FinatraDslToCluster with SecureKafkaStreamsConfig {
+  import EnrichmentPlannerServiceMain._
+
+  val kafkaOutputCluster: Flag[String] = flag(
+    name = "kafka.output.server",
+    default = "",
+    help =
+      """The output Kafka cluster.
+        |This is needed since we read from a cluster and potentially output to a different cluster.
+        |""".stripMargin
+  )
+
+  val kafkaOutputEnableTls: Flag[Boolean] = flag(
+    name = "kafka.output.enable.tls",
+    default = true,
+    help = "Whether to enable TLS when publishing to the output Kafka cluster."
+  )
+
+  override val modules: Seq[TwitterModule] = Seq(
+    DeciderModule
+  )
+
+  override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = {
+    val decider = injector.instance[Decider]
+    val driver = new EnrichmentDriver(
+      finalOutputTopic = NoopHydrator.OutputTopic,
+      partitionedTopic = OutputPartitionedTopic,
+      hydrator = new NoopHydrator,
+      partitioner = new DefaultPartitioner)
+
+    val builderWithoutOutput = builder.asScala
+      .stream(InputTopic)(Consumed.`with`(UnKeyedSerde, ScalaSerdes.Thrift[UnifiedUserAction]))
+      // map each action to an EnrichmentEnvelop, filtering out item types that need no enrichment
+      .flatMapValues { uua =>
+        (uua.item match {
+          case Item.TweetInfo(_) =>
+            Some(EnrichmentEnvelop(
+              envelopId = uua.hashCode.toLong,
+              uua = uua,
+              plan = EnrichmentPlan(Seq(
+                EnrichmentStage(
+                  status = EnrichmentStageStatus.Initialized,
+                  stageType = EnrichmentStageType.Repartition,
+                  instructions = Seq(TweetEnrichment)
+                ),
+                EnrichmentStage(
+                  status = EnrichmentStageStatus.Initialized,
+                  stageType = EnrichmentStageType.Hydration,
+                  instructions = Seq(TweetEnrichment)
+                ),
+              ))
+            ))
+          case Item.NotificationInfo(_) =>
+            Some(EnrichmentEnvelop(
+              envelopId = uua.hashCode.toLong,
+              uua = uua,
+              plan = EnrichmentPlan(Seq(
+                EnrichmentStage(
+                  status = EnrichmentStageStatus.Initialized,
+                  stageType = EnrichmentStageType.Repartition,
+                  instructions = Seq(NotificationTweetEnrichment)
+                ),
+                EnrichmentStage(
+                  status = EnrichmentStageStatus.Initialized,
+                  stageType = EnrichmentStageType.Hydration,
+                  instructions = Seq(NotificationTweetEnrichment)
+                ),
+              ))
+            ))
+          case _ => None
+        }).seq
+      }
+      // execute our driver logic
+      .flatMap((_: UnKeyed, envelop: EnrichmentEnvelop) => {
+        // flatMap plus Await.result are used here because the driver interface allows both
+        // synchronous (repartition) and asynchronous (hydration) operations, but here we only
+        // need to repartition synchronously; blocking keeps the code and its tests simple.
+ val (keyOpt, value) = Await.result(driver.execute(NullKey, Future.value(envelop))) + keyOpt.map(key => (key, value)).seq + }) + // then finally we sample based on the output keys + .filter((key, _) => + decider.isAvailable(feature = SamplingDecider, Some(SimpleRecipient(key.id)))) + + configureOutput(builderWithoutOutput) + } + + private def configureOutput(kstream: KStream[EnrichmentKey, EnrichmentEnvelop]): Unit = { + if (kafkaOutputCluster().nonEmpty && kafkaOutputCluster() != bootstrapServer()) { + kstream.toCluster( + cluster = kafkaOutputCluster(), + topic = KafkaTopic(OutputPartitionedTopic), + clientId = s"$ApplicationId-output-producer", + kafkaProducerConfig = + if (kafkaOutputEnableTls()) + FinagleKafkaProducerConfig[EnrichmentKey, EnrichmentEnvelop](kafkaProducerConfig = + KafkaProducerConfig(TwitterKafkaProducerConfig().requestTimeout(1.minute).configMap)) + else + FinagleKafkaProducerConfig[EnrichmentKey, EnrichmentEnvelop]( + kafkaProducerConfig = KafkaProducerConfig() + .requestTimeout(1.minute)), + statsReceiver = statsReceiver, + commitInterval = 15.seconds + )(Produced.`with`(ScalaSerdes.Thrift[EnrichmentKey], ScalaSerdes.Thrift[EnrichmentEnvelop])) + } else { + kstream.to(OutputPartitionedTopic)( + Produced.`with`(ScalaSerdes.Thrift[EnrichmentKey], ScalaSerdes.Thrift[EnrichmentEnvelop])) + } + } + + override def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = { + super + .streamsProperties(config) + .consumer.groupId(KafkaGroupId(ApplicationId)) + .consumer.clientId(s"$ApplicationId-consumer") + .consumer.requestTimeout(30.seconds) + .consumer.sessionTimeout(30.seconds) + .consumer.fetchMin(1.megabyte) + .consumer.fetchMax(5.megabyte) + .consumer.receiveBuffer(32.megabytes) + .consumer.maxPollInterval(1.minute) + .consumer.maxPollRecords(50000) + .producer.clientId(s"$ApplicationId-producer") + .producer.batchSize(16.kilobytes) + .producer.bufferMemorySize(256.megabyte) + .producer.requestTimeout(30.seconds) + .producer.compressionType(CompressionType.LZ4) + .producer.ackMode(AckMode.ALL) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/FavoriteArchivalEventsService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/FavoriteArchivalEventsService.scala new file mode 100644 index 000000000..b4014a13e --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/FavoriteArchivalEventsService.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.timelineservice.fanout.thriftscala.FavoriteArchivalEvent +import com.twitter.unified_user_actions.service.module.KafkaProcessorFavoriteArchivalEventsModule + +object FavoriteArchivalEventsServiceMain extends FavoriteArchivalEventsService + +class FavoriteArchivalEventsService extends TwitterServer { + + override val modules = Seq( + KafkaProcessorFavoriteArchivalEventsModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, FavoriteArchivalEvent]] + closeOnExit(processor) + processor.start() + } +} diff --git 
a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RekeyUuaIesourceService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RekeyUuaIesourceService.scala new file mode 100644 index 000000000..f0db8032b --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RekeyUuaIesourceService.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.iesource.thriftscala.InteractionEvent +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.service.module.KafkaProcessorRekeyUuaIesourceModule + +object RekeyUuaIesourceServiceMain extends RekeyUuaIesourceService + +class RekeyUuaIesourceService extends TwitterServer { + + override val modules = Seq( + KafkaProcessorRekeyUuaIesourceModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, InteractionEvent]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RekeyUuaService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RekeyUuaService.scala new file mode 100644 index 000000000..6928df498 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RekeyUuaService.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.service.module.KafkaProcessorRekeyUuaModule +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + +object RekeyUuaServiceMain extends RekeyUuaService + +class RekeyUuaService extends TwitterServer { + + override val modules = Seq( + KafkaProcessorRekeyUuaModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, UnifiedUserAction]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RetweetArchivalEventsService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RetweetArchivalEventsService.scala new file mode 100644 index 000000000..dcbbc8bd6 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/RetweetArchivalEventsService.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.tweetypie.thriftscala.RetweetArchivalEvent +import com.twitter.unified_user_actions.service.module.KafkaProcessorRetweetArchivalEventsModule + +object RetweetArchivalEventsServiceMain extends RetweetArchivalEventsService + +class 
RetweetArchivalEventsService extends TwitterServer { + + override val modules = Seq( + KafkaProcessorRetweetArchivalEventsModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, RetweetArchivalEvent]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/SocialGraphService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/SocialGraphService.scala new file mode 100644 index 000000000..89917d1ec --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/SocialGraphService.scala @@ -0,0 +1,25 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.socialgraph.thriftscala.WriteEvent +import com.twitter.unified_user_actions.service.module.KafkaProcessorSocialGraphModule + +object SocialGraphServiceMain extends SocialGraphService + +class SocialGraphService extends TwitterServer { + override val modules = Seq( + KafkaProcessorSocialGraphModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, WriteEvent]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/TlsFavsService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/TlsFavsService.scala new file mode 100644 index 000000000..a96891c46 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/TlsFavsService.scala @@ -0,0 +1,26 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.timelineservice.thriftscala.ContextualizedFavoriteEvent +import com.twitter.unified_user_actions.service.module.KafkaProcessorTlsFavsModule + +object TlsFavsServiceMain extends TlsFavsService + +class TlsFavsService extends TwitterServer { + + override val modules = Seq( + KafkaProcessorTlsFavsModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, ContextualizedFavoriteEvent]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/TweetypieEventService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/TweetypieEventService.scala new file mode 100644 index 000000000..c8516492d --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/TweetypieEventService.scala @@ -0,0 +1,27 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import 
com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.tweetypie.thriftscala.TweetEvent +import com.twitter.unified_user_actions.service.module.KafkaProcessorTweetypieEventModule + +object TweetypieEventServiceMain extends TweetypieEventService + +class TweetypieEventService extends TwitterServer { + + override val modules = Seq( + KafkaProcessorTweetypieEventModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, TweetEvent]] + closeOnExit(processor) + processor.start() + } + +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/UserModificationService.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/UserModificationService.scala new file mode 100644 index 000000000..ff16d6334 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/UserModificationService.scala @@ -0,0 +1,25 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.decider.modules.DeciderModule +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.gizmoduck.thriftscala.UserModification +import com.twitter.inject.server.TwitterServer +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.service.module.KafkaProcessorUserModificationModule + +object UserModificationServiceMain extends UserModificationService + +class UserModificationService extends TwitterServer { + override val modules = Seq( + KafkaProcessorUserModificationModule, + DeciderModule + ) + + override protected def setup(): Unit = {} + + override protected def start(): Unit = { + val processor = injector.instance[AtLeastOnceProcessor[UnKeyed, UserModification]] + closeOnExit(processor) + processor.start() + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/BUILD b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/BUILD new file mode 100644 index 000000000..9586c637d --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/BUILD @@ -0,0 +1,482 @@ +scala_library( + name = "decider-utils", + sources = [ + "DeciderUtils.scala", + "TopicsMapping.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + "decider/src/main/scala", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + ], +) + +scala_library( + name = "base", + sources = [ + "FlagsModule.scala", + "KafkaProcessorProvider.scala", + "ZoneFiltering.scala", + ], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + 
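+        # Note: the per-source module targets below re-list FlagsModule,
+        # KafkaProcessorProvider, and ZoneFiltering as sources rather than
+        # depending on :base, so each deployable compiles its own copy of the
+        # shared plumbing.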
"kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "tls-favs", + sources = [ + "FlagsModule.scala", + "KafkaProcessorProvider.scala", + "KafkaProcessorTlsFavsModule.scala", + "ZoneFiltering.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "src/thrift/com/twitter/timelineservice/server/internal:thrift-scala", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tls_favs_event", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "client-event", + sources = [ + "FlagsModule.scala", + "KafkaProcessorClientEventModule.scala", + "KafkaProcessorProvider.scala", + "TopicsMapping.scala", + "ZoneFiltering.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/client_event", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + 
"util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + + +scala_library( + name = "tweetypie-event", + sources = [ + "FlagsModule.scala", + "KafkaProcessorProvider.scala", + "KafkaProcessorTweetypieEventModule.scala", + "ZoneFiltering.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/tweetypie_event", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "social-graph-event", + sources = [ + "FlagsModule.scala", + "KafkaProcessorProvider.scala", + "KafkaProcessorSocialGraphModule.scala", + "ZoneFiltering.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/social_graph_event", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "email-notification-event", + sources = [ + "FlagsModule.scala", + "KafkaProcessorEmailNotificationEventModule.scala", + "KafkaProcessorProvider.scala", + "ZoneFiltering.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + 
"finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/email_notification_event", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "user-modification-event", + sources = [ + "FlagsModule.scala", + "KafkaProcessorProvider.scala", + "KafkaProcessorUserModificationModule.scala", + "ZoneFiltering.scala", + ], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/user_modification_event", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "ads-callback-engagements", + sources = [ + "FlagsModule.scala", + "KafkaProcessorAdsCallbackEngagementsModule.scala", + "KafkaProcessorProvider.scala", + "ZoneFiltering.scala", + ], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + 
"kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/ads_callback_engagements", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "favorite-archival-events", + sources = [ + "FlagsModule.scala", + "KafkaProcessorFavoriteArchivalEventsModule.scala", + "KafkaProcessorProvider.scala", + "ZoneFiltering.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "src/thrift/com/twitter/timelineservice/server/internal:thrift-scala", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/favorite_archival_events", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "retweet-archival-events", + sources = [ + "FlagsModule.scala", + "KafkaProcessorProvider.scala", + "KafkaProcessorRetweetArchivalEventsModule.scala", + "ZoneFiltering.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/retweet_archival_events", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + 
"unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "rekey-uua", + sources = [ + "FlagsModule.scala", + "KafkaProcessorProvider.scala", + "KafkaProcessorRekeyUuaModule.scala", + "ZoneFiltering.scala", + ], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "rekey-uua-iesource", + sources = [ + "FlagsModule.scala", + "KafkaProcessorProvider.scala", + "KafkaProcessorRekeyUuaIesourceModule.scala", + "ZoneFiltering.scala", + ], + tags = [ + "bazel-compatible", + "bazel-only", + ], + dependencies = [ + ":decider-utils", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finatra-internal/kafka/src/main/scala/com/twitter/finatra/kafka/consumers", + "finatra-internal/mtls-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "finatra/inject/inject-thrift-client/src/main/scala", + "kafka/finagle-kafka/finatra-kafka/src/main/scala", + "kafka/libs/src/main/scala/com/twitter/kafka/client/processor", + "twitter-server/server/src/main/scala", + "unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/uua_aggregates", + "unified_user_actions/kafka/src/main/scala/com/twitter/unified_user_actions/kafka", + "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala", + "util/util-core:scala", + "util/util-core/src/main/scala/com/twitter/conversions", + "util/util-slf4j-api/src/main/scala", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) + +scala_library( + name = "graphql-client", + sources = [ + "ClientIdModule.scala", + "GraphqlClientProviderModule.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/google/inject:guice", + 
"3rdparty/jvm/javax/inject:javax.inject", + "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/authentication", + "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/client", + "finagle/finagle-thriftmux/src/main/scala", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "graphql/thrift/src/main/thrift/com/twitter/graphql:graphql-scala", + "twitter-server/server/src/main/scala", + ], +) + +scala_library( + name = "cache", + sources = [ + "CacheModule.scala", + ], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/google/guava", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "featureswitches/dynmap/src/main/scala/com/twitter/dynmap:dynmap-core", + "finatra/inject/inject-app/src/main/java/com/twitter/inject/annotations", + "finatra/inject/inject-modules/src/main/scala", + "finatra/inject/inject-modules/src/main/scala/com/twitter/inject/modules", + "graphql/thrift/src/main/thrift/com/twitter/graphql:graphql-scala", + "twitter-server/server/src/main/scala", + "unified_user_actions/enricher/src/main/scala/com/twitter/unified_user_actions/enricher/hcache", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + "util/util-cache-guava/src/main/scala", + "util/util-cache/src/main/scala", + ], +) diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/CacheModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/CacheModule.scala new file mode 100644 index 000000000..295c6ee39 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/CacheModule.scala @@ -0,0 +1,48 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.common.cache.CacheBuilder +import com.google.inject.Provides +import com.twitter.dynmap.DynMap +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.unified_user_actions.enricher.hcache.LocalCache +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import com.twitter.util.Future +import java.util.concurrent.TimeUnit +import javax.inject.Singleton + +object CacheModule extends TwitterModule { + private final val localCacheTtlFlagName = "local.cache.ttl.seconds" + private final val localCacheMaxSizeFlagName = "local.cache.max.size" + + flag[Long]( + name = localCacheTtlFlagName, + default = 1800L, + help = "Local Cache's TTL in seconds" + ) + + flag[Long]( + name = localCacheMaxSizeFlagName, + default = 1000L, + help = "Local Cache's max size" + ) + + @Provides + @Singleton + def providesLocalCache( + @Flag(localCacheTtlFlagName) localCacheTtlFlag: Long, + @Flag(localCacheMaxSizeFlagName) localCacheMaxSizeFlag: Long, + statsReceiver: StatsReceiver + ): LocalCache[EnrichmentKey, DynMap] = { + val underlying = CacheBuilder + .newBuilder() + .expireAfterWrite(localCacheTtlFlag, TimeUnit.SECONDS) + .maximumSize(localCacheMaxSizeFlag) + .build[EnrichmentKey, Future[DynMap]]() + + new LocalCache[EnrichmentKey, DynMap]( + underlying = underlying, + statsReceiver = statsReceiver.scope("enricherLocalCache")) + } +} diff --git 
a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/ClientIdModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/ClientIdModule.scala new file mode 100644 index 000000000..9358834b9 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/ClientIdModule.scala @@ -0,0 +1,24 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.finagle.thrift.ClientId +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import javax.inject.Singleton + +object ClientIdModule extends TwitterModule { + private final val flagName = "thrift.client.id" + + flag[String]( + name = flagName, + help = "Thrift Client ID" + ) + + @Provides + @Singleton + def providesClientId( + @Flag(flagName) thriftClientId: String, + ): ClientId = ClientId( + name = thriftClientId + ) +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/DeciderUtils.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/DeciderUtils.scala new file mode 100644 index 000000000..f38a9ef92 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/DeciderUtils.scala @@ -0,0 +1,27 @@ +package com.twitter.unified_user_actions.service.module + +import com.twitter.decider.Decider +import com.twitter.decider.RandomRecipient +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction + +sealed trait DeciderUtils { + def shouldPublish(decider: Decider, uua: UnifiedUserAction, sinkTopic: String): Boolean +} + +object DefaultDeciderUtils extends DeciderUtils { + override def shouldPublish(decider: Decider, uua: UnifiedUserAction, sinkTopic: String): Boolean = + decider.isAvailable(feature = s"Publish${uua.actionType}", Some(RandomRecipient)) +} + +object ClientEventDeciderUtils extends DeciderUtils { + override def shouldPublish(decider: Decider, uua: UnifiedUserAction, sinkTopic: String): Boolean = + decider.isAvailable( + feature = s"Publish${uua.actionType}", + Some(RandomRecipient)) && (uua.actionType match { + // for heavy impressions UUA only publishes to the "all" topic, not the engagementsOnly topic. 
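+      // For example (assuming ActionType's toString is the enum item name), a
+      // ClientTweetFav only needs to pass the "PublishClientTweetFav" decider
+      // gate and is then written to every configured sink topic, whereas the two
+      // impression types below must additionally target TopicsMapping().all.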
+ case ActionType.ClientTweetLingerImpression | ActionType.ClientTweetRenderImpression => + sinkTopic == TopicsMapping().all + case _ => true + }) +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/FlagsModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/FlagsModule.scala new file mode 100644 index 000000000..62cb09825 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/FlagsModule.scala @@ -0,0 +1,172 @@ +package com.twitter.unified_user_actions.service.module + +import com.twitter.inject.TwitterModule +import com.twitter.unified_user_actions.kafka.ClientConfigs +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging + +object FlagsModule extends TwitterModule with Logging { + // Twitter + final val cluster = "cluster" + + // Required + final val kafkaSourceCluster = ClientConfigs.kafkaBootstrapServerConfig + final val kafkaDestCluster = ClientConfigs.kafkaBootstrapServerRemoteDestConfig + final val kafkaSourceTopic = "kafka.source.topic" + final val kafkaSinkTopics = "kafka.sink.topics" + final val kafkaGroupId = ClientConfigs.kafkaGroupIdConfig + final val kafkaProducerClientId = ClientConfigs.producerClientIdConfig + final val kafkaMaxPendingRequests = ClientConfigs.kafkaMaxPendingRequestsConfig + final val kafkaWorkerThreads = ClientConfigs.kafkaWorkerThreadsConfig + + // Optional + /// Authentication + final val enableTrustStore = ClientConfigs.enableTrustStore + final val trustStoreLocation = ClientConfigs.trustStoreLocationConfig + + /// Consumer + final val commitInterval = ClientConfigs.kafkaCommitIntervalConfig + final val maxPollRecords = ClientConfigs.consumerMaxPollRecordsConfig + final val maxPollInterval = ClientConfigs.consumerMaxPollIntervalConfig + final val sessionTimeout = ClientConfigs.consumerSessionTimeoutConfig + final val fetchMax = ClientConfigs.consumerFetchMaxConfig + final val fetchMin = ClientConfigs.consumerFetchMinConfig + final val receiveBuffer = ClientConfigs.consumerReceiveBufferSizeConfig + /// Producer + final val batchSize = ClientConfigs.producerBatchSizeConfig + final val linger = ClientConfigs.producerLingerConfig + final val bufferMem = ClientConfigs.producerBufferMemConfig + final val compressionType = ClientConfigs.compressionConfig + final val retries = ClientConfigs.retriesConfig + final val retryBackoff = ClientConfigs.retryBackoffConfig + final val requestTimeout = ClientConfigs.producerRequestTimeoutConfig + + // Twitter + flag[String]( + name = cluster, + help = "The zone (or DC) that this service runs, used to potentially filter events" + ) + + // Required + flag[String]( + name = kafkaSourceCluster, + help = ClientConfigs.kafkaBootstrapServerHelp + ) + flag[String]( + name = kafkaDestCluster, + help = ClientConfigs.kafkaBootstrapServerRemoteDestHelp + ) + flag[String]( + name = kafkaSourceTopic, + help = "Name of the source Kafka topic" + ) + flag[Seq[String]]( + name = kafkaSinkTopics, + help = "A list of sink Kafka topics, separated by comma (,)" + ) + flag[String]( + name = kafkaGroupId, + help = ClientConfigs.kafkaGroupIdHelp + ) + flag[String]( + name = kafkaProducerClientId, + help = ClientConfigs.producerClientIdHelp + ) + flag[Int]( + name = kafkaMaxPendingRequests, + help = ClientConfigs.kafkaMaxPendingRequestsHelp + ) + flag[Int]( + name 
= kafkaWorkerThreads, + help = ClientConfigs.kafkaWorkerThreadsHelp + ) + + // Optional + /// Authentication + flag[Boolean]( + name = enableTrustStore, + default = ClientConfigs.enableTrustStoreDefault, + help = ClientConfigs.enableTrustStoreHelp + ) + flag[String]( + name = trustStoreLocation, + default = ClientConfigs.trustStoreLocationDefault, + help = ClientConfigs.trustStoreLocationHelp + ) + + /// Consumer + flag[Duration]( + name = commitInterval, + default = ClientConfigs.kafkaCommitIntervalDefault, + help = ClientConfigs.kafkaCommitIntervalHelp + ) + flag[Int]( + name = maxPollRecords, + default = ClientConfigs.consumerMaxPollRecordsDefault, + help = ClientConfigs.consumerMaxPollRecordsHelp + ) + flag[Duration]( + name = maxPollInterval, + default = ClientConfigs.consumerMaxPollIntervalDefault, + help = ClientConfigs.consumerMaxPollIntervalHelp + ) + flag[Duration]( + name = sessionTimeout, + default = ClientConfigs.consumerSessionTimeoutDefault, + help = ClientConfigs.consumerSessionTimeoutHelp + ) + flag[StorageUnit]( + name = fetchMax, + default = ClientConfigs.consumerFetchMaxDefault, + help = ClientConfigs.consumerFetchMaxHelp + ) + flag[StorageUnit]( + name = fetchMin, + default = ClientConfigs.consumerFetchMinDefault, + help = ClientConfigs.consumerFetchMinHelp + ) + flag[StorageUnit]( + name = receiveBuffer, + default = ClientConfigs.consumerReceiveBufferSizeDefault, + help = ClientConfigs.consumerReceiveBufferSizeHelp + ) + + /// Producer + flag[StorageUnit]( + name = batchSize, + default = ClientConfigs.producerBatchSizeDefault, + help = ClientConfigs.producerBatchSizeHelp + ) + flag[Duration]( + name = linger, + default = ClientConfigs.producerLingerDefault, + help = ClientConfigs.producerLingerHelp + ) + flag[StorageUnit]( + name = bufferMem, + default = ClientConfigs.producerBufferMemDefault, + help = ClientConfigs.producerBufferMemHelp + ) + flag[CompressionTypeFlag]( + name = compressionType, + default = ClientConfigs.compressionDefault, + help = ClientConfigs.compressionHelp + ) + flag[Int]( + name = retries, + default = ClientConfigs.retriesDefault, + help = ClientConfigs.retriesHelp + ) + flag[Duration]( + name = retryBackoff, + default = ClientConfigs.retryBackoffDefault, + help = ClientConfigs.retryBackoffHelp + ) + flag[Duration]( + name = requestTimeout, + default = ClientConfigs.producerRequestTimeoutDefault, + help = ClientConfigs.producerRequestTimeoutHelp + ) +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/GraphqlClientProviderModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/GraphqlClientProviderModule.scala new file mode 100644 index 000000000..6a9973655 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/GraphqlClientProviderModule.scala @@ -0,0 +1,42 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.finagle.ThriftMux +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.mtls.client.MtlsStackClient.MtlsThriftMuxClientSyntax +import com.twitter.finagle.ssl.OpportunisticTls +import com.twitter.finagle.thrift.ClientId +import com.twitter.finagle.thrift.RichClientParam +import com.twitter.graphql.thriftscala.GraphqlExecutionService +import com.twitter.inject.TwitterModule +import com.twitter.util.Duration +import javax.inject.Singleton + +object GraphqlClientProviderModule 
extends TwitterModule {
+  private def buildClient(serviceIdentifier: ServiceIdentifier, clientId: ClientId) =
+    ThriftMux.client
+      .withRequestTimeout(Duration.fromSeconds(5))
+      .withMutualTls(serviceIdentifier)
+      .withOpportunisticTls(OpportunisticTls.Required)
+      .withClientId(clientId)
+      .newService("/s/graphql-service/graphql-api:thrift")
+
+  def buildGraphQlClient(
+    serviceIdentifier: ServiceIdentifier,
+    clientId: ClientId
+  ): GraphqlExecutionService.FinagledClient = {
+    val client = buildClient(serviceIdentifier, clientId)
+    new GraphqlExecutionService.FinagledClient(client, RichClientParam())
+  }
+
+  @Provides
+  @Singleton
+  def providesGraphQlClient(
+    serviceIdentifier: ServiceIdentifier,
+    clientId: ClientId
+  ): GraphqlExecutionService.FinagledClient =
+    buildGraphQlClient(
+      serviceIdentifier,
+      clientId
+    )
+}
diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorAdsCallbackEngagementsModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorAdsCallbackEngagementsModule.scala
new file mode 100644
index 000000000..404eafa23
--- /dev/null
+++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorAdsCallbackEngagementsModule.scala
@@ -0,0 +1,87 @@
+package com.twitter.unified_user_actions.service.module
+
+import com.google.inject.Provides
+import com.twitter.ads.spendserver.thriftscala.SpendServerEvent
+import com.twitter.decider.Decider
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.finatra.kafka.serde.UnKeyed
+import com.twitter.finatra.kafka.serde.UnKeyedSerde
+import com.twitter.inject.TwitterModule
+import com.twitter.inject.annotations.Flag
+import com.twitter.kafka.client.processor.AtLeastOnceProcessor
+import com.twitter.unified_user_actions.adapter.ads_callback_engagements.AdsCallbackEngagementsAdapter
+import com.twitter.unified_user_actions.kafka.CompressionTypeFlag
+import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes
+import com.twitter.util.Duration
+import com.twitter.util.StorageUnit
+import com.twitter.util.logging.Logging
+import javax.inject.Singleton
+
+object KafkaProcessorAdsCallbackEngagementsModule extends TwitterModule with Logging {
+  override def modules = Seq(FlagsModule)
+
+  // NOTE: This is a shared processor name in order to simplify monviz stat computation.
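+  // The @Flag parameters injected below are all declared centrally in FlagsModule;
+  // tuning knobs this module does not pass through (e.g. fetchMin, receiveBuffer)
+  // fall back to the ClientConfigs defaults in
+  // KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor.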
+ private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, SpendServerEvent] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = NullableScalaSerdes + .Thrift[SpendServerEvent](statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = new AdsCallbackEngagementsAdapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + ) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorClientEventModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorClientEventModule.scala new file mode 100644 index 000000000..b6f36589c --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorClientEventModule.scala @@ -0,0 +1,142 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.decider.Decider +import 
com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.producers.BlockingFinagleKafkaProducer +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.kafka.client.headers.Zone +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.adapter.client_event.ClientEventAdapter +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.unified_user_actions.service.module.KafkaProcessorProvider.updateActionTypeCounters +import com.twitter.unified_user_actions.service.module.KafkaProcessorProvider.updateProcessingTimeStats +import com.twitter.unified_user_actions.service.module.KafkaProcessorProvider.updateProductSurfaceTypeCounters +import com.twitter.unified_user_actions.thriftscala.ActionType +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.util.Duration +import com.twitter.util.Future +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton +import org.apache.kafka.clients.producer.ProducerRecord +import org.apache.kafka.common.header.Headers + +object KafkaProcessorClientEventModule extends TwitterModule with Logging { + override def modules: Seq[FlagsModule.type] = Seq(FlagsModule) + + private val clientEventAdapter = new ClientEventAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. + private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.fetchMin) fetchMin: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, LogEvent] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = 
UnKeyedSerde.deserializer,
+      sourceValueDeserializer = NullableScalaSerdes
+        .Thrift[LogEvent](statsReceiver.counter("deserializerErrors")).deserializer,
+      commitInterval = commitInterval,
+      maxPollRecords = maxPollRecords,
+      maxPollInterval = maxPollInterval,
+      sessionTimeout = sessionTimeout,
+      fetchMax = fetchMax,
+      fetchMin = fetchMin,
+      processorMaxPendingRequests = kafkaMaxPendingRequests,
+      processorWorkerThreads = kafkaWorkerThreads,
+      adapter = clientEventAdapter,
+      kafkaSinkTopics = kafkaSinkTopics,
+      kafkaDestCluster = kafkaDestCluster,
+      kafkaProducerClientId = kafkaProducerClientId,
+      batchSize = batchSize,
+      linger = linger,
+      bufferMem = bufferMem,
+      compressionType = compressionTypeFlag.compressionType,
+      retries = retries,
+      retryBackoff = retryBackoff,
+      requestTimeout = requestTimeout,
+      statsReceiver = statsReceiver,
+      produceOpt = Some(clientEventProducer),
+      trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None,
+      decider = decider,
+      zone = ZoneFiltering.zoneMapping(cluster),
+    )
+  }
+
+  /**
+   * The ClientEvent producer is different from the defaultProducer.
+   * While the defaultProducer publishes every event to all sink topics, ClientEventProducer (this producer) requires
+   * exactly 2 sink topics: a topic with all events (impressions and engagements) and a topic with engagements only.
+   * Publishing is routed based on the action type.
+   */
+  def clientEventProducer(
+    producer: BlockingFinagleKafkaProducer[UnKeyed, UnifiedUserAction],
+    k: UnKeyed,
+    v: UnifiedUserAction,
+    sinkTopic: String,
+    headers: Headers,
+    statsReceiver: StatsReceiver,
+    decider: Decider
+  ): Future[Unit] =
+    if (ClientEventDeciderUtils.shouldPublish(decider = decider, uua = v, sinkTopic = sinkTopic)) {
+      updateActionTypeCounters(statsReceiver, v, sinkTopic)
+      updateProductSurfaceTypeCounters(statsReceiver, v, sinkTopic)
+      updateProcessingTimeStats(statsReceiver, v)
+
+      // If we were to enable xDC replicator, then we can safely remove the Zone header since xDC
+      // replicator works in the following way:
+      // - If the message does not have a header, the replicator will assume it is local and
+      //   set the header, copy the message
+      // - If the message has a header that is the local zone, the replicator will copy the message
+      // - If the message has a header for a different zone, the replicator will drop the message
+      producer
+        .send(
+          new ProducerRecord[UnKeyed, UnifiedUserAction](
+            sinkTopic,
+            null,
+            k,
+            v,
+            headers.remove(Zone.Key)))
+        .onSuccess { _ => statsReceiver.counter("publishSuccess", sinkTopic).incr() }
+        .onFailure { e: Throwable =>
+          statsReceiver.counter("publishFailure", sinkTopic).incr()
+          error(s"Publish error to topic $sinkTopic: $e")
+        }.unit
+    } else Future.Unit
+}
diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorEmailNotificationEventModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorEmailNotificationEventModule.scala
new file mode 100644
index 000000000..116792b7e
--- /dev/null
+++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorEmailNotificationEventModule.scala
@@ -0,0 +1,88 @@
+package com.twitter.unified_user_actions.service.module
+
+import com.google.inject.Provides
+import com.twitter.decider.Decider
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.finatra.kafka.serde.UnKeyed
+import
com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.ibis.thriftscala.NotificationScribe +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.unified_user_actions.adapter.email_notification_event.EmailNotificationEventAdapter +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton + +object KafkaProcessorEmailNotificationEventModule extends TwitterModule with Logging { + override def modules = Seq(FlagsModule) + private val notificationEventAdapter = new EmailNotificationEventAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. + private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, NotificationScribe] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = NullableScalaSerdes + .Thrift[NotificationScribe](statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = notificationEventAdapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = 
retryBackoff, + requestTimeout = requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + maybeProcess = ZoneFiltering.localDCFiltering + ) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorFavoriteArchivalEventsModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorFavoriteArchivalEventsModule.scala new file mode 100644 index 000000000..3e8f5592b --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorFavoriteArchivalEventsModule.scala @@ -0,0 +1,88 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.adapter.favorite_archival_events.FavoriteArchivalEventsAdapter +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.timelineservice.fanout.thriftscala.FavoriteArchivalEvent +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton + +object KafkaProcessorFavoriteArchivalEventsModule extends TwitterModule with Logging { + override def modules = Seq(FlagsModule) + + private val adapter = new FavoriteArchivalEventsAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. 
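+  // Unlike KafkaProcessorEmailNotificationEventModule above, this module does not
+  // pass maybeProcess explicitly; it relies on the provider's default, which is
+  // the same ZoneFiltering.localDCFiltering behavior.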
+ private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, FavoriteArchivalEvent] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = NullableScalaSerdes + .Thrift[FavoriteArchivalEvent](statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = adapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + ) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorProvider.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorProvider.scala new file mode 100644 index 000000000..da8ad39f9 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorProvider.scala @@ -0,0 +1,271 @@ +package com.twitter.unified_user_actions.service.module + +import com.twitter.decider.Decider +import com.twitter.finagle.stats.Counter +import com.twitter.finagle.stats.StatsReceiver +import 
com.twitter.finatra.kafka.producers.BlockingFinagleKafkaProducer +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.kafka.client.headers.Implicits._ +import com.twitter.kafka.client.headers.Zone +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.kafka.ClientConfigs +import com.twitter.unified_user_actions.kafka.ClientProviders +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.util.Duration +import com.twitter.util.Future +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.apache.kafka.clients.producer.ProducerRecord +import org.apache.kafka.common.header.Headers +import org.apache.kafka.common.record.CompressionType +import org.apache.kafka.common.serialization.Deserializer + +object KafkaProcessorProvider extends Logging { + lazy val actionTypeStatsCounterMap: collection.mutable.Map[String, Counter] = + collection.mutable.Map.empty + lazy val productSurfaceTypeStatsCounterMap: collection.mutable.Map[String, Counter] = + collection.mutable.Map.empty + + def updateActionTypeCounters( + statsReceiver: StatsReceiver, + v: UnifiedUserAction, + topic: String + ): Unit = { + val actionType = v.actionType.name + val actionTypeAndTopicKey = s"$actionType-$topic" + actionTypeStatsCounterMap.get(actionTypeAndTopicKey) match { + case Some(actionCounter) => actionCounter.incr() + case _ => + actionTypeStatsCounterMap(actionTypeAndTopicKey) = + statsReceiver.counter("uuaActionType", topic, actionType) + actionTypeStatsCounterMap(actionTypeAndTopicKey).incr() + } + } + + def updateProductSurfaceTypeCounters( + statsReceiver: StatsReceiver, + v: UnifiedUserAction, + topic: String + ): Unit = { + val productSurfaceType = v.productSurface.map(_.name).getOrElse("null") + val productSurfaceTypeAndTopicKey = s"$productSurfaceType-$topic" + productSurfaceTypeStatsCounterMap.get(productSurfaceTypeAndTopicKey) match { + case Some(productSurfaceCounter) => productSurfaceCounter.incr() + case _ => + productSurfaceTypeStatsCounterMap(productSurfaceTypeAndTopicKey) = + statsReceiver.counter("uuaProductSurfaceType", topic, productSurfaceType) + productSurfaceTypeStatsCounterMap(productSurfaceTypeAndTopicKey).incr() + } + } + + def updateProcessingTimeStats(statsReceiver: StatsReceiver, v: UnifiedUserAction): Unit = { + statsReceiver + .stat("uuaProcessingTimeDiff").add( + v.eventMetadata.receivedTimestampMs - v.eventMetadata.sourceTimestampMs) + } + + def defaultProducer( + producer: BlockingFinagleKafkaProducer[UnKeyed, UnifiedUserAction], + k: UnKeyed, + v: UnifiedUserAction, + sinkTopic: String, + headers: Headers, + statsReceiver: StatsReceiver, + decider: Decider, + ): Future[Unit] = + if (DefaultDeciderUtils.shouldPublish(decider = decider, uua = v, sinkTopic = sinkTopic)) { + updateActionTypeCounters(statsReceiver, v, sinkTopic) + updateProcessingTimeStats(statsReceiver, v) + + // If we were to enable xDC replicator, then we can safely remove the Zone header since xDC + // replicator works in the following way: + // - If the message does not have a header, the replicator will assume it is local and + // set the header, copy the message + // - If the message has a header that is the local zone, the replicator will copy the message + // - If 
the message has a header for a different zone, the replicator will drop the message + producer + .send( + new ProducerRecord[UnKeyed, UnifiedUserAction]( + sinkTopic, + null, + k, + v, + headers.remove(Zone.Key))) + .onSuccess { _ => statsReceiver.counter("publishSuccess", sinkTopic).incr() } + .onFailure { e: Throwable => + statsReceiver.counter("publishFailure", sinkTopic).incr() + error(s"Publish error to topic $sinkTopic: $e") + }.unit + } else Future.Unit + + /** + * The default AtLeastOnceProcessor mainly for consuming from a single Kafka topic -> process/adapt -> publish to + * the single sink Kafka topic. + * + * Important Note: Currently all sink topics share the same Kafka producer!!! If you need to create different + * producers for different topics, you would need to create a customized function like this one. + */ + def provideDefaultAtLeastOnceProcessor[K, V]( + name: String, + kafkaSourceCluster: String, + kafkaGroupId: String, + kafkaSourceTopic: String, + sourceKeyDeserializer: Deserializer[K], + sourceValueDeserializer: Deserializer[V], + commitInterval: Duration = ClientConfigs.kafkaCommitIntervalDefault, + maxPollRecords: Int = ClientConfigs.consumerMaxPollRecordsDefault, + maxPollInterval: Duration = ClientConfigs.consumerMaxPollIntervalDefault, + sessionTimeout: Duration = ClientConfigs.consumerSessionTimeoutDefault, + fetchMax: StorageUnit = ClientConfigs.consumerFetchMaxDefault, + fetchMin: StorageUnit = ClientConfigs.consumerFetchMinDefault, + receiveBuffer: StorageUnit = ClientConfigs.consumerReceiveBufferSizeDefault, + processorMaxPendingRequests: Int, + processorWorkerThreads: Int, + adapter: AbstractAdapter[V, UnKeyed, UnifiedUserAction], + kafkaSinkTopics: Seq[String], + kafkaDestCluster: String, + kafkaProducerClientId: String, + batchSize: StorageUnit = ClientConfigs.producerBatchSizeDefault, + linger: Duration = ClientConfigs.producerLingerDefault, + bufferMem: StorageUnit = ClientConfigs.producerBufferMemDefault, + compressionType: CompressionType = ClientConfigs.compressionDefault.compressionType, + retries: Int = ClientConfigs.retriesDefault, + retryBackoff: Duration = ClientConfigs.retryBackoffDefault, + requestTimeout: Duration = ClientConfigs.producerRequestTimeoutDefault, + produceOpt: Option[ + (BlockingFinagleKafkaProducer[UnKeyed, UnifiedUserAction], UnKeyed, UnifiedUserAction, String, + Headers, StatsReceiver, Decider) => Future[Unit] + ] = None, + trustStoreLocationOpt: Option[String] = Some(ClientConfigs.trustStoreLocationDefault), + statsReceiver: StatsReceiver, + decider: Decider, + zone: Zone, + maybeProcess: (ConsumerRecord[K, V], Zone) => Boolean = ZoneFiltering.localDCFiltering[K, V] _, + ): AtLeastOnceProcessor[K, V] = { + + lazy val singletonProducer = ClientProviders.mkProducer[UnKeyed, UnifiedUserAction]( + bootstrapServer = kafkaDestCluster, + clientId = kafkaProducerClientId, + keySerde = UnKeyedSerde.serializer, + valueSerde = ScalaSerdes.Thrift[UnifiedUserAction].serializer, + idempotence = false, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + trustStoreLocationOpt = trustStoreLocationOpt, + ) + + mkAtLeastOnceProcessor[K, V, UnKeyed, UnifiedUserAction]( + name = name, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = sourceKeyDeserializer, + sourceValueDeserializer = sourceValueDeserializer, + 
commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + fetchMin = fetchMin, + receiveBuffer = receiveBuffer, + processorMaxPendingRequests = processorMaxPendingRequests, + processorWorkerThreads = processorWorkerThreads, + adapter = adapter, + kafkaProducersAndSinkTopics = + kafkaSinkTopics.map(sinkTopic => (singletonProducer, sinkTopic)), + produce = produceOpt.getOrElse(defaultProducer), + trustStoreLocationOpt = trustStoreLocationOpt, + statsReceiver = statsReceiver, + decider = decider, + zone = zone, + maybeProcess = maybeProcess, + ) + } + + /** + * A common AtLeastOnceProcessor provider + */ + def mkAtLeastOnceProcessor[K, V, OUTK, OUTV]( + name: String, + kafkaSourceCluster: String, + kafkaGroupId: String, + kafkaSourceTopic: String, + sourceKeyDeserializer: Deserializer[K], + sourceValueDeserializer: Deserializer[V], + commitInterval: Duration = ClientConfigs.kafkaCommitIntervalDefault, + maxPollRecords: Int = ClientConfigs.consumerMaxPollRecordsDefault, + maxPollInterval: Duration = ClientConfigs.consumerMaxPollIntervalDefault, + sessionTimeout: Duration = ClientConfigs.consumerSessionTimeoutDefault, + fetchMax: StorageUnit = ClientConfigs.consumerFetchMaxDefault, + fetchMin: StorageUnit = ClientConfigs.consumerFetchMinDefault, + receiveBuffer: StorageUnit = ClientConfigs.consumerReceiveBufferSizeDefault, + processorMaxPendingRequests: Int, + processorWorkerThreads: Int, + adapter: AbstractAdapter[V, OUTK, OUTV], + kafkaProducersAndSinkTopics: Seq[(BlockingFinagleKafkaProducer[OUTK, OUTV], String)], + produce: (BlockingFinagleKafkaProducer[OUTK, OUTV], OUTK, OUTV, String, Headers, StatsReceiver, + Decider) => Future[Unit], + trustStoreLocationOpt: Option[String] = Some(ClientConfigs.trustStoreLocationDefault), + statsReceiver: StatsReceiver, + decider: Decider, + zone: Zone, + maybeProcess: (ConsumerRecord[K, V], Zone) => Boolean = ZoneFiltering.localDCFiltering[K, V] _, + ): AtLeastOnceProcessor[K, V] = { + val threadSafeKafkaClient = + ClientProviders.mkConsumer[K, V]( + bootstrapServer = kafkaSourceCluster, + keySerde = sourceKeyDeserializer, + valueSerde = sourceValueDeserializer, + groupId = kafkaGroupId, + autoCommit = false, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + fetchMin = fetchMin, + receiveBuffer = receiveBuffer, + trustStoreLocationOpt = trustStoreLocationOpt + ) + + def publish( + event: ConsumerRecord[K, V] + ): Future[Unit] = { + statsReceiver.counter("consumedEvents").incr() + + if (maybeProcess(event, zone)) + Future + .collect( + adapter + .adaptOneToKeyedMany(event.value, statsReceiver) + .flatMap { + case (k, v) => + kafkaProducersAndSinkTopics.map { + case (producer, sinkTopic) => + produce(producer, k, v, sinkTopic, event.headers(), statsReceiver, decider) + } + }).unit + else + Future.Unit + } + + AtLeastOnceProcessor[K, V]( + name = name, + topic = kafkaSourceTopic, + consumer = threadSafeKafkaClient, + processor = publish, + maxPendingRequests = processorMaxPendingRequests, + workerThreads = processorWorkerThreads, + commitIntervalMs = commitInterval.inMilliseconds, + statsReceiver = statsReceiver + ) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRekeyUuaIesourceModule.scala 
b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRekeyUuaIesourceModule.scala new file mode 100644 index 000000000..466fbec0c --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRekeyUuaIesourceModule.scala @@ -0,0 +1,207 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.decider.SimpleRecipient +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.producers.BlockingFinagleKafkaProducer +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.iesource.thriftscala.InteractionEvent +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.kafka.client.headers.Zone +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.adapter.uua_aggregates.RekeyUuaFromInteractionEventsAdapter +import com.twitter.unified_user_actions.kafka.ClientConfigs +import com.twitter.unified_user_actions.kafka.ClientProviders +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.thriftscala.KeyedUuaTweet +import com.twitter.util.Duration +import com.twitter.util.Future +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.apache.kafka.clients.producer.ProducerRecord +import org.apache.kafka.common.header.Headers +import org.apache.kafka.common.record.CompressionType +import javax.inject.Singleton +import javax.inject.Inject + +object KafkaProcessorRekeyUuaIesourceModule extends TwitterModule with Logging { + override def modules = Seq(FlagsModule) + + private val adapter = new RekeyUuaFromInteractionEventsAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. 
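+  // For illustration, the pipeline this module assembles is roughly (a sketch, not the exact code):
+  //
+  //   val event: ConsumerRecord[UnKeyed, InteractionEvent] = ???  // consumed from the source topic
+  //   val rekeyed: Seq[(Long, KeyedUuaTweet)] =
+  //     adapter.adaptOneToKeyedMany(event.value, statsReceiver)   // rekey by user id
+  //   rekeyed.foreach {
+  //     case (k, v) =>
+  //       // decider-gated publish to every sink topic through one shared producer
+  //       producer(singletonProducer, k, v, sinkTopic, event.headers(), statsReceiver, decider)
+  //   }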
+ private final val processorName = "uuaProcessor" + + @Provides + @Singleton + @Inject + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.receiveBuffer) receiveBuffer: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, InteractionEvent] = { + provideAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + receiveBuffer = receiveBuffer, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = adapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + maybeProcess = ZoneFiltering.noFiltering + ) + } + + def producer( + producer: BlockingFinagleKafkaProducer[Long, KeyedUuaTweet], + k: Long, + v: KeyedUuaTweet, + sinkTopic: String, + headers: Headers, + statsReceiver: StatsReceiver, + decider: Decider, + ): Future[Unit] = + if (decider.isAvailable(feature = s"RekeyUUAIesource${v.actionType}", Some(SimpleRecipient(k)))) + // If we were to enable xDC replicator, then we can safely remove the Zone header since xDC + // replicator works in the following way: + // - If the message does not have a header, the replicator will assume it is local and + // set the header, copy the message + // - If the message has a header that is the local zone, the replicator will copy the message + // - If the message has a header for a different zone, the replicator will drop the message + producer + 
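+      // The null partition lets the producer's partitioner choose a partition from the
+      // Long user-id key, and the consumed record's headers (including any Zone header)
+      // are forwarded as-is.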
.send(new ProducerRecord[Long, KeyedUuaTweet](sinkTopic, null, k, v, headers)) + .onSuccess { _ => statsReceiver.counter("publishSuccess", sinkTopic).incr() } + .onFailure { e: Throwable => + statsReceiver.counter("publishFailure", sinkTopic).incr() + error(s"Publish error to topic $sinkTopic: $e") + }.unit + else Future.Unit + + def provideAtLeastOnceProcessor( + name: String, + kafkaSourceCluster: String, + kafkaGroupId: String, + kafkaSourceTopic: String, + commitInterval: Duration = ClientConfigs.kafkaCommitIntervalDefault, + maxPollRecords: Int = ClientConfigs.consumerMaxPollRecordsDefault, + maxPollInterval: Duration = ClientConfigs.consumerMaxPollIntervalDefault, + sessionTimeout: Duration = ClientConfigs.consumerSessionTimeoutDefault, + fetchMax: StorageUnit = ClientConfigs.consumerFetchMaxDefault, + fetchMin: StorageUnit = ClientConfigs.consumerFetchMinDefault, + receiveBuffer: StorageUnit = ClientConfigs.consumerReceiveBufferSizeDefault, + processorMaxPendingRequests: Int, + processorWorkerThreads: Int, + adapter: AbstractAdapter[InteractionEvent, Long, KeyedUuaTweet], + kafkaSinkTopics: Seq[String], + kafkaDestCluster: String, + kafkaProducerClientId: String, + batchSize: StorageUnit = ClientConfigs.producerBatchSizeDefault, + linger: Duration = ClientConfigs.producerLingerDefault, + bufferMem: StorageUnit = ClientConfigs.producerBufferMemDefault, + compressionType: CompressionType = ClientConfigs.compressionDefault.compressionType, + retries: Int = ClientConfigs.retriesDefault, + retryBackoff: Duration = ClientConfigs.retryBackoffDefault, + requestTimeout: Duration = ClientConfigs.producerRequestTimeoutDefault, + produceOpt: Option[ + (BlockingFinagleKafkaProducer[Long, KeyedUuaTweet], Long, KeyedUuaTweet, String, Headers, + StatsReceiver, Decider) => Future[Unit] + ] = Some(producer), + trustStoreLocationOpt: Option[String] = Some(ClientConfigs.trustStoreLocationDefault), + statsReceiver: StatsReceiver, + decider: Decider, + zone: Zone, + maybeProcess: (ConsumerRecord[UnKeyed, InteractionEvent], Zone) => Boolean, + ): AtLeastOnceProcessor[UnKeyed, InteractionEvent] = { + + lazy val singletonProducer = ClientProviders.mkProducer[Long, KeyedUuaTweet]( + bootstrapServer = kafkaDestCluster, + clientId = kafkaProducerClientId, + keySerde = ScalaSerdes.Long.serializer, + valueSerde = ScalaSerdes.Thrift[KeyedUuaTweet].serializer, + idempotence = false, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + trustStoreLocationOpt = trustStoreLocationOpt, + ) + + KafkaProcessorProvider.mkAtLeastOnceProcessor[UnKeyed, InteractionEvent, Long, KeyedUuaTweet]( + name = name, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = ScalaSerdes.CompactThrift[InteractionEvent].deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + fetchMin = fetchMin, + receiveBuffer = receiveBuffer, + processorMaxPendingRequests = processorMaxPendingRequests, + processorWorkerThreads = processorWorkerThreads, + adapter = adapter, + kafkaProducersAndSinkTopics = + kafkaSinkTopics.map(sinkTopic => (singletonProducer, sinkTopic)), + produce = produceOpt.getOrElse(producer), + trustStoreLocationOpt = 
trustStoreLocationOpt, + statsReceiver = statsReceiver, + decider = decider, + zone = zone, + maybeProcess = maybeProcess, + ) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRekeyUuaModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRekeyUuaModule.scala new file mode 100644 index 000000000..3b961fabb --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRekeyUuaModule.scala @@ -0,0 +1,203 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.decider.SimpleRecipient +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.producers.BlockingFinagleKafkaProducer +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.apache.kafka.clients.producer.ProducerRecord +import org.apache.kafka.common.header.Headers +import org.apache.kafka.common.record.CompressionType +import com.twitter.kafka.client.headers.Zone +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.adapter.AbstractAdapter +import com.twitter.unified_user_actions.adapter.uua_aggregates.RekeyUuaAdapter +import com.twitter.unified_user_actions.kafka.ClientConfigs +import com.twitter.unified_user_actions.kafka.ClientProviders +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.unified_user_actions.thriftscala.KeyedUuaTweet +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.util.Duration +import com.twitter.util.Future +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton + +object KafkaProcessorRekeyUuaModule extends TwitterModule with Logging { + override def modules = Seq(FlagsModule) + + private val adapter = new RekeyUuaAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. 
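+  // For illustration, publishing below is gated per action type and per user via the decider,
+  // so each rekeyed action type can be ramped independently, e.g. for user 91:
+  //
+  //   decider.isAvailable(feature = s"RekeyUUA${v.actionType}", Some(SimpleRecipient(91L)))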
+ private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, UnifiedUserAction] = { + provideAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = adapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + maybeProcess = ZoneFiltering.noFiltering + ) + } + + def producer( + producer: BlockingFinagleKafkaProducer[Long, KeyedUuaTweet], + k: Long, + v: KeyedUuaTweet, + sinkTopic: String, + headers: Headers, + statsReceiver: StatsReceiver, + decider: Decider, + ): Future[Unit] = + if (decider.isAvailable(feature = s"RekeyUUA${v.actionType}", Some(SimpleRecipient(k)))) + // If we were to enable xDC replicator, then we can safely remove the Zone header since xDC + // replicator works in the following way: + // - If the message does not have a header, the replicator will assume it is local and + // set the header, copy the message + // - If the message has a header that is the local zone, the replicator will copy the message + // - If the message has a header for a different zone, the replicator will drop the message + producer + .send(new ProducerRecord[Long, KeyedUuaTweet](sinkTopic, null, k, v, headers)) + .onSuccess { _ => 
statsReceiver.counter("publishSuccess", sinkTopic).incr() } + .onFailure { e: Throwable => + statsReceiver.counter("publishFailure", sinkTopic).incr() + error(s"Publish error to topic $sinkTopic: $e") + }.unit + else Future.Unit + + def provideAtLeastOnceProcessor[K, V]( + name: String, + kafkaSourceCluster: String, + kafkaGroupId: String, + kafkaSourceTopic: String, + commitInterval: Duration = ClientConfigs.kafkaCommitIntervalDefault, + maxPollRecords: Int = ClientConfigs.consumerMaxPollRecordsDefault, + maxPollInterval: Duration = ClientConfigs.consumerMaxPollIntervalDefault, + sessionTimeout: Duration = ClientConfigs.consumerSessionTimeoutDefault, + fetchMax: StorageUnit = ClientConfigs.consumerFetchMaxDefault, + fetchMin: StorageUnit = ClientConfigs.consumerFetchMinDefault, + processorMaxPendingRequests: Int, + processorWorkerThreads: Int, + adapter: AbstractAdapter[UnifiedUserAction, Long, KeyedUuaTweet], + kafkaSinkTopics: Seq[String], + kafkaDestCluster: String, + kafkaProducerClientId: String, + batchSize: StorageUnit = ClientConfigs.producerBatchSizeDefault, + linger: Duration = ClientConfigs.producerLingerDefault, + bufferMem: StorageUnit = ClientConfigs.producerBufferMemDefault, + compressionType: CompressionType = ClientConfigs.compressionDefault.compressionType, + retries: Int = ClientConfigs.retriesDefault, + retryBackoff: Duration = ClientConfigs.retryBackoffDefault, + requestTimeout: Duration = ClientConfigs.producerRequestTimeoutDefault, + produceOpt: Option[ + (BlockingFinagleKafkaProducer[Long, KeyedUuaTweet], Long, KeyedUuaTweet, String, Headers, + StatsReceiver, Decider) => Future[Unit] + ] = Some(producer), + trustStoreLocationOpt: Option[String] = Some(ClientConfigs.trustStoreLocationDefault), + statsReceiver: StatsReceiver, + decider: Decider, + zone: Zone, + maybeProcess: (ConsumerRecord[UnKeyed, UnifiedUserAction], Zone) => Boolean, + ): AtLeastOnceProcessor[UnKeyed, UnifiedUserAction] = { + + lazy val singletonProducer = ClientProviders.mkProducer[Long, KeyedUuaTweet]( + bootstrapServer = kafkaDestCluster, + clientId = kafkaProducerClientId, + keySerde = ScalaSerdes.Long.serializer, + valueSerde = ScalaSerdes.Thrift[KeyedUuaTweet].serializer, + idempotence = false, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + trustStoreLocationOpt = trustStoreLocationOpt, + ) + + KafkaProcessorProvider.mkAtLeastOnceProcessor[UnKeyed, UnifiedUserAction, Long, KeyedUuaTweet]( + name = name, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = NullableScalaSerdes + .Thrift[UnifiedUserAction](statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + fetchMin = fetchMin, + processorMaxPendingRequests = processorMaxPendingRequests, + processorWorkerThreads = processorWorkerThreads, + adapter = adapter, + kafkaProducersAndSinkTopics = + kafkaSinkTopics.map(sinkTopic => (singletonProducer, sinkTopic)), + produce = produceOpt.getOrElse(producer), + trustStoreLocationOpt = trustStoreLocationOpt, + statsReceiver = statsReceiver, + decider = decider, + zone = zone, + maybeProcess = maybeProcess, + ) + } +} diff --git 
a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRetweetArchivalEventsModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRetweetArchivalEventsModule.scala new file mode 100644 index 000000000..b3bdc2fda --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorRetweetArchivalEventsModule.scala @@ -0,0 +1,88 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.tweetypie.thriftscala.RetweetArchivalEvent +import com.twitter.unified_user_actions.adapter.retweet_archival_events.RetweetArchivalEventsAdapter +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton + +object KafkaProcessorRetweetArchivalEventsModule extends TwitterModule with Logging { + override def modules = Seq(FlagsModule) + + private val adapter = new RetweetArchivalEventsAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. + private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, RetweetArchivalEvent] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + 
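+      // NullableScalaSerdes tallies payloads that fail Thrift deserialization on the
+      // deserializerErrors counter instead of failing the consumer.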
sourceValueDeserializer = NullableScalaSerdes + .Thrift[RetweetArchivalEvent](statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = adapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + ) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorSocialGraphModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorSocialGraphModule.scala new file mode 100644 index 000000000..f9d734490 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorSocialGraphModule.scala @@ -0,0 +1,90 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.socialgraph.thriftscala.WriteEvent +import com.twitter.unified_user_actions.adapter.social_graph_event.SocialGraphAdapter +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton + +class KafkaProcessorSocialGraphModule {} + +object KafkaProcessorSocialGraphModule extends TwitterModule with Logging { + override def modules = Seq(FlagsModule) + + private val socialGraphAdapter = new SocialGraphAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. 
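+  // For illustration, like the other default modules this processor keeps only events that
+  // were produced in its local zone: the default predicate wired by
+  // provideDefaultAtLeastOnceProcessor is ZoneFiltering.localDCFiltering, i.e.
+  //
+  //   event.headers().isLocalZone(ZoneFiltering.zoneMapping(cluster))
+  //
+  // which drops cross-zone copies of replicated topics.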
+ private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, WriteEvent] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = NullableScalaSerdes + .Thrift[WriteEvent](statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = socialGraphAdapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + ) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorTlsFavsModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorTlsFavsModule.scala new file mode 100644 index 000000000..65970d333 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorTlsFavsModule.scala @@ -0,0 +1,89 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import 
com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.inject.annotations.Flag +import com.twitter.inject.TwitterModule +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.timelineservice.thriftscala.ContextualizedFavoriteEvent +import com.twitter.unified_user_actions.adapter.tls_favs_event.TlsFavsAdapter +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton + +object KafkaProcessorTlsFavsModule extends TwitterModule with Logging { + override def modules = Seq(FlagsModule) + + private val tlsFavsAdapter = new TlsFavsAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. + private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, ContextualizedFavoriteEvent] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = NullableScalaSerdes + .Thrift[ContextualizedFavoriteEvent]( + statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = tlsFavsAdapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = 
requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + ) + } +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorTweetypieEventModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorTweetypieEventModule.scala new file mode 100644 index 000000000..d4a9b7e58 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorTweetypieEventModule.scala @@ -0,0 +1,90 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.tweetypie.thriftscala.TweetEvent +import com.twitter.unified_user_actions.adapter.tweetypie_event.TweetypieEventAdapter +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton + +object KafkaProcessorTweetypieEventModule extends TwitterModule with Logging { + override def modules: Seq[inject.Module] = Seq(FlagsModule) + + private val tweetypieEventAdapter = new TweetypieEventAdapter + // NOTE: This is a shared processor name in order to simplify monviz stat computation. 
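+  // For illustration, every topic passed via the kafka.sink.topics flag is paired with the
+  // same shared producer inside provideDefaultAtLeastOnceProcessor:
+  //
+  //   kafkaProducersAndSinkTopics = kafkaSinkTopics.map(sinkTopic => (singletonProducer, sinkTopic))
+  //
+  // A customized provider is needed if different sink topics must use different producers.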
+ private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, TweetEvent] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = NullableScalaSerdes + .Thrift[TweetEvent](statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = tweetypieEventAdapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + statsReceiver = statsReceiver, + trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None, + decider = decider, + zone = ZoneFiltering.zoneMapping(cluster), + ) + } + +} diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorUserModificationModule.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorUserModificationModule.scala new file mode 100644 index 000000000..0e17fa9f2 --- /dev/null +++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/KafkaProcessorUserModificationModule.scala @@ -0,0 +1,87 @@ +package com.twitter.unified_user_actions.service.module + +import com.google.inject.Provides +import com.twitter.decider.Decider +import com.twitter.finagle.stats.StatsReceiver +import 
com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.gizmoduck.thriftscala.UserModification +import com.twitter.inject.TwitterModule +import com.twitter.inject.annotations.Flag +import com.twitter.kafka.client.processor.AtLeastOnceProcessor +import com.twitter.unified_user_actions.adapter.user_modification.UserModificationAdapter +import com.twitter.unified_user_actions.kafka.CompressionTypeFlag +import com.twitter.unified_user_actions.kafka.serde.NullableScalaSerdes +import com.twitter.util.Duration +import com.twitter.util.StorageUnit +import com.twitter.util.logging.Logging +import javax.inject.Singleton + +object KafkaProcessorUserModificationModule extends TwitterModule with Logging { + override def modules = Seq(FlagsModule) + + // NOTE: This is a shared processor name in order to simplify monviz stat computation. + private final val processorName = "uuaProcessor" + + @Provides + @Singleton + def providesKafkaProcessor( + decider: Decider, + @Flag(FlagsModule.cluster) cluster: String, + @Flag(FlagsModule.kafkaSourceCluster) kafkaSourceCluster: String, + @Flag(FlagsModule.kafkaDestCluster) kafkaDestCluster: String, + @Flag(FlagsModule.kafkaSourceTopic) kafkaSourceTopic: String, + @Flag(FlagsModule.kafkaSinkTopics) kafkaSinkTopics: Seq[String], + @Flag(FlagsModule.kafkaGroupId) kafkaGroupId: String, + @Flag(FlagsModule.kafkaProducerClientId) kafkaProducerClientId: String, + @Flag(FlagsModule.kafkaMaxPendingRequests) kafkaMaxPendingRequests: Int, + @Flag(FlagsModule.kafkaWorkerThreads) kafkaWorkerThreads: Int, + @Flag(FlagsModule.commitInterval) commitInterval: Duration, + @Flag(FlagsModule.maxPollRecords) maxPollRecords: Int, + @Flag(FlagsModule.maxPollInterval) maxPollInterval: Duration, + @Flag(FlagsModule.sessionTimeout) sessionTimeout: Duration, + @Flag(FlagsModule.fetchMax) fetchMax: StorageUnit, + @Flag(FlagsModule.batchSize) batchSize: StorageUnit, + @Flag(FlagsModule.linger) linger: Duration, + @Flag(FlagsModule.bufferMem) bufferMem: StorageUnit, + @Flag(FlagsModule.compressionType) compressionTypeFlag: CompressionTypeFlag, + @Flag(FlagsModule.retries) retries: Int, + @Flag(FlagsModule.retryBackoff) retryBackoff: Duration, + @Flag(FlagsModule.requestTimeout) requestTimeout: Duration, + @Flag(FlagsModule.enableTrustStore) enableTrustStore: Boolean, + @Flag(FlagsModule.trustStoreLocation) trustStoreLocation: String, + statsReceiver: StatsReceiver, + ): AtLeastOnceProcessor[UnKeyed, UserModification] = { + KafkaProcessorProvider.provideDefaultAtLeastOnceProcessor( + name = processorName, + kafkaSourceCluster = kafkaSourceCluster, + kafkaGroupId = kafkaGroupId, + kafkaSourceTopic = kafkaSourceTopic, + sourceKeyDeserializer = UnKeyedSerde.deserializer, + sourceValueDeserializer = NullableScalaSerdes + .Thrift[UserModification](statsReceiver.counter("deserializerErrors")).deserializer, + commitInterval = commitInterval, + maxPollRecords = maxPollRecords, + maxPollInterval = maxPollInterval, + sessionTimeout = sessionTimeout, + fetchMax = fetchMax, + processorMaxPendingRequests = kafkaMaxPendingRequests, + processorWorkerThreads = kafkaWorkerThreads, + adapter = new UserModificationAdapter, + kafkaSinkTopics = kafkaSinkTopics, + kafkaDestCluster = kafkaDestCluster, + kafkaProducerClientId = kafkaProducerClientId, + batchSize = batchSize, + linger = linger, + bufferMem = bufferMem, + compressionType = compressionTypeFlag.compressionType, + retries = retries, + retryBackoff = retryBackoff, + requestTimeout = requestTimeout, + 
statsReceiver = statsReceiver,
+      trustStoreLocationOpt = if (enableTrustStore) Some(trustStoreLocation) else None,
+      decider = decider,
+      zone = ZoneFiltering.zoneMapping(cluster),
+    )
+  }
+}
diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/TopicsMapping.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/TopicsMapping.scala
new file mode 100644
index 000000000..8959118f9
--- /dev/null
+++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/TopicsMapping.scala
@@ -0,0 +1,5 @@
+package com.twitter.unified_user_actions.service.module
+
+case class TopicsMapping(
+  all: String = "unified_user_actions",
+  engagementsOnly: String = "unified_user_actions_engagements")
diff --git a/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/ZoneFiltering.scala b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/ZoneFiltering.scala
new file mode 100644
index 000000000..3da3e80d6
--- /dev/null
+++ b/unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service/module/ZoneFiltering.scala
@@ -0,0 +1,22 @@
+package com.twitter.unified_user_actions.service.module
+
+import com.twitter.kafka.client.headers.ATLA
+import com.twitter.kafka.client.headers.Implicits._
+import com.twitter.kafka.client.headers.PDXA
+import com.twitter.kafka.client.headers.Zone
+import org.apache.kafka.clients.consumer.ConsumerRecord
+
+object ZoneFiltering {
+  def zoneMapping(zone: String): Zone = zone.toLowerCase match {
+    case "atla" => ATLA
+    case "pdxa" => PDXA
+    case _ =>
+      throw new IllegalArgumentException(
+        s"zone must be provided and must be one of [atla,pdxa], provided $zone")
+  }
+
+  def localDCFiltering[K, V](event: ConsumerRecord[K, V], localZone: Zone): Boolean =
+    event.headers().isLocalZone(localZone)
+
+  def noFiltering[K, V](event: ConsumerRecord[K, V], localZone: Zone): Boolean = true
+}
diff --git a/unified_user_actions/service/src/test/resources/BUILD.bazel b/unified_user_actions/service/src/test/resources/BUILD.bazel
new file mode 100644
index 000000000..ae9669f4f
--- /dev/null
+++ b/unified_user_actions/service/src/test/resources/BUILD.bazel
@@ -0,0 +1,4 @@
+resources(
+    sources = ["*.*"],
+    tags = ["bazel-compatible"],
+)
diff --git a/unified_user_actions/service/src/test/resources/decider.yml b/unified_user_actions/service/src/test/resources/decider.yml
new file mode 100644
index 000000000..604217f37
--- /dev/null
+++ b/unified_user_actions/service/src/test/resources/decider.yml
@@ -0,0 +1,6 @@
+PublishServerTweetFav:
+  default_availability: 10000
+RekeyUUAIesourceClientTweetRenderImpression:
+  default_availability: 10000
+EnrichmentPlannerSampling:
+  default_availability: 10000
diff --git a/unified_user_actions/service/src/test/resources/logback.xml b/unified_user_actions/service/src/test/resources/logback.xml
new file mode 100644
index 000000000..27f50b1dc
--- /dev/null
+++ b/unified_user_actions/service/src/test/resources/logback.xml
@@ -0,0 +1,11 @@
+<configuration>
+  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
+    <encoder>
+      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n</pattern>
+    </encoder>
+  </appender>
+
+  <root level="INFO">
+    <appender-ref ref="STDOUT"/>
+  </root>
+</configuration>
diff --git a/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/BUILD.bazel b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/BUILD.bazel
new file mode 100644
index 000000000..7b42c4c0f
--- /dev/null
+++ b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/BUILD.bazel @@ -0,0 +1,21 @@ +junit_tests( + name = "tests", + sources = ["*.scala"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/javax/inject:javax.inject", + "decider/src/main/scala", + "kafka/finagle-kafka/finatra-kafka-streams/kafka-streams/src/test/scala:test-deps", + "kafka/finagle-kafka/finatra-kafka/src/test/scala:test-deps", + "unified_user_actions/enricher/src/main/thrift/com/twitter/unified_user_actions/enricher/internal:internal-scala", + "unified_user_actions/enricher/src/test/scala/com/twitter/unified_user_actions/enricher:fixture", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:client-event", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:enrichment-planner", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:rekey-uua-iesource", + "unified_user_actions/service/src/main/scala/com/twitter/unified_user_actions/service:tls-favs", + "unified_user_actions/service/src/test/resources", + "util/util-mock/src/main/scala/com/twitter/util/mock", + ], +) diff --git a/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/ClientEventServiceStartupTest.scala b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/ClientEventServiceStartupTest.scala new file mode 100644 index 000000000..0503d8782 --- /dev/null +++ b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/ClientEventServiceStartupTest.scala @@ -0,0 +1,141 @@ +package com.twitter.unified_user_actions.service + +import com.google.inject.Stage +import com.twitter.app.GlobalFlag +import com.twitter.clientapp.thriftscala.EventDetails +import com.twitter.clientapp.thriftscala.EventNamespace +import com.twitter.clientapp.thriftscala.Item +import com.twitter.clientapp.thriftscala.ItemType +import com.twitter.clientapp.thriftscala.LogEvent +import com.twitter.finatra.kafka.consumers.FinagleKafkaConsumerBuilder +import com.twitter.finatra.kafka.domain.AckMode +import com.twitter.finatra.kafka.domain.KafkaGroupId +import com.twitter.finatra.kafka.domain.KafkaTopic +import com.twitter.finatra.kafka.domain.SeekStrategy +import com.twitter.finatra.kafka.producers.FinagleKafkaProducerBuilder +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.finatra.kafka.test.KafkaFeatureTest +import com.twitter.inject.server.EmbeddedTwitterServer +import com.twitter.kafka.client.processor.KafkaConsumerClient +import com.twitter.logbase.thriftscala.LogBase +import com.twitter.unified_user_actions.kafka.ClientConfigs +import com.twitter.unified_user_actions.service.module.KafkaProcessorClientEventModule +import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction +import com.twitter.util.Duration +import com.twitter.util.StorageUnit + +class ClientEventServiceStartupTest extends KafkaFeatureTest { + private val inputTopic = + kafkaTopic(UnKeyedSerde, ScalaSerdes.Thrift[LogEvent], name = "source") + private val outputTopic = + kafkaTopic(UnKeyedSerde, ScalaSerdes.Thrift[UnifiedUserAction], name = "sink") + + val startupFlags = Map( + "kafka.group.id" -> "client-event", + "kafka.producer.client.id" -> "uua", + 
"kafka.source.topic" -> inputTopic.topic, + "kafka.sink.topics" -> outputTopic.topic, + "kafka.consumer.fetch.min" -> "6.megabytes", + "kafka.max.pending.requests" -> "100", + "kafka.worker.threads" -> "1", + "kafka.trust.store.enable" -> "false", + "kafka.producer.batch.size" -> "0.byte", + "cluster" -> "atla", + ) + + val deciderFlags = Map( + "decider.base" -> "/decider.yml" + ) + + override protected def kafkaBootstrapFlag: Map[String, String] = { + Map( + ClientConfigs.kafkaBootstrapServerConfig -> kafkaCluster.bootstrapServers(), + ClientConfigs.kafkaBootstrapServerRemoteDestConfig -> kafkaCluster.bootstrapServers(), + ) + } + + override val server: EmbeddedTwitterServer = new EmbeddedTwitterServer( + twitterServer = new ClientEventService() { + override def warmup(): Unit = { + // noop + } + + override val overrideModules = Seq( + KafkaProcessorClientEventModule + ) + }, + globalFlags = Map[GlobalFlag[_], String]( + com.twitter.finatra.kafka.consumers.enableTlsAndKerberos -> "false", + ), + flags = startupFlags ++ kafkaBootstrapFlag ++ deciderFlags, + stage = Stage.PRODUCTION + ) + + private def getConsumer( + seekStrategy: SeekStrategy = SeekStrategy.BEGINNING, + ) = { + val builder = FinagleKafkaConsumerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .clientId("consumer") + .groupId(KafkaGroupId("validator")) + .keyDeserializer(UnKeyedSerde.deserializer) + .valueDeserializer(ScalaSerdes.Thrift[LogEvent].deserializer) + .requestTimeout(Duration.fromSeconds(1)) + .enableAutoCommit(false) + .seekStrategy(seekStrategy) + + new KafkaConsumerClient(builder.config) + } + + private def getProducer(clientId: String = "producer") = { + FinagleKafkaProducerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .clientId(clientId) + .ackMode(AckMode.ALL) + .batchSize(StorageUnit.zero) + .keySerializer(UnKeyedSerde.serializer) + .valueSerializer(ScalaSerdes.Thrift[LogEvent].serializer) + .build() + } + + test("ClientEventService starts") { + server.assertHealthy() + } + + test("ClientEventService should process input events") { + val producer = getProducer() + val inputConsumer = getConsumer() + + val value: LogEvent = LogEvent( + eventName = "test_tweet_render_impression_event", + eventNamespace = + Some(EventNamespace(component = Some("stream"), element = None, action = Some("results"))), + eventDetails = Some( + EventDetails( + items = Some( + Seq[Item]( + Item(id = Some(1L), itemType = Some(ItemType.Tweet)) + )) + )), + logBase = Some(LogBase(timestamp = 10001L, transactionId = "", ipAddress = "")) + ) + + try { + server.assertHealthy() + + // before, should be empty + inputConsumer.subscribe(Set(KafkaTopic(inputTopic.topic))) + assert(inputConsumer.poll().count() == 0) + + // after, should contain at least a message + await(producer.send(inputTopic.topic, new UnKeyed, value, System.currentTimeMillis)) + producer.flush() + assert(inputConsumer.poll().count() >= 1) + } finally { + await(producer.close()) + inputConsumer.close() + } + } +} diff --git a/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/DeciderUtilsTest.scala b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/DeciderUtilsTest.scala new file mode 100644 index 000000000..f5a0af48c --- /dev/null +++ b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/DeciderUtilsTest.scala @@ -0,0 +1,75 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.decider.MockDecider +import 
com.twitter.inject.Test +import com.twitter.unified_user_actions.service.module.ClientEventDeciderUtils +import com.twitter.unified_user_actions.service.module.DefaultDeciderUtils +import com.twitter.unified_user_actions.thriftscala._ +import com.twitter.util.Time +import com.twitter.util.mock.Mockito +import org.junit.runner.RunWith +import org.scalatestplus.junit.JUnitRunner + +@RunWith(classOf[JUnitRunner]) +class DeciderUtilsTest extends Test with Mockito { + trait Fixture { + val frozenTime = Time.fromMilliseconds(1658949273000L) + + val publishActionTypes = + Set[ActionType](ActionType.ServerTweetFav, ActionType.ClientTweetRenderImpression) + + def decider( + features: Set[String] = publishActionTypes.map { action => + s"Publish${action.name}" + } + ) = new MockDecider(features = features) + + def mkUUA(actionType: ActionType) = UnifiedUserAction( + userIdentifier = UserIdentifier(userId = Some(91L)), + item = Item.TweetInfo( + TweetInfo( + actionTweetId = 1L, + actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(101L))), + ) + ), + actionType = actionType, + eventMetadata = EventMetadata( + sourceTimestampMs = 1001L, + receivedTimestampMs = frozenTime.inMilliseconds, + sourceLineage = SourceLineage.ServerTlsFavs, + traceId = Some(31L) + ) + ) + + val uuaServerTweetFav = mkUUA(ActionType.ServerTweetFav) + val uuaClientTweetFav = mkUUA(ActionType.ClientTweetFav) + val uuaClientTweetRenderImpression = mkUUA(ActionType.ClientTweetRenderImpression) + } + + test("Decider Utils") { + new Fixture { + Time.withTimeAt(frozenTime) { _ => + DefaultDeciderUtils.shouldPublish( + decider = decider(), + uua = uuaServerTweetFav, + sinkTopic = "") shouldBe true + DefaultDeciderUtils.shouldPublish( + decider = decider(), + uua = uuaClientTweetFav, + sinkTopic = "") shouldBe false + ClientEventDeciderUtils.shouldPublish( + decider = decider(), + uua = uuaClientTweetRenderImpression, + sinkTopic = "unified_user_actions_engagements") shouldBe false + ClientEventDeciderUtils.shouldPublish( + decider = decider(), + uua = uuaClientTweetFav, + sinkTopic = "unified_user_actions_engagements") shouldBe false + ClientEventDeciderUtils.shouldPublish( + decider = decider(features = Set[String](s"Publish${ActionType.ClientTweetFav.name}")), + uua = uuaClientTweetFav, + sinkTopic = "unified_user_actions_engagements") shouldBe true + } + } + } +} diff --git a/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/EnrichmentPlannerServiceTest.scala b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/EnrichmentPlannerServiceTest.scala new file mode 100644 index 000000000..a1038e1b2 --- /dev/null +++ b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/EnrichmentPlannerServiceTest.scala @@ -0,0 +1,141 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafka.serde.UnKeyedSerde +import com.twitter.finatra.kafka.test.EmbeddedKafka +import com.twitter.finatra.kafkastreams.test.FinatraTopologyTester +import com.twitter.finatra.kafkastreams.test.TopologyFeatureTest +import com.twitter.unified_user_actions.enricher.EnricherFixture +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentEnvelop +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentIdType +import com.twitter.unified_user_actions.enricher.internal.thriftscala.EnrichmentKey +import 
com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
+import org.apache.kafka.clients.consumer.ConsumerRecord
+import org.joda.time.DateTime
+
+/**
+ * Tests the logic where the service reads from and outputs to the same Kafka cluster.
+ */
+class EnrichmentPlannerServiceTest extends TopologyFeatureTest {
+  val startTime = new DateTime("2022-10-01T00:00:00Z")
+
+  override protected lazy val topologyTester: FinatraTopologyTester = FinatraTopologyTester(
+    "enrichment-planner-tester",
+    new EnrichmentPlannerService,
+    startingWallClockTime = startTime,
+    flags = Map(
+      "decider.base" -> "/decider.yml",
+      "kafka.output.server" -> ""
+    )
+  )
+
+  private val inputTopic = topologyTester.topic(
+    name = EnrichmentPlannerServiceMain.InputTopic,
+    keySerde = UnKeyedSerde,
+    valSerde = ScalaSerdes.Thrift[UnifiedUserAction]
+  )
+
+  private val outputTopic = topologyTester.topic(
+    name = EnrichmentPlannerServiceMain.OutputPartitionedTopic,
+    keySerde = ScalaSerdes.Thrift[EnrichmentKey],
+    valSerde = ScalaSerdes.Thrift[EnrichmentEnvelop]
+  )
+
+  test("can filter unsupported events") {
+    new EnricherFixture {
+      (1L to 10L).foreach(id => {
+        inputTopic.pipeInput(UnKeyed, mkUUAProfileEvent(id))
+      })
+
+      assert(outputTopic.readAllOutput().size === 0)
+    }
+  }
+
+  test("partition key serialization should be correct") {
+    val key = EnrichmentKey(EnrichmentIdType.TweetId, 9999L)
+    val serializer = ScalaSerdes.Thrift[EnrichmentKey].serializer
+
+    val actual = serializer.serialize("test", key)
+    val expected = Array[Byte](8, 0, 1, 0, 0, 0, 0, 10, 0, 2, 0, 0, 0, 0, 0, 0, 39, 15, 0)
+
+    assert(actual.deep === expected.deep)
+  }
+
+  test("partitioned enrichment tweet event is constructed correctly") {
+    new EnricherFixture {
+      val expected = mkUUATweetEvent(888L)
+      inputTopic.pipeInput(UnKeyed, expected)
+
+      val actual = outputTopic.readAllOutput().head
+
+      assert(actual.key() === EnrichmentKey(EnrichmentIdType.TweetId, 888L))
+      assert(
+        actual
+          .value() === EnrichmentEnvelop(
+          expected.hashCode,
+          expected,
+          plan = tweetInfoEnrichmentPlan
+        ))
+    }
+  }
+
+  test("partitioned enrichment tweet notification event is constructed correctly") {
+    new EnricherFixture {
+      val expected = mkUUATweetNotificationEvent(8989L)
+      inputTopic.pipeInput(UnKeyed, expected)
+
+      val actual = outputTopic.readAllOutput().head
+
+      assert(actual.key() === EnrichmentKey(EnrichmentIdType.TweetId, 8989L))
+      assert(
+        actual
+          .value() === EnrichmentEnvelop(
+          expected.hashCode,
+          expected,
+          plan = tweetNotificationEnrichmentPlan
+        ))
+    }
+  }
+}
+
+/**
+ * Tests the bootstrap server logic as used in prod. Don't add new tests here since it is slow;
+ * use the tests above instead, which are much quicker to execute and cover the majority of the
+ * prod logic.
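+ *
+ * For illustration, the only wiring difference from the topology-only test above is that the
+ * output here is backed by real embedded brokers:
+ * {{{
+ *   flags = Map(
+ *     "decider.base" -> "/decider.yml",
+ *     "kafka.output.server" -> kafkaCluster.bootstrapServers(),
+ *     "kafka.output.enable.tls" -> "false"
+ *   )
+ * }}}
+ * whereas the faster test passes an empty "kafka.output.server" and never starts a broker.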
+/**
+ * This tests the bootstrap server logic in prod. Don't add any new tests here since it is slow.
+ * Use the tests above, which are much quicker to execute and cover the majority of prod logic.
+ */
+class EnrichmentPlannerServiceEmbeddedKafkaTest extends TopologyFeatureTest with EmbeddedKafka {
+  val startTime = new DateTime("2022-10-01T00:00:00Z")
+
+  override protected lazy val topologyTester: FinatraTopologyTester = FinatraTopologyTester(
+    "enrichment-planner-tester",
+    new EnrichmentPlannerService,
+    startingWallClockTime = startTime,
+    flags = Map(
+      "decider.base" -> "/decider.yml",
+      "kafka.output.server" -> kafkaCluster.bootstrapServers(),
+      "kafka.output.enable.tls" -> "false"
+    )
+  )
+
+  private lazy val inputTopic = topologyTester.topic(
+    name = EnrichmentPlannerServiceMain.InputTopic,
+    keySerde = UnKeyedSerde,
+    valSerde = ScalaSerdes.Thrift[UnifiedUserAction]
+  )
+
+  private val outputTopic = kafkaTopic(
+    name = EnrichmentPlannerServiceMain.OutputPartitionedTopic,
+    keySerde = ScalaSerdes.Thrift[EnrichmentKey],
+    valSerde = ScalaSerdes.Thrift[EnrichmentEnvelop]
+  )
+
+  test("toCluster should output to expected topic & embedded cluster") {
+    new EnricherFixture {
+      inputTopic.pipeInput(UnKeyed, mkUUATweetEvent(tweetId = 1))
+      val records: Seq[ConsumerRecord[Array[Byte], Array[Byte]]] = outputTopic.consumeRecords(1)
+
+      assert(records.size === 1)
+      assert(records.head.topic() == EnrichmentPlannerServiceMain.OutputPartitionedTopic)
+    }
+  }
+}
diff --git a/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/RekeyUuaIesourceServiceStartupTest.scala b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/RekeyUuaIesourceServiceStartupTest.scala
new file mode 100644
index 000000000..9609a2691
--- /dev/null
+++ b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/RekeyUuaIesourceServiceStartupTest.scala
@@ -0,0 +1,173 @@
+package com.twitter.unified_user_actions.service
+
+import com.google.inject.Stage
+import com.twitter.adserver.thriftscala.DisplayLocation
+import com.twitter.app.GlobalFlag
+import com.twitter.finatra.kafka.consumers.FinagleKafkaConsumerBuilder
+import com.twitter.finatra.kafka.domain.AckMode
+import com.twitter.finatra.kafka.domain.KafkaGroupId
+import com.twitter.finatra.kafka.domain.KafkaTopic
+import com.twitter.finatra.kafka.domain.SeekStrategy
+import com.twitter.finatra.kafka.producers.FinagleKafkaProducerBuilder
+import com.twitter.finatra.kafka.serde.ScalaSerdes
+import com.twitter.finatra.kafka.serde.UnKeyedSerde
+import com.twitter.finatra.kafka.test.KafkaFeatureTest
+import com.twitter.iesource.thriftscala.ClientEventContext
+import com.twitter.iesource.thriftscala.TweetImpression
+import com.twitter.iesource.thriftscala.ClientType
+import com.twitter.iesource.thriftscala.ContextualEventNamespace
+import com.twitter.iesource.thriftscala.EngagingContext
+import com.twitter.iesource.thriftscala.EventSource
+import com.twitter.iesource.thriftscala.InteractionDetails
+import com.twitter.iesource.thriftscala.InteractionEvent
+import com.twitter.iesource.thriftscala.InteractionType
+import com.twitter.iesource.thriftscala.InteractionTargetType
+import com.twitter.iesource.thriftscala.UserIdentifier
+import com.twitter.inject.server.EmbeddedTwitterServer
+import com.twitter.kafka.client.processor.KafkaConsumerClient
+import com.twitter.unified_user_actions.kafka.ClientConfigs
+import com.twitter.unified_user_actions.service.module.KafkaProcessorRekeyUuaIesourceModule
+import com.twitter.unified_user_actions.thriftscala.KeyedUuaTweet
+import com.twitter.util.Duration
+import com.twitter.util.StorageUnit
+
+class RekeyUuaIesourceServiceStartupTest
extends KafkaFeatureTest { + private val inputTopic = + kafkaTopic(ScalaSerdes.Long, ScalaSerdes.CompactThrift[InteractionEvent], name = "source") + private val outputTopic = + kafkaTopic(ScalaSerdes.Long, ScalaSerdes.Thrift[KeyedUuaTweet], name = "sink") + + val startupFlags = Map( + "kafka.group.id" -> "client-event", + "kafka.producer.client.id" -> "uua", + "kafka.source.topic" -> inputTopic.topic, + "kafka.sink.topics" -> outputTopic.topic, + "kafka.consumer.fetch.min" -> "6.megabytes", + "kafka.max.pending.requests" -> "100", + "kafka.worker.threads" -> "1", + "kafka.trust.store.enable" -> "false", + "kafka.producer.batch.size" -> "0.byte", + "cluster" -> "atla", + ) + + val deciderFlags = Map( + "decider.base" -> "/decider.yml" + ) + + override protected def kafkaBootstrapFlag: Map[String, String] = { + Map( + ClientConfigs.kafkaBootstrapServerConfig -> kafkaCluster.bootstrapServers(), + ClientConfigs.kafkaBootstrapServerRemoteDestConfig -> kafkaCluster.bootstrapServers(), + ) + } + + override val server: EmbeddedTwitterServer = new EmbeddedTwitterServer( + twitterServer = new RekeyUuaIesourceService() { + override def warmup(): Unit = { + // noop + } + + override val overrideModules = Seq( + KafkaProcessorRekeyUuaIesourceModule + ) + }, + globalFlags = Map[GlobalFlag[_], String]( + com.twitter.finatra.kafka.consumers.enableTlsAndKerberos -> "false", + ), + flags = startupFlags ++ kafkaBootstrapFlag ++ deciderFlags, + stage = Stage.PRODUCTION + ) + + private def getConsumer( + seekStrategy: SeekStrategy = SeekStrategy.BEGINNING, + ) = { + val builder = FinagleKafkaConsumerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .clientId("consumer") + .groupId(KafkaGroupId("validator")) + .keyDeserializer(ScalaSerdes.Long.deserializer) + .valueDeserializer(ScalaSerdes.CompactThrift[InteractionEvent].deserializer) + .requestTimeout(Duration.fromSeconds(1)) + .enableAutoCommit(false) + .seekStrategy(seekStrategy) + + new KafkaConsumerClient(builder.config) + } + + private def getUUAConsumer( + seekStrategy: SeekStrategy = SeekStrategy.BEGINNING, + ) = { + val builder = FinagleKafkaConsumerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .clientId("consumer_uua") + .groupId(KafkaGroupId("validator_uua")) + .keyDeserializer(UnKeyedSerde.deserializer) + .valueDeserializer(ScalaSerdes.Thrift[KeyedUuaTweet].deserializer) + .requestTimeout(Duration.fromSeconds(1)) + .enableAutoCommit(false) + .seekStrategy(seekStrategy) + + new KafkaConsumerClient(builder.config) + } + + private def getProducer(clientId: String = "producer") = { + FinagleKafkaProducerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .clientId(clientId) + .ackMode(AckMode.ALL) + .batchSize(StorageUnit.zero) + .keySerializer(ScalaSerdes.Long.serializer) + .valueSerializer(ScalaSerdes.CompactThrift[InteractionEvent].serializer) + .build() + } + + test("RekeyUuaIesourceService starts") { + server.assertHealthy() + } + + test("RekeyUuaIesourceService should process input events") { + val producer = getProducer() + val inputConsumer = getConsumer() + val uuaConsumer = getUUAConsumer() + + val value: InteractionEvent = InteractionEvent( + targetId = 1L, + targetType = InteractionTargetType.Tweet, + engagingUserId = 11L, + eventSource = EventSource.ClientEvent, + timestampMillis = 123456L, + interactionType = Some(InteractionType.TweetRenderImpression), + details = InteractionDetails.TweetRenderImpression(TweetImpression()), + additionalEngagingUserIdentifiers = UserIdentifier(), + engagingContext = 
EngagingContext.ClientEventContext(
+        ClientEventContext(
+          clientEventNamespace = ContextualEventNamespace(),
+          clientType = ClientType.Iphone,
+          displayLocation = DisplayLocation(1)))
+    )
+
+    try {
+      server.assertHealthy()
+
+      // before, should be empty
+      inputConsumer.subscribe(Set(KafkaTopic(inputTopic.topic)))
+      assert(inputConsumer.poll().count() == 0)
+
+      // after, should contain at least a message
+      await(producer.send(inputTopic.topic, value.targetId, value, System.currentTimeMillis))
+      producer.flush()
+      assert(inputConsumer.poll().count() == 1)
+
+      uuaConsumer.subscribe(Set(KafkaTopic(outputTopic.topic)))
+      // This is tricky: it is not guaranteed that the service can process and output the
+      // event to the output topic faster than the below consumer. So we'd use a timer here, which may
+      // not be the best practice.
+      // If someone finds the below test is flaky, please just remove the below test completely.
+      Thread.sleep(5000L)
+      assert(uuaConsumer.poll().count() == 1)
+    } finally {
+      await(producer.close())
+      inputConsumer.close()
+    }
+  }
+}
diff --git a/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/TlsFavServiceStartupTest.scala b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/TlsFavServiceStartupTest.scala
new file mode 100644
index 000000000..8c0615ea8
--- /dev/null
+++ b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/TlsFavServiceStartupTest.scala
@@ -0,0 +1,153 @@
+package com.twitter.unified_user_actions.service
+
+import com.google.inject.Stage
+import com.twitter.app.GlobalFlag
+import com.twitter.finatra.kafka.consumers.FinagleKafkaConsumerBuilder
+import com.twitter.finatra.kafka.domain.AckMode
+import com.twitter.finatra.kafka.domain.KafkaGroupId
+import com.twitter.finatra.kafka.domain.KafkaTopic
+import com.twitter.finatra.kafka.domain.SeekStrategy
+import com.twitter.finatra.kafka.producers.FinagleKafkaProducerBuilder
+import com.twitter.finatra.kafka.serde.ScalaSerdes
+import com.twitter.finatra.kafka.serde.UnKeyed
+import com.twitter.finatra.kafka.serde.UnKeyedSerde
+import com.twitter.finatra.kafka.test.KafkaFeatureTest
+import com.twitter.inject.server.EmbeddedTwitterServer
+import com.twitter.kafka.client.processor.KafkaConsumerClient
+import com.twitter.timelineservice.thriftscala.ContextualizedFavoriteEvent
+import com.twitter.timelineservice.thriftscala.FavoriteEvent
+import com.twitter.timelineservice.thriftscala.FavoriteEventUnion
+import com.twitter.timelineservice.thriftscala.LogEventContext
+import com.twitter.unified_user_actions.kafka.ClientConfigs
+import com.twitter.unified_user_actions.service.module.KafkaProcessorTlsFavsModule
+import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
+import com.twitter.util.Duration
+import com.twitter.util.StorageUnit
+
+class TlsFavServiceStartupTest extends KafkaFeatureTest {
+  private val inputTopic =
+    kafkaTopic(UnKeyedSerde, ScalaSerdes.Thrift[ContextualizedFavoriteEvent], name = "source")
+  private val outputTopic =
+    kafkaTopic(UnKeyedSerde, ScalaSerdes.Thrift[UnifiedUserAction], name = "sink")
+
+  val startupFlags = Map(
+    "kafka.group.id" -> "tls",
+    "kafka.producer.client.id" -> "uua",
+    "kafka.source.topic" -> inputTopic.topic,
+    "kafka.sink.topics" -> outputTopic.topic,
+    "kafka.max.pending.requests" -> "100",
+    "kafka.worker.threads" -> "1",
+    "kafka.trust.store.enable" -> "false",
+    "kafka.producer.batch.size" -> "0.byte",
+    "cluster" -> "atla",
+  )
+
+  val deciderFlags =
Map( + "decider.base" -> "/decider.yml" + ) + + override protected def kafkaBootstrapFlag: Map[String, String] = { + Map( + ClientConfigs.kafkaBootstrapServerConfig -> kafkaCluster.bootstrapServers(), + ClientConfigs.kafkaBootstrapServerRemoteDestConfig -> kafkaCluster.bootstrapServers(), + ) + } + + override val server: EmbeddedTwitterServer = new EmbeddedTwitterServer( + twitterServer = new TlsFavsService() { + override def warmup(): Unit = { + // noop + } + + override val overrideModules = Seq( + KafkaProcessorTlsFavsModule + ) + }, + globalFlags = Map[GlobalFlag[_], String]( + com.twitter.finatra.kafka.consumers.enableTlsAndKerberos -> "false", + ), + flags = startupFlags ++ kafkaBootstrapFlag ++ deciderFlags, + stage = Stage.PRODUCTION + ) + + private def getConsumer( + seekStrategy: SeekStrategy = SeekStrategy.BEGINNING, + ) = { + val builder = FinagleKafkaConsumerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .clientId("consumer") + .groupId(KafkaGroupId("validator")) + .keyDeserializer(UnKeyedSerde.deserializer) + .valueDeserializer(ScalaSerdes.Thrift[ContextualizedFavoriteEvent].deserializer) + .requestTimeout(Duration.fromSeconds(1)) + .enableAutoCommit(false) + .seekStrategy(seekStrategy) + + new KafkaConsumerClient(builder.config) + } + + private def getProducer(clientId: String = "producer") = { + FinagleKafkaProducerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .clientId(clientId) + .ackMode(AckMode.ALL) + .batchSize(StorageUnit.zero) + .keySerializer(UnKeyedSerde.serializer) + .valueSerializer(ScalaSerdes.Thrift[ContextualizedFavoriteEvent].serializer) + .build() + } + + private def getUUAConsumer( + seekStrategy: SeekStrategy = SeekStrategy.BEGINNING, + ) = { + val builder = FinagleKafkaConsumerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .clientId("consumer_uua") + .groupId(KafkaGroupId("validator_uua")) + .keyDeserializer(UnKeyedSerde.deserializer) + .valueDeserializer(ScalaSerdes.Thrift[UnifiedUserAction].deserializer) + .requestTimeout(Duration.fromSeconds(1)) + .enableAutoCommit(false) + .seekStrategy(seekStrategy) + + new KafkaConsumerClient(builder.config) + } + + test("TlsFavService starts") { + server.assertHealthy() + } + + test("TlsFavService should process input events") { + val producer = getProducer() + val inputConsumer = getConsumer() + val uuaConsumer = getUUAConsumer() + + val favoriteEvent = FavoriteEventUnion.Favorite(FavoriteEvent(123L, 123L, 123L, 123L)) + val value = + ContextualizedFavoriteEvent(favoriteEvent, LogEventContext("localhost", 123L)) + + try { + server.assertHealthy() + + // before, should be empty + inputConsumer.subscribe(Set(KafkaTopic(inputTopic.topic))) + assert(inputConsumer.poll().count() == 0) + + // after, should contain at least a message + await(producer.send(inputTopic.topic, new UnKeyed, value, System.currentTimeMillis)) + producer.flush() + assert(inputConsumer.poll().count() == 1) + + uuaConsumer.subscribe(Set(KafkaTopic(outputTopic.topic))) + // This is tricky: it is not guaranteed that the TlsFavsService can process and output the + // event to output topic faster than the below consumer. So we'd use a timer here which may + // not be the best practice. + // If someone finds the below test is flaky, please just remove the below test completely. 
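As the comment above concedes, the fixed `Thread.sleep(5000L)` just below (and its twin in `RekeyUuaIesourceServiceStartupTest`) is a workaround for the race between the service's processing and the validating consumer. If it ever flakes, one hedged alternative is a deadline-based retry around `poll()`; a minimal sketch (the helper and its names are ours, not part of this repo):

```scala
// Poll repeatedly until `expected` records are seen or the deadline passes.
// Returns the number of records observed within the deadline.
def pollUntil(expected: Int, deadlineMs: Long = 5000L)(poll: => Int): Int = {
  val deadline = System.currentTimeMillis() + deadlineMs
  var seen = 0
  while (seen < expected && System.currentTimeMillis() < deadline) {
    seen += poll
    if (seen < expected) Thread.sleep(100L)
  }
  seen
}

// Usage in place of the fixed sleep:
// assert(pollUntil(expected = 1)(uuaConsumer.poll().count()) == 1)
```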
+ Thread.sleep(5000L) + assert(uuaConsumer.poll().count() == 1) + } finally { + await(producer.close()) + inputConsumer.close() + } + } +} diff --git a/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/ZoneFilteringTest.scala b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/ZoneFilteringTest.scala new file mode 100644 index 000000000..02019fa6d --- /dev/null +++ b/unified_user_actions/service/src/test/scala/com/twitter/unified_user_actions/service/ZoneFilteringTest.scala @@ -0,0 +1,50 @@ +package com.twitter.unified_user_actions.service + +import com.twitter.inject.Test +import com.twitter.kafka.client.headers.ATLA +import com.twitter.kafka.client.headers.Implicits._ +import com.twitter.kafka.client.headers.PDXA +import com.twitter.kafka.client.headers.Zone +import com.twitter.unified_user_actions.service.module.ZoneFiltering +import com.twitter.util.mock.Mockito +import org.apache.kafka.clients.consumer.ConsumerRecord +import org.junit.runner.RunWith +import org.scalatestplus.junit.JUnitRunner +import org.scalatest.prop.TableDrivenPropertyChecks + +@RunWith(classOf[JUnitRunner]) +class ZoneFilteringTest extends Test with Mockito with TableDrivenPropertyChecks { + trait Fixture { + val consumerRecord = + new ConsumerRecord[Array[Byte], Array[Byte]]("topic", 0, 0l, Array(0), Array(0)) + } + + test("two DCs filter") { + val zones = Table( + "zone", + Some(ATLA), + Some(PDXA), + None + ) + forEvery(zones) { localZoneOpt: Option[Zone] => + forEvery(zones) { headerZoneOpt: Option[Zone] => + localZoneOpt.foreach { localZone => + new Fixture { + headerZoneOpt match { + case Some(headerZone) => + consumerRecord.headers().setZone(headerZone) + if (headerZone == ATLA && localZone == ATLA) + ZoneFiltering.localDCFiltering(consumerRecord, localZone) shouldBe true + else if (headerZone == PDXA && localZone == PDXA) + ZoneFiltering.localDCFiltering(consumerRecord, localZone) shouldBe true + else + ZoneFiltering.localDCFiltering(consumerRecord, localZone) shouldBe false + case _ => + ZoneFiltering.localDCFiltering(consumerRecord, localZone) shouldBe true + } + } + } + } + } + } +} diff --git a/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/BUILD b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/BUILD new file mode 100644 index 000000000..a605859d2 --- /dev/null +++ b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/BUILD @@ -0,0 +1,20 @@ +create_thrift_libraries( + org = "com.twitter", + base_name = "unified_user_actions", + sources = ["*.thrift"], + tags = ["bazel-compatible"], + dependency_roots = [ + "src/thrift/com/twitter/clientapp/gen:clientapp", + "src/thrift/com/twitter/gizmoduck:thrift", + "src/thrift/com/twitter/gizmoduck:user-thrift", + "src/thrift/com/twitter/search/common:constants", + "src/thrift/com/twitter/socialgraph:thrift", + ], + generate_languages = [ + "java", + "scala", + "strato", + ], + provides_java_name = "unified_user_actions-thrift-java", + provides_scala_name = "unified_user_actions-thrift-scala", +) diff --git a/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/action_info.thrift b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/action_info.thrift new file mode 100644 index 000000000..1342b5cf7 --- /dev/null +++ b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/action_info.thrift @@ -0,0 +1,957 @@ +namespace java 
com.twitter.unified_user_actions.thriftjava +#@namespace scala com.twitter.unified_user_actions.thriftscala +#@namespace strato com.twitter.unified_user_actions + +include "com/twitter/clientapp/gen/client_app.thrift" +include "com/twitter/reportflow/report_flow_logs.thrift" +include "com/twitter/socialgraph/social_graph_service_write_log.thrift" +include "com/twitter/gizmoduck/user_service.thrift" + +/* + * ActionType is typically a three part enum consisting of + * [Origin][Item Type][Action Name] + * + * [Origin] is usually "client" or "server" to indicate how the action was derived. + * + * [Item Type] is singular and refers to the shorthand version of the type of + * Item (e.g. Tweet, Profile, Notification instead of TweetInfo, ProfileInfo, NotificationInfo) + * the action occurred on. Action types and item types should be 1:1, and when an action can be + * performed on multiple types of items, consider granular action types. + * + * [Action Name] is the descriptive name of the user action (e.g. favorite, render impression); + * action names should correspond to UI actions / ML labels (which are typically based on user + * behavior from UI actions) + * + * Below are guidelines around naming of action types: + * a) When an action is coupled to a product surface, be concise in naming such that the + * combination of item type and action name captures the user behavior for the action in the UI. For example, + * for an open on a Notification in the PushNotification product surface that is parsed from client events, + * consider ClientNotificationOpen because the item Notification and the action name Open concisely represent + * the action, and the product surface PushNotification can be identified independently. + * + * b) It is OK to use generic names like Click if needed to distinguish from another action OR + * it is the best way to characterize an action concisely without confusion. + * For example, for ClientTweetClickReply, this refers to actually clicking on the Reply button but not + * Replying, and it is OK to include Click. Another example is Click on a Tweet anywhere (other than the fav, + * reply, etc. buttons), which leads to the TweetDetails page. Avoid generic action names like Click if + * there is a more specific UI aspect to reference and Click is implied, e.g. ClientTweetReport is + * preferred over ClientTweetClickReport and ClientTweetReportClick. + * + * c) Rely on versioning found in the origin when it is present for action names. For example, + * a "V2Impression" is named as such because in behavioral client events, there is + * a "v2Impress" field. See go/bce-v2impress for more details. + * + * d) There is a distinction between "UndoAction" and "Un{Action}" action types. 
+ * An "UndoAction" is fired when a user clicks on the explicit "Undo" button after they perform an action.
+ * This "Undo" button is a UI element that may be temporary, e.g.,
+ * - the user waited too long to click the button, the button disappears from the UI (e.g., Undo for Mute, Block)
+ * - the button does not disappear due to timeout, but becomes unavailable after the user closes a tab
+ *   (e.g., Undo for NotInterestedIn, NotAboutTopic)
+ * Examples:
+ *   - ClientProfileUndoMute: a user clicks the "Undo" button after muting a Profile
+ *   - ClientTweetUndoNotInterestedIn: a user clicks the "Undo" button
+ *     after clicking the "Not interested in this Tweet" button in the caret menu of a Tweet
+ * An "Un{Action}" is fired when a user reverses a previous action, not by explicitly clicking an "Undo" button,
+ * but through some other action that allows them to revert.
+ * Examples:
+ * - ClientProfileUnmute: a user clicks the "Unmute" button from the caret menu of the Profile they previously muted
+ * - ClientTweetUnfav: a user unlikes a tweet by clicking on the like button again
+ *
+ * Examples: ServerTweetFav, ClientTweetRenderImpression, ClientNotificationSeeLessOften
+ *
+ * See go/uua-action-type for more details.
+ */
+enum ActionType {
+  // 0 - 999 used for actions derived from Server-side sources (e.g. Timelineservice, Tweetypie)
+  // NOTE: Please match values for corresponding server / client enum members (with offset 1000).
+  ServerTweetFav = 0
+  ServerTweetUnfav = 1
+  // Reserve 2 and 3 for ServerTweetLingerImpression and ServerTweetRenderImpression
+
+  ServerTweetCreate = 4
+  ServerTweetReply = 5
+  ServerTweetQuote = 6
+  ServerTweetRetweet = 7
+  // skip 8-10 since there are no server equivalents for ClickCreate, ClickReply, ClickQuote
+  // reserve 11-16 for server video engagements
+
+  ServerTweetDelete = 17 // User deletes a default tweet
+  ServerTweetUnreply = 18 // User deletes a reply tweet
+  ServerTweetUnquote = 19 // User deletes a quote tweet
+  ServerTweetUnretweet = 20 // User removes an existing retweet
+  // User edits a tweet. Edit will create a new tweet with editedTweetId = id of the original tweet.
+  // The original tweet or the new tweet from edit can only be a default or quote tweet.
+  // A user can edit a default tweet to become a quote tweet (by adding the link to another Tweet),
+  // or edit a quote tweet to remove the quote and make it a default tweet.
+  // Both the initial tweet and the new tweet created from the edit can be edited, and each time the
+  // new edit will create a new tweet. All subsequent edits would have the same initial tweet id
+  // as the TweetInfo.editedTweetId.
+  // e.g. create Tweet A, edit Tweet A -> Tweet B, edit Tweet B -> Tweet C
+  // the initial tweet id for both Tweet B and Tweet C would be Tweet A
+  ServerTweetEdit = 21
+  // skip 22 for delete an edit if we want to add it in the future
+
+  // reserve 30-40 for server topic actions
+
+  // 41-70 reserved for all negative engagements and the related positive engagements
+  // For example, Follow and Unfollow, Mute and Unmute
+  // This is fired when a user clicks "Submit" at the end of a "Report Tweet" flow
+  // ClientTweetReport = 1041 is scribed by the HealthClient team, on the client side
+  // This is scribed by spamacaw, on the server side
+  // They can be joined on reportFlowId
+  // See https://confluence.twitter.biz/pages/viewpage.action?spaceKey=HEALTH&title=Understanding+ReportDetails
+  ServerTweetReport = 41
+
+  // reserve 42 for ServerTweetNotInterestedIn
+  // reserve 43 for ServerTweetUndoNotInterestedIn
+  // reserve 44 for ServerTweetNotAboutTopic
+  // reserve 45 for ServerTweetUndoNotAboutTopic
+
+  ServerProfileFollow = 50 // User follows a Profile
+  ServerProfileUnfollow = 51 // User unfollows a Profile
+  ServerProfileBlock = 52 // User blocks a Profile
+  ServerProfileUnblock = 53 // User unblocks a Profile
+  ServerProfileMute = 54 // User mutes a Profile
+  ServerProfileUnmute = 55 // User unmutes a Profile
+  // User reports a Profile as Spam / Abuse
+  // This user action type includes ProfileReportAsSpam and ProfileReportAsAbuse
+  ServerProfileReport = 56
+  // reserve 57 for ServerProfileUnReport
+  // reserve 56-70 for server social graph actions
+
+  // 71-90 reserved for click-based events
+  // reserve 71 for ServerTweetClick
+
+  // 1000 - 1999 used for actions derived from Client-side sources (e.g. Client Events, BCE)
+  // NOTE: Please match values for corresponding server / client enum members (with offset 1000).
+  // 1000 - 1499 used for legacy client events
+  ClientTweetFav = 1000
+  ClientTweetUnfav = 1001
+  ClientTweetLingerImpression = 1002
+  // Please note that: Render impression for quoted Tweets would emit 2 events:
+  // 1 for the quoting Tweet and 1 for the original Tweet!!!
+  ClientTweetRenderImpression = 1003
+  // 1004 reserved for ClientTweetCreate
+  // This is the "Send Reply" event to indicate publishing of a reply Tweet, as opposed to clicking
+  // on the reply button to initiate a reply Tweet (captured in ClientTweetClickReply).
+  // The differences between this and the ServerTweetReply are:
+  // 1) ServerTweetReply already has the new Tweet Id 2) A sent reply may be lost during transfer
+  // over the wire and thus may not end up with a follow-up ServerTweetReply.
+  ClientTweetReply = 1005
+  // This is the "send quote" event to indicate publishing of a quote tweet, as opposed to clicking
+  // on the quote button to initiate a quote tweet (captured in ClientTweetClickQuote).
+  // The differences between this and the ServerTweetQuote are:
+  // 1) ServerTweetQuote already has the new Tweet Id 2) A sent quote may be lost during transfer
+  // over the wire and thus may not end up with a follow-up ServerTweetQuote.
+  ClientTweetQuote = 1006
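The NOTE above pins client enum values at their server counterparts plus 1000 (e.g. ServerTweetFav = 0 vs. ClientTweetFav = 1000). A hedged sketch of what that convention buys a consumer; `value` and `get` follow the usual Scrooge-generated enum API, so treat the exact signatures as assumptions:

```scala
import com.twitter.unified_user_actions.thriftscala.ActionType

// Illustrative only: resolve a legacy client action's server-side counterpart
// by undoing the +1000 offset; None when no server equivalent exists.
def serverCounterpart(clientAction: ActionType): Option[ActionType] =
  if (clientAction.value >= 1000 && clientAction.value < 1500)
    ActionType.get(clientAction.value - 1000)
  else
    None

// e.g. serverCounterpart(ActionType.ClientTweetFav) == Some(ActionType.ServerTweetFav)
```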
+  // This is the "retweet" event to indicate publishing of a retweet.
+  ClientTweetRetweet = 1007
+  // 1008 reserved for ClientTweetClickCreate
+  // This is the user clicking on the Reply button, not actually sending a reply Tweet,
+  // thus the name ClickReply
+  ClientTweetClickReply = 1009
+  // This is the user clicking the Quote/RetweetWithComment button, not actually sending the quote,
+  // thus the name ClickQuote
+  ClientTweetClickQuote = 1010
+
+  // 1011 - 1016: Refer to go/cme-scribing and go/interaction-event-spec for details
+  // This is fired when playback reaches 25% of total track duration. Not valid for live videos.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoPlayback25 = 1011
+  // This is fired when playback reaches 50% of total track duration. Not valid for live videos.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoPlayback50 = 1012
+  // This is fired when playback reaches 75% of total track duration. Not valid for live videos.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoPlayback75 = 1013
+  // This is fired when playback reaches 95% of total track duration. Not valid for live videos.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoPlayback95 = 1014
+  // This is fired when the video has been played in non-preview
+  // (i.e. not autoplaying in the timeline) mode, and was not started via auto-advance.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoPlayFromTap = 1015
+  // This is fired when 50% of the video has been on-screen and playing for 10 consecutive seconds
+  // or 95% of the video duration, whichever comes first.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoQualityView = 1016
+  // Fired when either view_threshold or play_from_tap is fired.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoView = 1109
+  // Fired when 50% of the video has been on-screen and playing for 2 consecutive seconds,
+  // regardless of video duration.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoMrcView = 1110
+  // Fired when the video is:
+  // - Playing for 3 cumulative (not necessarily consecutive) seconds with 100% in view for looping video.
+  // - Playing for 3 cumulative (not necessarily consecutive) seconds or the video duration, whichever comes first, with 100% in view for non-looping video.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoViewThreshold = 1111
+  // Fired when the user clicks a generic ‘visit url’ call to action.
+  ClientTweetVideoCtaUrlClick = 1112
+  // Fired when the user clicks a ‘watch now’ call to action.
+  ClientTweetVideoCtaWatchClick = 1113
+
+  // 1017 reserved for ClientTweetDelete
+  // 1018-1019 for Client delete a reply and delete a quote if we want to add them in the future
+
+  // This is fired when a user clicks on "Undo retweet" after re-tweeting a tweet
+  ClientTweetUnretweet = 1020
+  // 1021 reserved for ClientTweetEdit
+  // 1022 reserved for Client delete an edit if we want to add it in the future
+  // This is fired when a user clicks on a photo within a tweet and the photo expands to fit
+  // the screen.
+  ClientTweetPhotoExpand = 1023
+
+  // This is fired when a user clicks on a profile mention inside a tweet.
+  ClientTweetClickMentionScreenName = 1024
+
+  // 1030 - 1035 for topic actions
+  // There are multiple cases:
+  // 1. Follow from the Topic page (or so-called landing page)
+  // 2. Click on the Tweet's caret menu item "Follow (the topic)"; this requires that
+  //    1) the user already follows the Topic (otherwise there is no "Follow" menu by default),
+  //    2) and the user clicked on "Unfollow Topic" first.
+  ClientTopicFollow = 1030
+  // There are multiple cases:
+  // 1. Unfollow from the Topic page (or so-called landing page)
+  // 2. Click on the Tweet's caret menu item "Unfollow (the topic)" if the user has already followed
+  //    the topic.
+  ClientTopicUnfollow = 1031
+  // This is fired when the user clicks the "x" icon next to the topic on their timeline,
+  // and clicks "Not interested in {TOPIC}" in the pop-up prompt.
+  // Alternatively, they can also click the "See more" button to visit the topic page, and click "Not interested" there.
+  ClientTopicNotInterestedIn = 1032
+  // This is fired when the user clicks the "Undo" button after clicking "x" or "Not interested" on a Topic,
+  // which is captured in ClientTopicNotInterestedIn
+  ClientTopicUndoNotInterestedIn = 1033
+
+  // 1036-1070 reserved for all negative engagements and the related positive engagements
+  // For example, Follow and Unfollow, Mute and Unmute
+
+  // This is fired when a user clicks on the "This Tweet's not helpful" flow in the caret menu
+  // of a Tweet result on the Search Results Page
+  ClientTweetNotHelpful = 1036
+  // This is fired when a user clicks Undo after clicking on the
+  // "This Tweet's not helpful" flow in the caret menu of a Tweet result on the Search Results Page
+  ClientTweetUndoNotHelpful = 1037
+  // This is fired when a user starts and/or completes the "Report Tweet" flow in the caret menu of a Tweet
+  ClientTweetReport = 1041
+  /*
+   * 1042-1045 refers to actions that are related to the
+   * "Not Interested In" button in the caret menu of a Tweet.
+   *
+   * ClientTweetNotInterestedIn is fired when a user clicks the
+   * "Not interested in this Tweet" button in the caret menu of a Tweet.
+   * A user can undo the ClientTweetNotInterestedIn action by clicking the
+   * "Undo" button that appears as a prompt in the caret menu, resulting
+   * in ClientTweetUndoNotInterestedIn being fired.
+   * If a user chooses to not undo and proceed, they are given multiple choices
+   * in a prompt to better document why they are not interested in a Tweet.
+   * For example, if a Tweet is not about a Topic, a user can click
+   * "This Tweet is not about {TOPIC}" in the provided prompt, resulting
+   * in ClientTweetNotAboutTopic being fired.
+   * A user can undo the ClientTweetNotAboutTopic action by clicking the "Undo"
+   * button that appears as a subsequent prompt in the caret menu. Undoing this action
+   * results in the previous UI state, where the user had only marked "Not Interested In" and
+   * can still undo the original ClientTweetNotInterestedIn action.
+   * Similarly, a user can select the "This Tweet isn't recent" action, resulting in ClientTweetNotRecent,
+   * and they can undo this action immediately, which results in ClientTweetUndoNotRecent.
+   * Similarly, a user can select the "Show fewer tweets from" action, resulting in ClientTweetSeeFewer,
+   * and they can undo this action immediately, which results in ClientTweetUndoSeeFewer.
+   */
+  ClientTweetNotInterestedIn = 1042
+  ClientTweetUndoNotInterestedIn = 1043
+  ClientTweetNotAboutTopic = 1044
+  ClientTweetUndoNotAboutTopic = 1045
+  ClientTweetNotRecent = 1046
+  ClientTweetUndoNotRecent = 1047
+  ClientTweetSeeFewer = 1048
+  ClientTweetUndoSeeFewer = 1049
+
+  // This is fired when a user follows a profile from the
+  // profile page header / people module and people tab on the Search Results Page / sidebar on the Home page
+  // A Profile can also be followed when a user clicks follow in the caret menu of a Tweet
+  // or the follow button when hovering on the profile avatar, which is captured in ClientTweetFollowAuthor = 1060
+  ClientProfileFollow = 1050
+  // reserve 1050/1051 for client side Follow/Unfollow
+  // This is fired when a user clicks Block in a Profile page
+  // A Profile can also be blocked when a user clicks Block in the caret menu of a Tweet,
+  // which is captured in ClientTweetBlockAuthor = 1062
+  ClientProfileBlock = 1052
+  // This is fired when a user clicks unblock in a pop-up prompt right after blocking a profile
+  // in the profile page or clicks unblock in a drop-down menu in the profile page.
+  ClientProfileUnblock = 1053
+  // This is fired when a user clicks Mute in a Profile page
+  // A Profile can also be muted when a user clicks Mute in the caret menu of a Tweet, which is captured in ClientTweetMuteAuthor = 1064
+  ClientProfileMute = 1054
+  // reserve 1055 for client side Unmute
+  // This is fired when a user clicks the "Report User" action from the user profile page
+  ClientProfileReport = 1056
+
+  // reserve 1057 for ClientProfileUnreport
+
+  // This is fired when a user clicks on a profile from all modules except tweets
+  // (e.g. People Search / people module in the Top tab in the Search Results Page)
+  // For tweets, the click is captured in ClientTweetClickProfile
+  ClientProfileClick = 1058
+  // reserve 1059-1070 for client social graph actions
+
+  // This is fired when a user clicks Follow in the caret menu of a Tweet or hovers on the avatar of the tweet
+  // author and clicks on the Follow button. A profile can also be followed by clicking the Follow button on the
+  // Profile page and confirming, which is captured in ClientProfileFollow. The event emits two items, one of user type
+  // and another of tweet type; since the default implementation of BaseClientEvent only looks for the Tweet type,
+  // the other item is dropped, which is the expected behaviour
+  ClientTweetFollowAuthor = 1060
+
+  // This is fired when a user clicks Unfollow in the caret menu of a Tweet or hovers on the avatar of the tweet
+  // author and clicks on the Unfollow button. A profile can also be unfollowed by clicking the Unfollow button on the
+  // Profile page and confirming, which will be captured in ClientProfileUnfollow. The event emits two items, one of user type
+  // and another of tweet type; since the default implementation of BaseClientEvent only looks for the Tweet type,
+  // the other item is dropped, which is the expected behaviour
+  ClientTweetUnfollowAuthor = 1061
+
+  // This is fired when a user clicks Block in the menu of a Tweet to block the Profile that
+  // authored this Tweet. A Profile can also be blocked in the Profile page, which is captured
+  // in ClientProfileBlock = 1052
+  ClientTweetBlockAuthor = 1062
+  // This is fired when a user clicks unblock in a pop-up prompt right after blocking an author
+  // in the drop-down menu of a tweet
+  ClientTweetUnblockAuthor = 1063
+
+  // This is fired when a user clicks Mute in the menu of a Tweet to mute the Profile that
+  // authored this Tweet. A Profile can also be muted in the Profile page, which is captured in ClientProfileMute = 1054
+  ClientTweetMuteAuthor = 1064
+
+  // reserve 1065 for ClientTweetUnmuteAuthor
+
+  // 1071-1090 reserved for click-based events
+  // click-based events are defined as clicks on a UI container (e.g., tweet, profile, etc.), as opposed to a clearly named
+  // button or menu (e.g., follow, block, report, etc.), which would require a more specific action name than "click".
+
+  // This is fired when a user clicks on a Tweet to open the Tweet details page. Note that for
+  // Tweets in the Notification Tab product surface, a click can be registered differently
+  // depending on whether the Tweet is a rendered Tweet (a click results in ClientTweetClick)
+  // or a wrapper Notification (a click results in ClientNotificationClick).
+  ClientTweetClick = 1071
+  // This is fired when a user clicks to view the profile page of a user from a tweet
+  // Contains a TweetInfo of this tweet
+  ClientTweetClickProfile = 1072
+  // This is fired when a user clicks on the "share" icon on a Tweet to open the share menu.
+  // The user may or may not proceed and finish sharing the Tweet.
+  ClientTweetClickShare = 1073
+  // This is fired when a user clicks "Copy link to Tweet" in the menu that appears after hitting
+  // the "share" icon on a Tweet, OR when a user selects share_via -> copy_link after long-clicking
+  // a link inside a tweet on a mobile device
+  ClientTweetShareViaCopyLink = 1074
+  // This is fired when a user clicks "Send via Direct Message" after
+  // clicking on the "share" icon on a Tweet to open the share menu.
+  // The user may or may not proceed and finish sending the DM.
+  ClientTweetClickSendViaDirectMessage = 1075
+  // This is fired when a user clicks "Bookmark" after
+  // clicking on the "share" icon on a Tweet to open the share menu.
+  ClientTweetShareViaBookmark = 1076
+  // This is fired when a user clicks "Remove Tweet from Bookmarks" after
+  // clicking on the "share" icon on a Tweet to open the share menu.
+  ClientTweetUnbookmark = 1077
+  // This is fired when a user clicks on the hashtag in a Tweet.
+  // The click on a hashtag in the "What's happening" section gives you a different scribe '*:*:sidebar:*:trend:search'
+  // Currently we are only filtering for itemType=Tweet. There are other items present in the event where itemType = user,
+  // but those items are in dual-events (events with multiple itemTypes) and happen when you click on a hashtag in a Tweet from someone's profile,
+  // hence we are ignoring those itemTypes and only keeping itemType=Tweet.
+  ClientTweetClickHashtag = 1078
+  // This is fired when a user clicks "Bookmark" after clicking on the "share" icon on a Tweet to open the share menu, or
+  // when a user clicks on the 'bookmark' icon on a Tweet (bookmark icon is available to ios only as of March 2023).
+  // TweetBookmark and TweetShareByBookmark log the same events but serve individual use cases.
+  ClientTweetBookmark = 1079
+
+  // 1078 - 1089 for all Share related actions.
+
+  // This is fired when a user clicks on a link in a tweet.
+  // The link could be displayed as a URL or embedded in a component such as an image or a card in a tweet.
+  ClientTweetOpenLink = 1090
+  // This is fired when a user takes a screenshot.
+  // This is available for mobile clients only.
+  ClientTweetTakeScreenshot = 1091
+
+  // 1100 - 1101: Refer to go/cme-scribing and go/interaction-event-spec for details
+  // Fired on the first tick of a track regardless of where in the video it is playing.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoPlaybackStart = 1100
+  // Fired when playback reaches 100% of total track duration.
+  // Not valid for live videos.
+  // For looping playback, this is only fired once and does not reset at loop boundaries.
+  ClientTweetVideoPlaybackComplete = 1101
+
+  // A user can select the "This Tweet isn't relevant" action, resulting in ClientTweetNotRelevant,
+  // and they can undo this action immediately, which results in ClientTweetUndoNotRelevant
+  ClientTweetNotRelevant = 1102
+  ClientTweetUndoNotRelevant = 1103
+
+  // A generic action type to submit feedback for different modules / items ( Tweets / Search Results )
+  ClientFeedbackPromptSubmit = 1104
+
+  // This is fired when a user profile is opened in a Profile page
+  ClientProfileShow = 1105
+
+  /*
+   * This is triggered when a user exits the Twitter platform. The amount of time spent on the
+   * platform is recorded in ms and can be used to compute the User Active Seconds (UAS).
+   */
+  ClientAppExit = 1106
+
+  /*
+   * For "card" related actions
+   */
+  ClientCardClick = 1107
+  ClientCardOpenApp = 1108
+  ClientCardAppInstallAttempt = 1114
+  ClientPollCardVote = 1115
+
+  /*
+   * The impressions 1121-1123 together with the ClientTweetRenderImpression 1003 are used by ViewCount
+   * and UnifiedEngagementCounts as EngagementType.Displayed and EngagementType.Details.
+   *
+   * For definitions, please refer to https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/common-internal/analytics/client-event-util/src/main/java/com/twitter/common_internal/analytics/client_event_util/TweetImpressionUtils.java?L14&subtree=true
+   */
+  ClientTweetGalleryImpression = 1121
+  ClientTweetDetailsImpression = 1122
+
+  /**
+   * This is fired when a user is logged out and follows a profile from the
+   * profile page / people module from the web.
+   * One can only try to follow from the web because iOS and Android do not support logged-out browsing as of Jan 2023.
+   */
+  ClientProfileFollowAttempt = 1200
+
+  /**
+   * This is fired when a user is logged out and favourites a tweet from the web.
+   * One can only try to favourite from the web; iOS and Android do not support logged-out browsing
+   */
+  ClientTweetFavoriteAttempt = 1201
+
+  /**
+   * This is fired when a user is logged out and retweets a tweet from the web.
+   * One can only try to retweet from the web; iOS and Android do not support logged-out browsing
+   */
+  ClientTweetRetweetAttempt = 1202
+
+  /**
+   * This is fired when a user is logged out and replies to a tweet from the web.
+   * One can only try to reply from the web; iOS and Android do not support logged-out browsing
+   */
+  ClientTweetReplyAttempt = 1203
+
+  /**
+   * This is fired when a user is logged out and clicks on the login button.
+   * Currently this seems to be generated only on [m5, LiteNativeWrapper]
+   */
+  ClientCTALoginClick = 1204
+  /**
+   * This is fired when a user is logged out and the login window is shown.
+   */
+  ClientCTALoginStart = 1205
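The logged-out CTA actions here form a natural funnel (click, window shown, success). A toy, purely illustrative aggregation over a batch of UUA events; the helper is not from this repo, and the enum values are taken from this file so the sketch avoids guessing Scrooge's generated member names:

```scala
import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction

// ActionType values from action_info.thrift:
// 1204 = ClientCTALoginClick, 1206 = ClientCTALoginSuccess.
def loginCtaConversion(events: Seq[UnifiedUserAction]): Option[Double] = {
  val clicks = events.count(_.actionType.value == 1204)
  val successes = events.count(_.actionType.value == 1206)
  if (clicks == 0) None else Some(successes.toDouble / clicks)
}
```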
+  /**
+   * This is fired when a user is logged out and login is successful.
+   */
+  ClientCTALoginSuccess = 1206
+
+  /**
+   * This is fired when a user is logged out and clicks on the signup button.
+   */
+  ClientCTASignupClick = 1207
+
+  /**
+   * This is fired when a user is logged out and signup is successful.
+   */
+  ClientCTASignupSuccess = 1208
+  // 1400 - 1499 for product surface specific actions
+  // This is fired when a user opens a Push Notification
+  ClientNotificationOpen = 1400
+  // This is fired when a user clicks on a Notification in the Notification Tab
+  ClientNotificationClick = 1401
+  // This is fired when a user taps the "See Less Often" caret menu item of a Notification in the Notification Tab
+  ClientNotificationSeeLessOften = 1402
+  // This is fired when a user closes or swipes away a Push Notification
+  ClientNotificationDismiss = 1403
+
+  // 1420 - 1439 is reserved for Search Results Page related actions
+  // 1440 - 1449 is reserved for Typeahead related actions
+
+  // This is fired when a user clicks on a typeahead suggestion (queries, events, topics, users)
+  // in a drop-down menu of a search box or a tweet compose box.
+  ClientTypeaheadClick = 1440
+
+  // 1500 - 1999 used for behavioral client events
+  // Tweet related impressions
+  ClientTweetV2Impression = 1500
+  /* Fullscreen impressions
+   *
+   * The Android client will always log fullscreen_video impressions, regardless of the media type,
+   * i.e. video, image, MM will all be logged as fullscreen_video
+   *
+   * iOS clients will log fullscreen_video or fullscreen_image depending on the media type
+   * on display when the user exits fullscreen, i.e.
+   * - image tweet => fullscreen_image
+   * - video tweet => fullscreen_video
+   * - MM tweet => fullscreen_video if user exits fullscreen from the video
+   *            => fullscreen_image if user exits fullscreen from the image
+   *
+   * Web clients will always log fullscreen_image impressions, regardless of the media type
+   *
+   * References
+   * https://docs.google.com/document/d/1oEt9_Gtz34cmO_JWNag5YKKEq4Q7cJFL-nbHOmhnq1Y
+   * https://docs.google.com/document/d/1V_7TbfPvTQgtE_91r5SubD7n78JsVR_iToW59gOMrfQ
+   */
+  ClientTweetVideoFullscreenV2Impression = 1501
+  ClientTweetImageFullscreenV2Impression = 1502
+  // Profile related impressions
+  ClientProfileV2Impression = 1600
+  /*
+   * Email Notifications: These are actions taken by the user in response to Your Highlights email
+   * ClientTweetEmailClick refers to the action NotificationType.Click
+   */
+  ClientTweetEmailClick = 5001
+
+  /*
+   * User create via Gizmoduck
+   */
+  ServerUserCreate = 6000
+  ServerUserUpdate = 6001
+  /*
+   * Ads callback engagements
+   */
+  /*
+   * This engagement is generated when a user Favs a promoted Tweet.
+   */
+  ServerPromotedTweetFav = 7000
+  /*
+   * This engagement is generated when a user Unfavs a promoted Tweet that they previously Faved.
+   */
+  ServerPromotedTweetUnfav = 7001
+  ServerPromotedTweetReply = 7002
+  ServerPromotedTweetRetweet = 7004
+  /*
+   * The block could be performed from the promoted tweet or on the promoted tweet's author's profile.
+   * ads_spend_event data shows the majority (~97%) of blocks have an associated promoted tweet id,
+   * so for now we assume the blocks are largely performed from the tweet and follow the naming convention of ClientTweetBlockAuthor
+   */
+  ServerPromotedTweetBlockAuthor = 7006
+  ServerPromotedTweetUnblockAuthor = 7007
+  /*
+   * This is when a user clicks on the Conversational Card in the Promoted Tweet, which
+   * leads to the Tweet Compose page. The user may or may not send the new Tweet.
+   */
+  ServerPromotedTweetComposeTweet = 7008
+  /*
+   * This is when a user clicks on the Promoted Tweet to view its details/replies.
+   */
+  ServerPromotedTweetClick = 7009
+  /*
+   * The video ads engagements are divided into two sets: VIDEO_CONTENT_* and VIDEO_AD_*. These engagements
+   * have similar definitions. VIDEO_CONTENT_* engagements are fired for videos that are part of
+   * a Tweet. VIDEO_AD_* engagements are fired for a preroll ad. A preroll ad can play on a promoted
+   * Tweet or on an organic Tweet. See go/preroll-matching for more information.
+   *
+   * 7011-7013: A Promoted Event is fired when playback reaches 25%, 50%, 75% of total track duration.
+   * This is for the video on a promoted Tweet.
+   * Not valid for live videos. Refer to go/avscribing.
+   * For a video that has a preroll ad played before it, the metadata will contain information about
+   * the preroll ad as well as the video itself. There will be no preroll metadata if there was no
+   * preroll ad played.
+   */
+  ServerPromotedTweetVideoPlayback25 = 7011
+  ServerPromotedTweetVideoPlayback50 = 7012
+  ServerPromotedTweetVideoPlayback75 = 7013
+  /*
+   * This is when a user successfully completes the Report flow on a Promoted Tweet.
+   * It covers reports for all policies from Client Event.
+   */
+  ServerPromotedTweetReport = 7041
+  /*
+   * Follow from the Ads data stream; it could come from a Tweet or from other places
+   */
+  ServerPromotedProfileFollow = 7060
+  /*
+   * Unfollow from the Ads data stream; it could come from a Tweet or from other places
+   */
+  ServerPromotedProfileUnfollow = 7061
+  /*
+   * This is when a user clicks on the "mute the promoted Tweet's author" option from the menu.
+   */
+  ServerPromotedTweetMuteAuthor = 7064
+  /*
+   * This is fired when a user clicks on the profile image, screen name, or the user name of the
+   * author of the Promoted Tweet, which leads to the author's profile page.
+   */
+  ServerPromotedTweetClickProfile = 7072
+  /*
+   * This is fired when a user clicks on a hashtag in the Promoted Tweet.
+   */
+  ServerPromotedTweetClickHashtag = 7078
+  /*
+   * This is fired when a user opens a link by clicking on a URL in the Promoted Tweet.
+   */
+  ServerPromotedTweetOpenLink = 7079
+  /*
+   * This is fired when a user swipes to the next element of the carousel in the Promoted Tweet.
+   */
+  ServerPromotedTweetCarouselSwipeNext = 7091
+  /*
+   * This is fired when a user swipes to the previous element of the carousel in the Promoted Tweet.
+   */
+  ServerPromotedTweetCarouselSwipePrevious = 7092
+  /*
+   * This event is only for the Promoted Tweets with a web URL.
+   * It is fired after exiting a WebView from a Promoted Tweet if the user was on the WebView for
+   * at least 1 second.
+   *
+   * See https://confluence.twitter.biz/display/REVENUE/dwell_short for more details.
+   */
+  ServerPromotedTweetLingerImpressionShort = 7093
+  /*
+   * This event is only for the Promoted Tweets with a web URL.
+   * It is fired after exiting a WebView from a Promoted Tweet if the user was on the WebView for
+   * at least 2 seconds.
+   *
+   * See https://confluence.twitter.biz/display/REVENUE/dwell_medium for more details.
+   */
+  ServerPromotedTweetLingerImpressionMedium = 7094
+  /*
+   * This event is only for the Promoted Tweets with a web URL.
+   * It is fired after exiting a WebView from a Promoted Tweet if the user was on the WebView for
+   * at least 10 seconds.
+   *
+   * See https://confluence.twitter.biz/display/REVENUE/dwell_long for more details.
+   */
+  ServerPromotedTweetLingerImpressionLong = 7095
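Reading the three linger comments above literally, the events are threshold-based (at least 1s, 2s, 10s of WebView dwell), so a single measured dwell can qualify for more than one of them. A purely illustrative sketch of that bucketing, with names invented here rather than taken from this repo:

```scala
// Map a measured WebView dwell to the linger impressions it qualifies for,
// per the 1s / 2s / 10s thresholds documented above.
def qualifyingLingerImpressions(dwellMs: Long): Seq[String] =
  Seq(
    1000L -> "ServerPromotedTweetLingerImpressionShort",
    2000L -> "ServerPromotedTweetLingerImpressionMedium",
    10000L -> "ServerPromotedTweetLingerImpressionLong"
  ).collect { case (thresholdMs, impression) if dwellMs >= thresholdMs => impression }
```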
+  /*
+   * This is fired when a user navigates to the explorer page (taps the search magnifying glass on the Home page)
+   * and a Promoted Trend is present and taps ON the promoted spotlight - a video/gif/image in the
+   * "hero" position (top of the explorer page).
+   */
+  ServerPromotedTweetClickSpotlight = 7096
+  /*
+   * This is fired when a user navigates to the explorer page (taps the search magnifying glass on the Home page)
+   * and a Promoted Trend is present.
+   */
+  ServerPromotedTweetViewSpotlight = 7097
+  /*
+   * 7098-7099: Promoted Trends appear in the first or second slots of the “Trends for you” section
+   * in the Explore tab and “What’s Happening” module on Twitter.com. For more information, check go/ads-takeover.
+   * 7098: This is fired when a user views a promoted Trend. It should be considered an impression.
+   */
+  ServerPromotedTrendView = 7098
+  /*
+   * 7099: This is fired when a user clicks a promoted Trend. It should be considered an engagement.
+   */
+  ServerPromotedTrendClick = 7099
+  /*
+   * 7131-7133: A Promoted Event fired when playback reaches 25%, 50%, 75% of total track duration.
+   * This is for the preroll ad that plays before a video on a promoted Tweet.
+   * Not valid for live videos. Refer to go/avscribing.
+   * This will only contain metadata for the preroll ad.
+   */
+  ServerPromotedTweetVideoAdPlayback25 = 7131
+  ServerPromotedTweetVideoAdPlayback50 = 7132
+  ServerPromotedTweetVideoAdPlayback75 = 7133
+  /*
+   * 7151-7153: A Promoted Event fired when playback reaches 25%, 50%, 75% of total track duration.
+   * This is for the preroll ad that plays before a video on an organic Tweet.
+   * Not valid for live videos. Refer to go/avscribing.
+   * This will only contain metadata for the preroll ad.
+   */
+  ServerTweetVideoAdPlayback25 = 7151
+  ServerTweetVideoAdPlayback50 = 7152
+  ServerTweetVideoAdPlayback75 = 7153
+
+  ServerPromotedTweetDismissWithoutReason = 7180
+  ServerPromotedTweetDismissUninteresting = 7181
+  ServerPromotedTweetDismissRepetitive = 7182
+  ServerPromotedTweetDismissSpam = 7183
+
+
+  /*
+   * For FavoriteArchival Events
+   */
+  ServerTweetArchiveFavorite = 8000
+  ServerTweetUnarchiveFavorite = 8001
+  /*
+   * For RetweetArchival Events
+   */
+  ServerTweetArchiveRetweet = 8002
+  ServerTweetUnarchiveRetweet = 8003
+}(persisted='true', hasPersonalData='false')
+
+/*
+ * This union will be updated when we have a particular
+ * action that has attributes unique to that particular action
+ * (e.g. linger impressions have start/end times) and not common
+ * to all tweet actions.
+ * Naming convention for TweetActionInfo should be consistent with
+ * ActionType. For example, the `ClientTweetLingerImpression` ActionType enum
+ * should correspond to the `ClientTweetLingerImpression` TweetActionInfo union arm.
+ * We typically preserve a 1:1 mapping between ActionType and TweetActionInfo. However, we make
+ * exceptions when optimizing for customer requirements. For example, multiple 'ClientTweetVideo*'
+ * ActionType enums correspond to a single `TweetVideoWatch` TweetActionInfo union arm because
+ * customers want individual action labels but common information across those labels.
+ */
+union TweetActionInfo {
+  // 41 matches enum index ServerTweetReport in ActionType
+  41: ServerTweetReport serverTweetReport
+  // 1002 matches enum index ClientTweetLingerImpression in ActionType
+  1002: ClientTweetLingerImpression clientTweetLingerImpression
+  // Common metadata for
+  // 1. "ClientTweetVideo*" ActionTypes with enum indices 1011-1016 and 1100-1101
+  // 2. "ServerPromotedTweetVideo*" ActionTypes with enum indices 7011-7013 and 7131-7133
+  // 3. "ServerTweetVideo*" ActionTypes with enum indices 7151-7153
+  // This is because:
+  // 1. all the above listed ActionTypes share common metadata
+  // 2. more modular code as the same struct can be reused
+  // 3. reduces chance of error while populating and parsing the metadata
+  // 4. consumers can easily process the metadata
+  1011: TweetVideoWatch tweetVideoWatch
+  // 1012: skip
+  // 1013: skip
+  // 1014: skip
+  // 1015: skip
+  // 1016: skip
+  // 1024 matches enum index ClientTweetClickMentionScreenName in ActionType
+  1024: ClientTweetClickMentionScreenName clientTweetClickMentionScreenName
+  // 1041 matches enum index ClientTweetReport in ActionType
+  1041: ClientTweetReport clientTweetReport
+  // 1060 matches enum index ClientTweetFollowAuthor in ActionType
+  1060: ClientTweetFollowAuthor clientTweetFollowAuthor
+  // 1061 matches enum index ClientTweetUnfollowAuthor in ActionType
+  1061: ClientTweetUnfollowAuthor clientTweetUnfollowAuthor
+  // 1078 matches enum index ClientTweetClickHashtag in ActionType
+  1078: ClientTweetClickHashtag clientTweetClickHashtag
+  // 1090 matches enum index ClientTweetOpenLink in ActionType
+  1090: ClientTweetOpenLink clientTweetOpenLink
+  // 1091 matches enum index ClientTweetTakeScreenshot in ActionType
+  1091: ClientTweetTakeScreenshot clientTweetTakeScreenshot
+  // 1500 matches enum index ClientTweetV2Impression in ActionType
+  1500: ClientTweetV2Impression clientTweetV2Impression
+  // 7079 matches enum index ServerPromotedTweetOpenLink in ActionType
+  7079: ServerPromotedTweetOpenLink serverPromotedTweetOpenLink
+}(persisted='true', hasPersonalData='true')
+
+
+struct ClientTweetOpenLink {
+  // Url which was clicked.
+  1: optional string url(personalDataType = 'RawUrlPath')
+}(persisted='true', hasPersonalData='true')
+
+struct ServerPromotedTweetOpenLink {
+  // Url which was clicked.
+  1: optional string url(personalDataType = 'RawUrlPath')
+}(persisted='true', hasPersonalData='true')
+
+struct ClientTweetClickHashtag {
+  /* Hashtag string which was clicked. The PDP annotation is SearchQuery,
+   * because clicking on the hashtag triggers a search with the hashtag
+   */
+  1: optional string hashtag(personalDataType = 'SearchQuery')
+}(persisted='true', hasPersonalData='true')
+
+struct ClientTweetTakeScreenshot {
+  // Percentage visible height.
+  1: optional i32 percentVisibleHeight100k
+}(persisted='true', hasPersonalData='false')
+
+/*
+ * See go/ioslingerimpressionbehaviors and go/lingerandroidfaq
+ * for ios and android client definitions of a linger respectively.
+ */
+struct ClientTweetLingerImpression {
+  /* Milliseconds since epoch when the tweet became more than 50% visible. */
+  1: required i64 lingerStartTimestampMs(personalDataType = 'ImpressionMetadata')
+  /* Milliseconds since epoch when the tweet became less than 50% visible. */
+  2: required i64 lingerEndTimestampMs(personalDataType = 'ImpressionMetadata')
+}(persisted='true', hasPersonalData='true')
+ */
+struct ClientTweetV2Impression {
+ /* Milliseconds since epoch when the tweet became visible. */
+ 1: required i64 impressStartTimestampMs(personalDataType = 'ImpressionMetadata')
+ /* Milliseconds since epoch when the tweet was no longer visible. */
+ 2: required i64 impressEndTimestampMs(personalDataType = 'ImpressionMetadata')
+ /*
+ * The UI component that hosted this tweet where the impress event happened.
+ *
+ * For example, sourceComponent = "tweet" if the impress event happened on a tweet displayed amongst
+ * a collection of tweets, or sourceComponent = "tweet_details" if the impress event happened on
+ * a tweet detail UI component.
+ */
+ 3: required string sourceComponent(personalDataType = 'WebsitePage')
+}(persisted='true', hasPersonalData='true')
+
+ /*
+ * Refer to go/cme-scribing and go/interaction-event-spec for details.
+ */
+struct TweetVideoWatch {
+ /*
+ * Type of video included in the Tweet
+ */
+ 1: optional client_app.MediaType mediaType(personalDataType = 'MediaFile')
+ /*
+ * Whether the video content is "monetizable", i.e.,
+ * whether a preroll ad may be served dynamically when the video plays
+ */
+ 2: optional bool isMonetizable(personalDataType = 'MediaFile')
+
+ /*
+ * The owner of the video, provided by playlist.
+ *
+ * For ad engagements related to a preroll ad (VIDEO_AD_*),
+ * this will be the owner of the preroll ad and the same as the prerollOwnerId.
+ *
+ * For ad engagements related to a regular video (VIDEO_CONTENT_*), this will be the owner of the
+ * video and not the preroll ad.
+ */
+ 3: optional i64 videoOwnerId(personalDataType = 'UserId')
+
+ /*
+ * Identifies the video associated with a card.
+ *
+ * For ad engagements, in the case of engagements related to a preroll ad (VIDEO_AD_*),
+ * this will be the id of the preroll ad and the same as the prerollUuid.
+ *
+ * For ad engagements related to a regular video (VIDEO_CONTENT_*), this will be the id of the video
+ * and not the preroll ad.
+ */
+ 4: optional string videoUuid(personalDataType = 'MediaId')
+
+ /*
+ * Id of the preroll ad shown before the video
+ */
+ 5: optional string prerollUuid(personalDataType = 'MediaId')
+
+ /*
+ * Advertiser id of the preroll ad
+ */
+ 6: optional i64 prerollOwnerId(personalDataType = 'UserId')
+ /*
+ * For amplify_flayer events, indicates whether the preroll or the main video is being played
+ */
+ 7: optional string videoType(personalDataType = 'MediaFile')
+}(persisted='true', hasPersonalData='true')
+
+struct ClientTweetClickMentionScreenName {
+ /* Id for the profile (user_id) that was actioned on */
+ 1: required i64 actionProfileId(personalDataType = 'UserId')
+ /* The handle/screenName of the user. This can't be changed. */
+ 2: required string handle(personalDataType = 'UserName')
+}(persisted='true', hasPersonalData='true')
+
+struct ClientTweetReport {
+ /*
+ * Whether the "Report Tweet" flow was successfully completed.
+ * `true` if the flow was completed successfully, `false` otherwise.
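+ * For example, a user who opens the "Report Tweet" flow but abandons it
+ * before submitting is recorded with isReportTweetDone = false.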
+ */
+ 1: required bool isReportTweetDone
+ /*
+ * report-flow-id is included in the Client Event when the "Report Tweet" flow was initiated.
+ * See go/report-flow-ids and
+ * https://confluence.twitter.biz/pages/viewpage.action?spaceKey=HEALTH&title=Understanding+ReportDetails
+ */
+ 2: optional string reportFlowId
+}(persisted='true', hasPersonalData='true')
+
+enum TweetAuthorFollowClickSource {
+ UNKNOWN = 1
+ CARET_MENU = 2
+ PROFILE_IMAGE = 3
+}
+
+struct ClientTweetFollowAuthor {
+ /*
+ * Where the user clicked the Follow button on the tweet - from the caret menu ("CARET_MENU")
+ * or via hovering over the profile and clicking on Follow ("PROFILE_IMAGE") - only applicable for web clients.
+ * "UNKNOWN" if the scribe does not match the expected namespace for the above.
+ */
+ 1: required TweetAuthorFollowClickSource followClickSource
+}(persisted='true', hasPersonalData='false')
+
+enum TweetAuthorUnfollowClickSource {
+ UNKNOWN = 1
+ CARET_MENU = 2
+ PROFILE_IMAGE = 3
+}
+
+struct ClientTweetUnfollowAuthor {
+ /*
+ * Where the user clicked the Unfollow button on the tweet - from the caret menu ("CARET_MENU")
+ * or via hovering over the profile and clicking on Unfollow ("PROFILE_IMAGE") - only applicable for web clients.
+ * "UNKNOWN" if the scribe does not match the expected namespace for the above.
+ */
+ 1: required TweetAuthorUnfollowClickSource unfollowClickSource
+}(persisted='true', hasPersonalData='false')
+
+struct ServerTweetReport {
+ /*
+ * ReportDetails will be populated when the tweet report was scribed by spamacaw (server side).
+ * All of the fields under ReportDetails are available only for the submit action,
+ * because the report_type and report_flow_name are known only after a successful submission.
+ * Reference: https://confluence.twitter.biz/pages/viewpage.action?spaceKey=HEALTH&title=Understanding+ReportDetails
+ */
+ 1: optional string reportFlowId
+ 2: optional report_flow_logs.ReportType reportType
+}(persisted='true', hasPersonalData='false')
+
+/*
+ * This union will be updated when we have a particular
+ * action that has attributes unique to that particular action
+ * (e.g. linger impressions have start/end times) and not common
+ * to other profile actions.
+ *
+ * Naming convention for ProfileActionInfo should be consistent with
+ * ActionType. For example, the `ClientProfileV2Impression` ActionType enum
+ * should correspond to the `ClientProfileV2Impression` ProfileActionInfo union arm.
+ */
+union ProfileActionInfo {
+ // 56 matches enum index ServerProfileReport in ActionType
+ 56: ServerProfileReport serverProfileReport
+ // 1600 matches enum index ClientProfileV2Impression in ActionType
+ 1600: ClientProfileV2Impression clientProfileV2Impression
+ // 6001 matches enum index ServerUserUpdate in ActionType
+ 6001: ServerUserUpdate serverUserUpdate
+}(persisted='true', hasPersonalData='true')
+
+/*
+ * See go/behavioral-client-events for general behavioral client event (BCE) information
+ * and https://docs.google.com/document/d/16CdSRpsmUUd17yoFH9min3nLBqDVawx4DaZoiqSfCHI/edit#heading=h.3tu05p92xgxc
+ * for detailed information about BCE impression events.
+ *
+ * Unlike ClientTweetLingerImpression, there is no lower bound on the amount of time
+ * necessary for the impress event to occur. There is also no visibility requirement for an impress
+ * event to occur.
+ */
+struct ClientProfileV2Impression {
+ /* Milliseconds since epoch when the profile page became visible. */
+ 1: required i64 impressStartTimestampMs(personalDataType = 'ImpressionMetadata')
+ /* Milliseconds since epoch when the profile page was no longer visible. */
+ 2: required i64 impressEndTimestampMs(personalDataType = 'ImpressionMetadata')
+ /*
+ * The UI component that hosted this profile where the impress event happened.
+ *
+ * For example, sourceComponent = "profile" if the impress event happened on a profile page.
+ */
+ 3: required string sourceComponent(personalDataType = 'WebsitePage')
+}(persisted='true', hasPersonalData='true')
+
+struct ServerProfileReport {
+ 1: required social_graph_service_write_log.Action reportType(personalDataType = 'ReportType')
+}(persisted='true', hasPersonalData='true')
+
+struct ServerUserUpdate {
+ 1: required list updates
+ 2: optional bool success (personalDataType = 'AuditMessage')
+}(persisted='true', hasPersonalData='true')
diff --git a/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/common.thrift b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/common.thrift
new file mode 100644
index 000000000..cf9efe063
--- /dev/null
+++ b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/common.thrift
@@ -0,0 +1,20 @@
+namespace java com.twitter.unified_user_actions.thriftjava
+#@namespace scala com.twitter.unified_user_actions.thriftscala
+#@namespace strato com.twitter.unified_user_actions
+
+/*
+ * Uniquely identifies a user. A user identifier
+ * for a logged in user should contain a user id
+ * and a user identifier for a logged out user should
+ * contain some guest id. A user may have multiple ids.
+ */
+struct UserIdentifier {
+ 1: optional i64 userId(personalDataType='UserId')
+ /*
+ * See http://go/guest-id-cookie-tdd. As of Dec 2021,
+ * guest id is intended only for essential use cases
+ * (e.g. logged out preferences, security). Guest id
+ * marketing is intended for recommendation use cases.
+ */
+ 2: optional i64 guestIdMarketing(personalDataType='GuestId')
+}(persisted='true', hasPersonalData='true')
diff --git a/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/item.thrift b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/item.thrift
new file mode 100644
index 000000000..c120e587c
--- /dev/null
+++ b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/item.thrift
@@ -0,0 +1,294 @@
+namespace java com.twitter.unified_user_actions.thriftjava
+#@namespace scala com.twitter.unified_user_actions.thriftscala
+#@namespace strato com.twitter.unified_user_actions
+
+include "com/twitter/unified_user_actions/action_info.thrift"
+include "com/twitter/clientapp/gen/client_app.thrift"
+
+/*
+ * Tweet item information. Some development notes:
+ * 1. Please keep this top-level struct as minimal as possible to reduce overhead.
+ * 2. We intentionally avoid nesting the action tweet in a separate structure
+ * to underscore its importance and facilitate extraction of the most commonly
+ * needed fields such as actionTweetId. New fields related to the action tweet
+ * should generally be prefixed with "actionTweet".
+ * 3. For the related Tweets, e.g. retweetingTweetId, inReplyToTweetId, etc., we
+ * mostly only keep their ids for consistency and simplicity.
+ */
+struct TweetInfo {
+
+ /* Id for the tweet that was actioned on */
+ 1: required i64 actionTweetId(personalDataType = 'TweetId')
+ // Deprecated, please don't re-use!
+ // 2: optional i64 actionTweetAuthorId(personalDataType = 'UserId') + /* The social proof (i.e. banner) Topic Id that the action Tweet is associated to */ + 3: optional i64 actionTweetTopicSocialProofId(personalDataType='InferredInterests, ProvidedInterests') + 4: optional AuthorInfo actionTweetAuthorInfo + + // Fields 1-99 reserved for `actionFooBar` fields + + /* Additional details for the action that took place on actionTweetId */ + 100: optional action_info.TweetActionInfo tweetActionInfo + + /* Id of the tweet retweeting the action tweet */ + 101: optional i64 retweetingTweetId(personalDataType = 'TweetId') + /* Id of the tweet quoting the action Tweet, when the action type is quote */ + 102: optional i64 quotingTweetId(personalDataType = 'TweetId') + /* Id of the tweet replying to the action Tweet, when the action type is reply */ + 103: optional i64 replyingTweetId(personalDataType = 'TweetId') + /* Id of the tweet being quoted by the action tweet */ + 104: optional i64 quotedTweetId(personalDataType = 'TweetId') + /* Id of the tweet being replied to by the action tweet */ + 105: optional i64 inReplyToTweetId(personalDataType = 'TweetId') + /* Id of the tweet being retweeted by the action tweet, this is just for Unretweet action */ + 106: optional i64 retweetedTweetId(personalDataType = 'TweetId') + /* Id of the tweet being edited, this is only available for TweetEdit action, and TweetDelete + * action when the deleted tweet was created from Edit. */ + 107: optional i64 editedTweetId(personalDataType = 'TweetId') + /* Position of a tweet item in a page such as home and tweet detail, and is populated in + * Client Event. */ + 108: optional i32 tweetPosition + /* PromotedId is provided by ads team for each promoted tweet and is logged in client event */ + 109: optional string promotedId(personalDataType = 'AdsId') + /* corresponding to inReplyToTweetId */ + 110: optional i64 inReplyToAuthorId(personalDataType = 'UserId') + /* corresponding to retweetingTweetId */ + 111: optional i64 retweetingAuthorId(personalDataType = 'UserId') + /* corresponding to quotedTweetId */ + 112: optional i64 quotedAuthorId(personalDataType = 'UserId') +}(persisted='true', hasPersonalData='true') + +/* + * Profile item information. This follows TweetInfo's development notes. + */ +struct ProfileInfo { + + /* Id for the profile (user_id) that was actioned on + * + * In a social graph user action, e.g., user1 follows/blocks/mutes user2, + * userIdentifier captures userId of user1 and actionProfileId records + * the userId of user2. + */ + 1: required i64 actionProfileId(personalDataType = 'UserId') + + // Fields 1-99 reserved for `actionFooBar` fields + /* the full name of the user. max length is 50. */ + 2: optional string name(personalDataType = 'DisplayName') + /* The handle/screenName of the user. This can't be changed. + */ + 3: optional string handle(personalDataType = 'UserName') + /* the "bio" of the user. max length is 160. May contain one or more t.co + * links, which will be hydrated in the UrlEntities substruct if the + * QueryFields.URL_ENTITIES is specified. + */ + 4: optional string description(personalDataType = 'Bio') + + /* Additional details for the action that took place on actionProfileId */ + 100: optional action_info.ProfileActionInfo profileActionInfo +}(persisted='true', hasPersonalData='true') + +/* + * Topic item information. This follows TweetInfo's development notes. 
+ */
+struct TopicInfo {
+ /* Id for the Topic that was actioned on */
+ 1: required i64 actionTopicId(personalDataType='InferredInterests, ProvidedInterests')
+
+ // Fields 1-99 reserved for `actionFooBar` fields
+}(persisted='true', hasPersonalData='true')
+
+/*
+ * Notification Item information.
+ *
+ * See go/phab-d973370-discuss, go/phab-d968144-discuss, and go/uua-action-type for details about
+ * the schema design for Notification events.
+ */
+struct NotificationInfo {
+ /*
+ * Id of the Notification that was actioned on.
+ *
+ * Note that this field represents the `impressionId` of a Notification. It has been renamed to
+ * `notificationId` in UUA so that the name effectively represents the value it holds,
+ * i.e. a unique id for a Notification and request.
+ */
+ 1: required string actionNotificationId(personalDataType='UniversallyUniqueIdentifierUuid')
+ /*
+ * Additional information contained in a Notification. This is a `union` arm to differentiate
+ * among different types of Notifications and store relevant metadata for each type.
+ *
+ * For example, a Notification with a single Tweet will hold the Tweet id in `TweetNotification`.
+ * Similarly, `MultiTweetNotification` is defined for Notifications with multiple Tweet ids.
+ *
+ * Refer to the definition of `union NotificationContent` below for more details.
+ */
+ 2: required NotificationContent content
+}(persisted='true', hasPersonalData='true')
+
+/*
+ * Additional information contained in a Notification.
+ */
+union NotificationContent {
+ 1: TweetNotification tweetNotification
+ 2: MultiTweetNotification multiTweetNotification
+
+ // 3 - 100 reserved for other specific Notification types (for example, profile, event, etc.).
+
+ /*
+ * If a Notification cannot be categorized into any of the types at indices 1 - 100,
+ * it is considered of `Unknown` type.
+ */
+ 101: UnknownNotification unknownNotification
+}(persisted='true', hasPersonalData='true')
+
+/*
+ * Notification contains exactly one `tweetId`.
+ */
+struct TweetNotification {
+ 1: required i64 tweetId(personalDataType = 'TweetId')
+}(persisted='true', hasPersonalData='true')
+
+/*
+ * Notification contains multiple `tweetIds`.
+ * For example, user A receives a Notification when user B likes multiple Tweets authored by user A.
+ */
+struct MultiTweetNotification {
+ 1: required list<i64> tweetIds(personalDataType = 'TweetId')
+}(persisted='true', hasPersonalData='true')
+
+/*
+ * Notification could not be categorized into the known types at indices 1 - 100 in `NotificationContent`.
+ */
+struct UnknownNotification {
+ // this field is just a placeholder since Sparrow doesn't support empty struct
+ 100: optional bool placeholder
+}(persisted='true', hasPersonalData='false')
+
+/*
+ * Trend Item information for promoted and non-promoted Trends.
+ */
+struct TrendInfo {
+ /*
+ * Identifier for promoted Trends only.
+ * This is not available for non-promoted Trends and the default value should be set to 0.
+ */
+ 1: required i32 actionTrendId(personalDataType= 'TrendId')
+ /*
+ * Empty for promoted Trends only.
+ * This should be set for all non-promoted Trends.
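+ * For example, a non-promoted Trend carries actionTrendId = 0 and
+ * actionTrendName set to the Trend's display text, while a promoted Trend
+ * carries a non-zero actionTrendId and an empty actionTrendName.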
+ */ + 2: optional string actionTrendName +}(persisted='true', hasPersonalData='true') + +struct TypeaheadInfo { + /* search query string */ + 1: required string actionQuery(personalDataType = 'SearchQuery') + 2: required TypeaheadActionInfo typeaheadActionInfo +}(persisted='true', hasPersonalData='true') + +union TypeaheadActionInfo { + 1: UserResult userResult + 2: TopicQueryResult topicQueryResult +}(persisted='true', hasPersonalData='true') + +struct UserResult { + /* The userId of the profile suggested in the typeahead drop-down, upon which the user took the action */ + 1: required i64 profileId(personalDataType = 'UserId') +}(persisted='true', hasPersonalData='true') + +struct TopicQueryResult { + /* The topic query name suggested in the typeahead drop-down, upon which the user took the action */ + 1: required string suggestedTopicQuery(personalDataType = 'SearchQuery') +}(persisted='true', hasPersonalData='true') + + + +/* + * Item that captures feedback related information submitted by the user across modules / item (Eg: Search Results / Tweets) + * Design discussion doc: https://docs.google.com/document/d/1UHiCrGzfiXOSymRAUM565KchVLZBAByMwvP4ARxeixY/edit# + */ +struct FeedbackPromptInfo { + 1: required FeedbackPromptActionInfo feedbackPromptActionInfo +}(persisted='true', hasPersonalData='true') + +union FeedbackPromptActionInfo { + 1: DidYouFindItSearch didYouFindItSearch + 2: TweetRelevantToSearch tweetRelevantToSearch +}(persisted='true', hasPersonalData='true') + +struct DidYouFindItSearch { + 1: required string searchQuery(personalDataType= 'SearchQuery') + 2: optional bool isRelevant +}(persisted='true', hasPersonalData='true') + +struct TweetRelevantToSearch { + 1: required string searchQuery(personalDataType= 'SearchQuery') + 2: required i64 tweetId + 3: optional bool isRelevant +}(persisted='true', hasPersonalData='true') + +/* + * For (Tweet) Author info + */ +struct AuthorInfo { + /* In practice, this should be set. Rarely, it may be unset. */ + 1: optional i64 authorId(personalDataType = 'UserId') + /* i.e. in-network (true) or out-of-network (false) */ + 2: optional bool isFollowedByActingUser + /* i.e. is a follower (true) or not (false) */ + 3: optional bool isFollowingActingUser +}(persisted='true', hasPersonalData='true') + +/* + * Use for Call to Action events. + */ +struct CTAInfo { + // this field is just a placeholder since Sparrow doesn't support empty struct + 100: optional bool placeholder +}(persisted='true', hasPersonalData='false') + +/* + * Card Info + */ +struct CardInfo { + 1: optional i64 id + 2: optional client_app.ItemType itemType + // authorId is deprecated, please use AuthorInfo instead + // 3: optional i64 authorId(personalDataType = 'UserId') + 4: optional AuthorInfo actionTweetAuthorInfo +}(persisted='true', hasPersonalData='false') + +/* + * When the user exits the app, the time (in millis) spent by them on the platform is recorded as User Active Seconds (UAS). + */ +struct UASInfo { + 1: required i64 timeSpentMs +}(persisted='true', hasPersonalData='false') + +/* + * Corresponding item for a user action. + * An item should be treated independently if it has different affordances + * (https://www.interaction-design.org/literature/topics/affordances) for the user. + * For example, a Notification has different affordances than a Tweet in the Notification Tab; + * in the former, you can either "click" or "see less often" and in the latter, + * you can perform inline engagements such as "like" or "reply". 
+ * Note that an item may be rendered differently in different contexts, but as long as the
+ * affordances remain the same or nearly similar, it can be treated as the same item
+ * (e.g. Tweets can be rendered in slightly different ways in embeds vs in the app).
+ * Item types (e.g. Tweets, Notifications) and ActionTypes should be 1:1, and when an action can be
+ * performed on multiple types of items, consider granular action types.
+ * For example, a user can take the Click action on Tweets and Notifications, and we have
+ * separate ActionTypes for Tweet Click and Notification Click. This makes it easier to identify all the
+ * actions associated with a particular item.
+ */
+union Item {
+ 1: TweetInfo tweetInfo
+ 2: ProfileInfo profileInfo
+ 3: TopicInfo topicInfo
+ 4: NotificationInfo notificationInfo
+ 5: TrendInfo trendInfo
+ 6: CTAInfo ctaInfo
+ 7: FeedbackPromptInfo feedbackPromptInfo
+ 8: TypeaheadInfo typeaheadInfo
+ 9: UASInfo uasInfo
+ 10: CardInfo cardInfo
+}(persisted='true', hasPersonalData='true')
diff --git a/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/keyed_uua.thrift b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/keyed_uua.thrift
new file mode 100644
index 000000000..98c64609c
--- /dev/null
+++ b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/keyed_uua.thrift
@@ -0,0 +1,22 @@
+namespace java com.twitter.unified_user_actions.thriftjava
+#@namespace scala com.twitter.unified_user_actions.thriftscala
+#@namespace strato com.twitter.unified_user_actions
+
+include "com/twitter/unified_user_actions/action_info.thrift"
+include "com/twitter/unified_user_actions/common.thrift"
+include "com/twitter/unified_user_actions/metadata.thrift"
+
+/*
+ * This is mainly for the View Counts project, which only requires a minimal set of fields for now.
+ * The name KeyedUuaTweet indicates the value is about a Tweet, not a Moment or other entities.
+ */
+struct KeyedUuaTweet {
+ /* A user refers to either a logged in / logged out user */
+ 1: required common.UserIdentifier userIdentifier
+ /* The tweet that received the action from the user */
+ 2: required i64 tweetId(personalDataType='TweetId')
+ /* The type of action which took place */
+ 3: required action_info.ActionType actionType
+ /* Useful for event level analysis and joins */
+ 4: required metadata.EventMetadata eventMetadata
+}(persisted='true', hasPersonalData='true')
diff --git a/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/metadata.thrift b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/metadata.thrift
new file mode 100644
index 000000000..47644b6f8
--- /dev/null
+++ b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/metadata.thrift
@@ -0,0 +1,177 @@
+namespace java com.twitter.unified_user_actions.thriftjava
+#@namespace scala com.twitter.unified_user_actions.thriftscala
+#@namespace strato com.twitter.unified_user_actions
+
+/* Input source */
+enum SourceLineage {
+ /* Client-side. Also known as legacy client events or LCE. */
+ ClientEvents = 0
+ /* Client-side. Also known as BCE. */
+ BehavioralClientEvents = 1
+ /* Server-side Timelineservice favorites */
+ ServerTlsFavs = 2
+ /* Server-side Tweetypie events */
+ ServerTweetypieEvents = 3
+ /* Server-side SocialGraph events */
+ ServerSocialGraphEvents = 4
+ /* Notification Actions responding to Your Highlights Emails */
+ EmailNotificationEvents = 5
+ /**
+ * Gizmoduck's User Modification events https://docbird.twitter.biz/gizmoduck/user_modifications.html
+ **/
+ ServerGizmoduckUserModificationEvents = 6
+ /**
+ * Server-side Ads callback engagements
+ **/
+ ServerAdsCallbackEngagements = 7
+ /**
+ * Server-side favorite archival events
+ **/
+ ServerFavoriteArchivalEvents = 8
+ /**
+ * Server-side retweet archival events
+ **/
+ ServerRetweetArchivalEvents = 9
+}(persisted='true', hasPersonalData='false')
+
+/*
+ * Only available in behavioral client events (BCE).
+ *
+ * A breadcrumb tweet is a tweet that was interacted with prior to the current action.
+ */
+struct BreadcrumbTweet {
+ /* Id for the tweet that was interacted with prior to the current action */
+ 1: required i64 tweetId(personalDataType = 'TweetId')
+ /*
+ * The UI component that hosted the tweet and was interacted with preceding the current action.
+ * - tweet: represents the parent tweet container that wraps the quoted tweet
+ * - quote_tweet: represents the nested or quoted tweet within the parent container
+ *
+ * See more details at
+ * https://docs.google.com/document/d/16CdSRpsmUUd17yoFH9min3nLBqDVawx4DaZoiqSfCHI/edit#heading=h.nb7tnjrhqxpm
+ */
+ 2: required string sourceComponent(personalDataType = 'WebsitePage')
+}(persisted='true', hasPersonalData='true')
+
+/*
+ * ClientEvent's namespaces. See https://docbird.twitter.biz/client_events/client-event-namespaces.html
+ *
+ * - For Legacy Client Events (LCE), it excludes the client part of the
+ * six part namespace (client:page:section:component:element:action)
+ * since this part is better captured by clientAppId and clientVersion.
+ *
+ * - For Behavioral Client Events (BCE), use clientPlatform to identify the client.
+ * Additionally, BCE contains an optional subsection to denote the UI component of
+ * the current action. The ClientEventNamespace.component field will always be empty for
+ * the BCE namespace. There is no straightforward 1-1 mapping between BCE and LCE namespaces.
+ */
+struct ClientEventNamespace {
+ 1: optional string page(personalDataType = 'AppUsage')
+ 2: optional string section(personalDataType = 'AppUsage')
+ 3: optional string component(personalDataType = 'AppUsage')
+ 4: optional string element(personalDataType = 'AppUsage')
+ 5: optional string action(personalDataType = 'AppUsage')
+ 6: optional string subsection(personalDataType = 'AppUsage')
+}(persisted='true', hasPersonalData='true')
+
+/*
+ * Metadata that is independent of a particular (user, item, action type) tuple
+ * and mostly shared across user action events.
+ */
+struct EventMetadata {
+ /* When the action happened according to whatever source we are reading from */
+ 1: required i64 sourceTimestampMs(personalDataType = 'PrivateTimestamp, PublicTimestamp')
+ /* When the action was received for processing internally
+ * (compare with sourceTimestampMs for delay)
+ */
+ 2: required i64 receivedTimestampMs
+ /* Which source this event is derived from, e.g. CE, BCE, TimelineFavs */
+ 3: required SourceLineage sourceLineage
+ /* To be deprecated and replaced by requestJoinId.
+ * Useful for joining with other datasets.
+ * */
+ 4: optional i64 traceId(personalDataType = 'TfeTransactionId')
+ /*
+ * This is the language inferred from the request of the user action event (typically the user's current client language)
+ * NOT the language of any Tweet,
+ * NOT the language that the user sets in their profile!!!
+ *
+ * - ClientEvents && BehavioralClientEvents: Client UI language, or from Gizmoduck, which is what the user set in the Twitter App.
+ * Please see more at https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/finatra-internal/international/src/main/scala/com/twitter/finatra/international/LanguageIdentifier.scala
+ * The format should be ISO 639-1.
+ * - ServerTlsFavs: Client UI language, see more at http://go/languagepriority. The format should be ISO 639-1.
+ * - ServerTweetypieEvents: UUA sets this to None since there is no request level language info.
+ */
+ 5: optional string language(personalDataType = 'InferredLanguage')
+ /*
+ * This is the country inferred from the request of the user action event (typically the user's current country code)
+ * NOT the country of any Tweet (by geo-tagging),
+ * NOT the country set by the user in their profile!!!
+ *
+ * - ClientEvents && BehavioralClientEvents: Country code could be from the IP address (geoduck) or
+ * the user registration country (gizmoduck), and the former takes precedence.
+ * We don’t know exactly which one is applied, unfortunately,
+ * see https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/finatra-internal/international/src/main/scala/com/twitter/finatra/international/CountryIdentifier.scala
+ * The format should be ISO_3166-1_alpha-2.
+ * - ServerTlsFavs: From the request (user’s current location),
+ * see https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/src/thrift/com/twitter/context/viewer.thrift?L54
+ * The format should be ISO_3166-1_alpha-2.
+ * - ServerTweetypieEvents:
+ * UUA sets this to be consistent with IESource to meet existing usage requirements,
+ * see https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/src/thrift/com/twitter/tweetypie/tweet.thrift?L1001.
+ * The definitions here conflict with the intention of UUA to log the request country code
+ * rather than the signup / geo-tagging country.
+ */
+ 6: optional string countryCode(personalDataType = 'InferredCountry')
+ /* Useful for debugging client application related issues */
+ 7: optional i64 clientAppId(personalDataType = 'AppId')
+ /* Useful for debugging client application related issues */
+ 8: optional string clientVersion(personalDataType = 'ClientVersion')
+ /* Useful for filtering */
+ 9: optional ClientEventNamespace clientEventNamespace
+ /*
+ * This field is only populated in behavioral client events (BCE).
+ *
+ * The client platform, such as one of ["iPhone", "iPad", "Mac", "Android", "Web"].
+ * There can be multiple clientAppIds for the same platform.
+ */
+ 10: optional string clientPlatform(personalDataType = 'ClientType')
+ /*
+ * This field is only populated in behavioral client events (BCE).
+ *
+ * The current UI hierarchy information with human readable labels.
+ * For example, [home,timeline,tweet] or [tab_bar,home,scrollable_content,tweet]
+ *
+ * For more details see https://docs.google.com/document/d/16CdSRpsmUUd17yoFH9min3nLBqDVawx4DaZoiqSfCHI/edit#heading=h.uv3md49i0j4j
+ */
+ 11: optional list viewHierarchy(personalDataType = 'WebsitePage')
+ /*
+ * This field is only populated in behavioral client events (BCE).
+ *
+ * The sequence of views (breadcrumb) that was interacted with that caused the user to navigate to
+ * the current product surface (e.g. profile page) where an action was taken.
+ *
+ * The breadcrumb information may only be present for certain preceding product surfaces (e.g. Home Timeline).
+ * See more details in https://docs.google.com/document/d/16CdSRpsmUUd17yoFH9min3nLBqDVawx4DaZoiqSfCHI/edit#heading=h.nb7tnjrhqxpm
+ */
+ 12: optional list breadcrumbViews(personalDataType = 'WebsitePage')
+ /*
+ * This field is only populated in behavioral client events (BCE).
+ *
+ * The sequence of tweets (breadcrumb) that was interacted with that caused the user to navigate to
+ * the current product surface (e.g. profile page) where an action was taken.
+ *
+ * The breadcrumb information may only be present for certain preceding product surfaces (e.g. Home Timeline).
+ * See more details in https://docs.google.com/document/d/16CdSRpsmUUd17yoFH9min3nLBqDVawx4DaZoiqSfCHI/edit#heading=h.nb7tnjrhqxpm
+ */
+ 13: optional list breadcrumbTweets(personalDataType = 'TweetId')
+ /*
+ * A request join id is created by backend services and broadcasted in subsequent calls
+ * to other downstream services as part of the request path. The requestJoinId is logged
+ * in server logs and scribed in client events, enabling joins across client and server
+ * as well as within a given request across backend servers. See go/joinkey-tdd for more
+ * details.
+ */
+ 14: optional i64 requestJoinId(personalDataType = 'TransactionId')
+ 15: optional i64 clientEventTriggeredOn
+}(persisted='true', hasPersonalData='true')
diff --git a/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/product_surface_info.thrift b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/product_surface_info.thrift
new file mode 100644
index 000000000..524097885
--- /dev/null
+++ b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/product_surface_info.thrift
@@ -0,0 +1,149 @@
+#@namespace java com.twitter.unified_user_actions.thriftjava
+#@namespace scala com.twitter.unified_user_actions.thriftscala
+#@namespace strato com.twitter.unified_user_actions
+
+include "com/twitter/unified_user_actions/metadata.thrift"
+include "com/twitter/search/common/constants/query.thrift"
+include "com/twitter/search/common/constants/result.thrift"
+
+
+/*
+ * Represents the product surface on which an action took place.
+ * See the reference that delineates the various product surfaces:
+ * https://docs.google.com/document/d/1PS2ZOyNUoUdO45zxhE7dH3L8KUcqJwo6Vx-XUGGFo6U
+ * Note: the implementation here may not reflect the above doc exactly.
+ */
+enum ProductSurface {
+ // 1 - 19 for Home
+ HomeTimeline = 1
+ // 20 - 39 for Notifications
+ NotificationTab = 20
+ PushNotification = 21
+ EmailNotification = 22
+ // 40 - 59 for Search
+ SearchResultsPage = 40
+ SearchTypeahead = 41
+ // 60 - 79 for Tweet Details Page (Conversation Page)
+ TweetDetailsPage = 60
+ // 80 - 99 for Profile Page
+ ProfilePage = 80
+ // 100 - 119 for ?
+ RESERVED_100 = 100
+ // 120 - 139 for ?
+ RESERVED_120 = 120
+}(persisted='true', hasPersonalData='false')
+
+union ProductSurfaceInfo {
+ // 1 matches the enum index HomeTimeline in ProductSurface
+ 1: HomeTimelineInfo homeTimelineInfo
+ // 20 matches the enum index NotificationTab in ProductSurface
+ 20: NotificationTabInfo notificationTabInfo
+ // 21 matches the enum index PushNotification in ProductSurface
+ 21: PushNotificationInfo pushNotificationInfo
+ // 22 matches the enum index EmailNotification in ProductSurface
+ 22: EmailNotificationInfo emailNotificationInfo
+ // 40 matches the enum index SearchResultsPage in ProductSurface
+ 40: SearchResultsPageInfo searchResultsPageInfo
+ // 41 matches the enum index SearchTypeahead in ProductSurface
+ 41: SearchTypeaheadInfo searchTypeaheadInfo
+ // 60 matches the enum index TweetDetailsPage in ProductSurface
+ 60: TweetDetailsPageInfo tweetDetailsPageInfo
+ // 80 matches the enum index ProfilePage in ProductSurface
+ 80: ProfilePageInfo profilePageInfo
+}(persisted='true', hasPersonalData='false')
+
+/*
+ * Please keep this minimal to avoid overhead. It should only
+ * contain high value Home Timeline specific attributes.
+ */
+struct HomeTimelineInfo {
+ // suggestType is deprecated, please don't re-use!
+ // 1: optional i32 suggestType
+ 2: optional string suggestionType
+ 3: optional i32 injectedPosition
+}(persisted='true', hasPersonalData='false')
+
+struct NotificationTabInfo {
+ /*
+ * Note that this field represents the `impressionId` in a Notification Tab notification.
+ * It has been renamed to `notificationId` in UUA so that the name effectively represents the
+ * value it holds, i.e., a unique id for a notification and request.
+ */
+ 1: required string notificationId(personalDataType='UniversallyUniqueIdentifierUuid')
+}(persisted='true', hasPersonalData='false')
+
+struct PushNotificationInfo {
+ /*
+ * Note that this field represents the `impressionId` in a Push Notification.
+ * It has been renamed to `notificationId` in UUA so that the name effectively represents the
+ * value it holds, i.e., a unique id for a notification and request.
+ */
+ 1: required string notificationId(personalDataType='UniversallyUniqueIdentifierUuid')
+}(persisted='true', hasPersonalData='false')
+
+struct EmailNotificationInfo {
+ /*
+ * Note that this field represents the `impressionId` in an Email Notification.
+ * It has been renamed to `notificationId` in UUA so that the name effectively represents the
+ * value it holds, i.e., a unique id for a notification and request.
+ */
+ 1: required string notificationId(personalDataType='UniversallyUniqueIdentifierUuid')
+}(persisted='true', hasPersonalData='false')
+
+
+struct TweetDetailsPageInfo {
+ // To be deprecated, please don't re-use!
+ // The only reason to keep it for now is that Sparrow doesn't accept an empty struct. Once there is
+ // a real field, we should just comment this one out.
+ 1: required list breadcrumbViews(personalDataType = 'WebsitePage')
+ // Deprecated, please don't re-use!
+ // 2: required list breadcrumbTweets(personalDataType = 'TweetId')
+}(persisted='true', hasPersonalData='true')
+
+struct ProfilePageInfo {
+ // To be deprecated, please don't re-use!
+ // The only reason to keep it for now is that Sparrow doesn't accept an empty struct. Once there is
+ // a real field, we should just comment this one out.
+ 1: required list breadcrumbViews(personalDataType = 'WebsitePage')
+ // Deprecated, please don't re-use!
+ // 2: required list breadcrumbTweets(personalDataType = 'TweetId') +}(persisted='true', hasPersonalData='true') + +struct SearchResultsPageInfo { + // search query string + 1: required string query(personalDataType = 'SearchQuery') + // Attribution of the search (e.g. Typed Query, Hashtag Click, etc.) + // see http://go/sgb/src/thrift/com/twitter/search/common/constants/query.thrift for details + 2: optional query.ThriftQuerySource querySource + // 0-indexed position of item in list of search results + 3: optional i32 itemPosition + // Attribution of the tweet result (e.g. QIG, Earlybird, etc) + // see http://go/sgb/src/thrift/com/twitter/search/common/constants/result.thrift for details + 4: optional set tweetResultSources + // Attribution of the user result (e.g. ExpertSearch, QIG, etc) + // see http://go/sgb/src/thrift/com/twitter/search/common/constants/result.thrift for details + 5: optional set userResultSources + // The query filter type on the Search Results Page (SRP) when the action took place. + // Clicking on a tab in SRP applies a query filter automatically. + 6: optional SearchQueryFilterType queryFilterType +}(persisted='true', hasPersonalData='true') + +struct SearchTypeaheadInfo { + // search query string + 1: required string query(personalDataType = 'SearchQuery') + // 0-indexed position of item in list of typeahead drop-down + 2: optional i32 itemPosition +}(persisted='true', hasPersonalData='true') + +enum SearchQueryFilterType { + // filter to top ranked content for a query + TOP = 1 + // filter to latest content for a query + LATEST = 2 + // filter to user results for a query + PEOPLE = 3 + // filter to photo tweet results for a query + PHOTOS = 4 + // filter to video tweet results for a query + VIDEOS = 5 +} diff --git a/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/unified_user_actions.thrift b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/unified_user_actions.thrift new file mode 100644 index 000000000..d1a073b03 --- /dev/null +++ b/unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions/unified_user_actions.thrift @@ -0,0 +1,37 @@ +namespace java com.twitter.unified_user_actions.thriftjava +#@namespace scala com.twitter.unified_user_actions.thriftscala +#@namespace strato com.twitter.unified_user_actions + +include "com/twitter/unified_user_actions/action_info.thrift" +include "com/twitter/unified_user_actions/common.thrift" +include "com/twitter/unified_user_actions/item.thrift" +include "com/twitter/unified_user_actions/metadata.thrift" +include "com/twitter/unified_user_actions/product_surface_info.thrift" + +/* + * A Unified User Action (UUA) is essentially a tuple of + * (user, item, action type, some metadata) with more optional + * information unique to product surfaces when available. + * It represents a user (logged in / out) taking some action (e.g. engagement, + * impression) on an item (e.g. tweet, profile). + */ +struct UnifiedUserAction { + /* A user refers to either a logged in / logged out user */ + 1: required common.UserIdentifier userIdentifier + /* The item that received the action from the user */ + 2: required item.Item item + /* The type of action which took place */ + 3: required action_info.ActionType actionType + /* Useful for event level analysis and joins */ + 4: required metadata.EventMetadata eventMetadata + /* + * Product surface on which the action occurred. If None, + * it means we can not capture the product surface (e.g. for server-side events). 
+ */ + 5: optional product_surface_info.ProductSurface productSurface + /* + * Product specific information like join keys. If None, + * it means we can not capture the product surface information. + */ + 6: optional product_surface_info.ProductSurfaceInfo productSurfaceInfo +}(persisted='true', hasPersonalData='true') diff --git a/unified_user_actions/thrift/src/test/thrift/com/twitter/unified_user_actions/BUILD.bazel b/unified_user_actions/thrift/src/test/thrift/com/twitter/unified_user_actions/BUILD.bazel new file mode 100644 index 000000000..5295c7ead --- /dev/null +++ b/unified_user_actions/thrift/src/test/thrift/com/twitter/unified_user_actions/BUILD.bazel @@ -0,0 +1,15 @@ +create_thrift_libraries( + org = "com.twitter", + base_name = "unified_user_actions_spec", + sources = ["*.thrift"], + tags = ["bazel-compatible"], + dependency_roots = [ + ], + generate_languages = [ + "java", + "scala", + "strato", + ], + provides_java_name = "unified_user_actions_spec-thrift-java", + provides_scala_name = "unified_user_actions_spec-thrift-scala", +) diff --git a/unified_user_actions/thrift/src/test/thrift/com/twitter/unified_user_actions/unified_user_actions.thrift b/unified_user_actions/thrift/src/test/thrift/com/twitter/unified_user_actions/unified_user_actions.thrift new file mode 100644 index 000000000..5ab129aaf --- /dev/null +++ b/unified_user_actions/thrift/src/test/thrift/com/twitter/unified_user_actions/unified_user_actions.thrift @@ -0,0 +1,11 @@ +namespace java com.twitter.unified_user_actions.thriftjava +#@namespace scala com.twitter.unified_user_actions.thriftscala +#@namespace strato com.twitter.unified_user_actions + +/* Useful for testing UnifiedUserAction-like schema in tests */ +struct UnifiedUserActionSpec { + /* A user refers to either a logged out / logged in user */ + 1: required i64 userId + /* Arbitrary payload */ + 2: optional string payload +}(hasPersonalData='false') diff --git a/user-signal-service/README.md b/user-signal-service/README.md new file mode 100644 index 000000000..d30568cf4 --- /dev/null +++ b/user-signal-service/README.md @@ -0,0 +1,5 @@ +# User Signal Service # + +**User Signal Service** (USS) is a centralized online platform that supplies comprehensive data on user actions and behaviors on Twitter. This information encompasses both explicit signals, such as favoriting, retweeting, and replying, as well as implicit signals, including tweet clicks, video views, profile visits, and more. + +To ensure consistency and accuracy, USS gathers these signals from various underlying datasets and online services, processing them into uniform formats. These standardized source signals are then utilized in candidate retrieval and machine learning features for ranking stages. 
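
As a rough sketch of how these signals are consumed (not an officially documented client API), the example below queries one of the `ReadableStore[Query, Seq[Signal]]` fetchers defined later in this change. The `SignalType.TweetFavorite` member and the plain `Long` user id are assumptions for illustration; the real `SignalType` enum lives in the USS thrift definitions, which are not part of this excerpt.

```scala
import com.twitter.storehaus.ReadableStore
import com.twitter.usersignalservice.base.Query
import com.twitter.usersignalservice.thriftscala.{ClientIdentifier, Signal, SignalType}
import com.twitter.util.Future

object UssClientSketch {
  // `store` can be any of the fetchers below, e.g. a ManhattanSignalFetcher or a
  // FilteredSignalFetcherController. SignalType.TweetFavorite is a hypothetical
  // enum member used purely for illustration.
  def recentFavorites(
    store: ReadableStore[Query, Seq[Signal]],
    userId: Long
  ): Future[Seq[Signal]] = {
    // Ask for at most 10 of the user's signals of the given type.
    val query = Query(userId, SignalType.TweetFavorite, Some(10), ClientIdentifier.Unknown)
    store.get(query).map(_.getOrElse(Seq.empty))
  }
}
```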
\ No newline at end of file diff --git a/user-signal-service/server/BUILD b/user-signal-service/server/BUILD new file mode 100644 index 000000000..76ff96764 --- /dev/null +++ b/user-signal-service/server/BUILD @@ -0,0 +1,21 @@ +jvm_binary( + name = "bin", + basename = "user-signal-service", + main = "com.twitter.usersignalservice.UserSignalServiceStratoFedServerMain", + runtime_platform = "java11", + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback", + "strato/src/main/scala/com/twitter/strato/logging/logback", + "user-signal-service/server/src/main/resources", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice", + ], +) + +# Aurora Workflows build phase convention requires a jvm_app named with ${project-name}-app +jvm_app( + name = "user-signal-service-app", + archive = "zip", + binary = ":bin", +) diff --git a/user-signal-service/server/src/main/resources/BUILD b/user-signal-service/server/src/main/resources/BUILD new file mode 100644 index 000000000..b35d9c9d4 --- /dev/null +++ b/user-signal-service/server/src/main/resources/BUILD @@ -0,0 +1,7 @@ +resources( + sources = [ + "*.xml", + "*.yml", + "config/*.yml", + ], +) diff --git a/user-signal-service/server/src/main/resources/config/decider.yml b/user-signal-service/server/src/main/resources/config/decider.yml new file mode 100644 index 000000000..f22a9dc22 --- /dev/null +++ b/user-signal-service/server/src/main/resources/config/decider.yml @@ -0,0 +1,6 @@ +test_value: + comment: Test Value + default_availability: 10000 +dark_traffic_percent: + comment: Percentage of traffic to send to dark traffic destination + default_availability: 0 \ No newline at end of file diff --git a/user-signal-service/server/src/main/resources/logback.xml b/user-signal-service/server/src/main/resources/logback.xml new file mode 100644 index 000000000..6511278df --- /dev/null +++ b/user-signal-service/server/src/main/resources/logback.xml @@ -0,0 +1,155 @@ + + + + + + + + + + + + + + + + + + + + + + true + + + + + ${log.service.output} + + ${log.service.output}.%i + 1 + 10 + + + 50MB + + + %date %.-3level ${DEFAULT_SERVICE_PATTERN}%n + + + + + + ${log.strato_only.output} + + ${log.strato_only.output}.%i + 1 + 10 + + + 50MB + + + %date %.-3level ${DEFAULT_SERVICE_PATTERN}%n + + + + + + true + loglens + ${log.lens.index} + ${log.lens.tag}/service + + %msg%n + + + 500 + 50 + + + manhattan-client + .*InvalidRequest.* + + + + + + + + + ${async_queue_size} + ${async_max_flush_time} + + + + + ${async_queue_size} + ${async_max_flush_time} + + + + + ${async_queue_size} + ${async_max_flush_time} + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/BUILD new file mode 100644 index 000000000..248fff64b --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/BUILD @@ -0,0 +1,9 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/columns", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config", + ], +) diff --git 
a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/UserSignalServiceStratoFedServerMain.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/UserSignalServiceStratoFedServerMain.scala
new file mode 100644
index 000000000..878310abb
--- /dev/null
+++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/UserSignalServiceStratoFedServerMain.scala
@@ -0,0 +1,32 @@
+package com.twitter.usersignalservice
+
+import com.google.inject.Module
+import com.twitter.inject.thrift.modules.ThriftClientIdModule
+import com.twitter.usersignalservice.columns.UserSignalServiceColumn
+import com.twitter.strato.fed._
+import com.twitter.strato.fed.server._
+import com.twitter.usersignalservice.module.CacheModule
+import com.twitter.usersignalservice.module.MHMtlsParamsModule
+import com.twitter.usersignalservice.module.SocialGraphServiceClientModule
+import com.twitter.usersignalservice.module.TimerModule
+
+object UserSignalServiceStratoFedServerMain extends UserSignalServiceStratoFedServer
+
+trait UserSignalServiceStratoFedServer extends StratoFedServer {
+ override def dest: String = "/s/user-signal-service/user-signal-service"
+
+ override def columns: Seq[Class[_ <: StratoFed.Column]] =
+ Seq(
+ classOf[UserSignalServiceColumn]
+ )
+
+ override def modules: Seq[Module] =
+ Seq(
+ CacheModule,
+ MHMtlsParamsModule,
+ SocialGraphServiceClientModule,
+ ThriftClientIdModule,
+ TimerModule,
+ )
+
+}
diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/AggregatedSignalController.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/AggregatedSignalController.scala
new file mode 100644
index 000000000..fb698b01a
--- /dev/null
+++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/AggregatedSignalController.scala
@@ -0,0 +1,58 @@
+package com.twitter.usersignalservice.base
+
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.frigate.common.base.Stats
+import com.twitter.storehaus.ReadableStore
+import com.twitter.twistly.common.UserId
+import com.twitter.usersignalservice.base.BaseSignalFetcher.Timeout
+import com.twitter.usersignalservice.thriftscala.Signal
+import com.twitter.usersignalservice.thriftscala.SignalType
+import com.twitter.util.Future
+import com.twitter.util.Timer
+
+case class AggregatedSignalController(
+ signalsAggregationInfo: Seq[SignalAggregatedInfo],
+ signalsWeightMapInfo: Map[SignalType, Double],
+ stats: StatsReceiver,
+ timer: Timer)
+ extends ReadableStore[Query, Seq[Signal]] {
+
+ val name: String = this.getClass.getCanonicalName
+ val statsReceiver: StatsReceiver = stats.scope(name)
+
+ override def get(query: Query): Future[Option[Seq[Signal]]] = {
+ Stats
+ .trackItems(statsReceiver) {
+ val allSignalsFut =
+ Future
+ .collect(signalsAggregationInfo.map(_.getSignals(query.userId))).map(_.flatten.flatten)
+ val aggregatedSignals =
+ allSignalsFut.map { allSignals =>
+ allSignals
+ .groupBy(_.targetInternalId).collect {
+ case (Some(internalId), signals) =>
+ val mostRecentEngagementTime = signals.map(_.timestamp).max
+ val totalWeight =
+ signals
+ .map(signal => signalsWeightMapInfo.getOrElse(signal.signalType, 0.0)).sum
+ (Signal(query.signalType, mostRecentEngagementTime, Some(internalId)), totalWeight)
+ }.toSeq.sortBy { case (signal, weight) => (-weight, -signal.timestamp) }
+ .map(_._1)
+ .take(query.maxResults.getOrElse(Int.MaxValue))
+ }
+ aggregatedSignals.map(Some(_))
+
}.raiseWithin(Timeout)(timer).handle { + case e => + statsReceiver.counter(e.getClass.getCanonicalName).incr() + Some(Seq.empty[Signal]) + } + } +} + +case class SignalAggregatedInfo( + signalType: SignalType, + signalFetcher: ReadableStore[Query, Seq[Signal]]) { + def getSignals(userId: UserId): Future[Option[Seq[Signal]]] = { + signalFetcher.get(Query(userId, signalType, None)) + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/BUILD new file mode 100644 index 000000000..83bb0aa3e --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/BUILD @@ -0,0 +1,16 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/src/jvm/com/twitter/storehaus:core", + "finagle/finagle-stats", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common", + "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/injection", + "src/scala/com/twitter/storehaus_internal/manhattan", + "src/scala/com/twitter/twistly/common", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "user-signal-service/thrift/src/main/thrift:thrift-scala", + ], +) diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/BaseSignalFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/BaseSignalFetcher.scala new file mode 100644 index 000000000..27646b9cc --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/BaseSignalFetcher.scala @@ -0,0 +1,90 @@ +package com.twitter.usersignalservice +package base + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.storehaus.ReadableStore +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.util.Future +import com.twitter.twistly.common.UserId +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.frigate.common.base.Stats +import com.twitter.conversions.DurationOps._ +import com.twitter.usersignalservice.thriftscala.ClientIdentifier +import com.twitter.util.Duration +import com.twitter.util.Timer +import java.io.Serializable + +case class Query( + userId: UserId, + signalType: SignalType, + maxResults: Option[Int], + clientId: ClientIdentifier = ClientIdentifier.Unknown) + +/** + * A trait that defines a standard interface for the signal fetcher + * + * Extends this only when all other traits extending BaseSignalFetcher do not apply to + * your use case. + */ +trait BaseSignalFetcher extends ReadableStore[Query, Seq[Signal]] { + import BaseSignalFetcher._ + + /** + * This RawSignalType is the output type of `getRawSignals` and the input type of `process`. + * Override it as your own raw signal type to maintain meta data which can be used in the + * step of `process`. + * Note that the RawSignalType is an intermediate data type intended to be small to avoid + * big data chunks being passed over functions or being memcached. + */ + type RawSignalType <: Serializable + + def name: String + def statsReceiver: StatsReceiver + def timer: Timer + + /** + * This function is called by the top level class to fetch signals. 
It executes the pipeline to + * fetch raw signals, process and transform the signals. Exceptions and timeout control are + * handled here. + * @param query + * @return Future[Option[Seq[Signal]]] + */ + override def get(query: Query): Future[Option[Seq[Signal]]] = { + val clientStatsReceiver = statsReceiver.scope(query.clientId.name).scope(query.signalType.name) + Stats + .trackItems(clientStatsReceiver) { + val rawSignals = getRawSignals(query.userId) + val signals = process(query, rawSignals) + signals + }.raiseWithin(Timeout)(timer).handle { + case e => + clientStatsReceiver.scope("FetcherExceptions").counter(e.getClass.getCanonicalName).incr() + EmptyResponse + } + } + + /** + * Override this function to define how to fetch the raw signals from any store + * Note that the RawSignalType is an intermediate data type intended to be small to avoid + * big data chunks being passed over functions or being memcached. + * @param userId + * @return Future[Option[Seq[RawSignalType]]] + */ + def getRawSignals(userId: UserId): Future[Option[Seq[RawSignalType]]] + + /** + * Override this function to define how to process the raw signals and transform them to signals. + * @param query + * @param rawSignals + * @return Future[Option[Seq[Signal]]] + */ + def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] +} + +object BaseSignalFetcher { + val Timeout: Duration = 20.milliseconds + val EmptyResponse: Option[Seq[Signal]] = Some(Seq.empty[Signal]) +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/FilteredSignalFetcherController.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/FilteredSignalFetcherController.scala new file mode 100644 index 000000000..e2e0e96fe --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/FilteredSignalFetcherController.scala @@ -0,0 +1,75 @@ +package com.twitter.usersignalservice.base + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.base.Stats +import com.twitter.storehaus.ReadableStore +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer + +/** + * Combine a BaseSignalFetcher with a map of negative signalFetchers. Filter out the negative + * signals from the signals from BaseSignalFetcher. 
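+ *
+ * For example, a fetcher for a positive engagement can be paired with fetchers for the
+ * corresponding negative signals, so that entities the user has since acted against are
+ * dropped from the response.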
+ */ +case class FilteredSignalFetcherController( + backingSignalFetcher: BaseSignalFetcher, + originSignalType: SignalType, + stats: StatsReceiver, + timer: Timer, + filterSignalFetchers: Map[SignalType, BaseSignalFetcher] = + Map.empty[SignalType, BaseSignalFetcher]) + extends ReadableStore[Query, Seq[Signal]] { + val statsReceiver: StatsReceiver = stats.scope(this.getClass.getCanonicalName) + + override def get(query: Query): Future[Option[Seq[Signal]]] = { + val clientStatsReceiver = statsReceiver.scope(query.signalType.name).scope(query.clientId.name) + Stats + .trackItems(clientStatsReceiver) { + val backingSignals = + backingSignalFetcher.get(Query(query.userId, originSignalType, None, query.clientId)) + val filteredSignals = filter(query, backingSignals) + filteredSignals + }.raiseWithin(BaseSignalFetcher.Timeout)(timer).handle { + case e => + clientStatsReceiver.scope("FetcherExceptions").counter(e.getClass.getCanonicalName).incr() + BaseSignalFetcher.EmptyResponse + } + } + + def filter( + query: Query, + rawSignals: Future[Option[Seq[Signal]]] + ): Future[Option[Seq[Signal]]] = { + Stats + .trackItems(statsReceiver) { + val originSignals = rawSignals.map(_.getOrElse(Seq.empty[Signal])) + val filterSignals = + Future + .collect { + filterSignalFetchers.map { + case (signalType, signalFetcher) => + signalFetcher + .get(Query(query.userId, signalType, None, query.clientId)) + .map(_.getOrElse(Seq.empty)) + }.toSeq + }.map(_.flatten.toSet) + val filterSignalsSet = filterSignals + .map(_.flatMap(_.targetInternalId)) + + val originSignalsWithId = + originSignals.map(_.map(signal => (signal, signal.targetInternalId))) + Future.join(originSignalsWithId, filterSignalsSet).map { + case (originSignalsWithId, filterSignalsSet) => + Some( + originSignalsWithId + .collect { + case (signal, internalIdOpt) + if internalIdOpt.nonEmpty && !filterSignalsSet.contains(internalIdOpt.get) => + signal + }.take(query.maxResults.getOrElse(Int.MaxValue))) + } + } + } + +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/ManhattanSignalFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/ManhattanSignalFetcher.scala new file mode 100644 index 000000000..d0918a165 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/ManhattanSignalFetcher.scala @@ -0,0 +1,66 @@ +package com.twitter.usersignalservice +package base + +import com.twitter.bijection.Codec +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams +import com.twitter.storehaus.ReadableStore +import com.twitter.storehaus_internal.manhattan.ManhattanCluster +import com.twitter.storehaus_internal.manhattan.ManhattanRO +import com.twitter.storehaus_internal.manhattan.ManhattanROConfig +import com.twitter.storehaus_internal.util.HDFSPath +import com.twitter.twistly.common.UserId +import com.twitter.util.Future +import com.twitter.storehaus_internal.util.ApplicationID +import com.twitter.storehaus_internal.util.DatasetName + +/** + * A Manhattan signal fetcher extending BaseSignalFetcher to provide an interface to fetch signals + * from a Manhattan dataset. + * + * Extends this when the underlying store is a single Manhattan dataset. 
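+ *
+ * Implementations supply codecs for the dataset's key and value types, map a userId to
+ * the dataset key, and convert each stored value into raw signals via toRawSignals.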
+ * @tparam ManhattanKeyType
+ * @tparam ManhattanValueType
+ */
+trait ManhattanSignalFetcher[ManhattanKeyType, ManhattanValueType] extends BaseSignalFetcher {
+  /*
+   Define the meta info of the Manhattan dataset
+   */
+  protected def manhattanAppId: String
+  protected def manhattanDatasetName: String
+  protected def manhattanClusterId: ManhattanCluster
+  protected def manhattanKVClientMtlsParams: ManhattanKVClientMtlsParams
+
+  protected def manhattanKeyCodec: Codec[ManhattanKeyType]
+  protected def manhattanRawSignalCodec: Codec[ManhattanValueType]
+
+  /**
+   * Adapter to transform the userId into the Manhattan key
+   * @param userId
+   * @return ManhattanKeyType
+   */
+  protected def toManhattanKey(userId: UserId): ManhattanKeyType
+
+  /**
+   * Adapter to transform the Manhattan value into a Seq of RawSignalType
+   * @param manhattanValue
+   * @return Seq[RawSignalType]
+   */
+  protected def toRawSignals(manhattanValue: ManhattanValueType): Seq[RawSignalType]
+
+  protected final lazy val underlyingStore: ReadableStore[UserId, Seq[RawSignalType]] = {
+    ManhattanRO
+      .getReadableStoreWithMtls[ManhattanKeyType, ManhattanValueType](
+        ManhattanROConfig(
+          HDFSPath(""),
+          ApplicationID(manhattanAppId),
+          DatasetName(manhattanDatasetName),
+          manhattanClusterId),
+        manhattanKVClientMtlsParams
+      )(manhattanKeyCodec, manhattanRawSignalCodec)
+      .composeKeyMapping(userId => toManhattanKey(userId))
+      .mapValues(manhattanRawSignal => toRawSignals(manhattanRawSignal))
+  }
+
+  override final def getRawSignals(userId: UserId): Future[Option[Seq[RawSignalType]]] =
+    underlyingStore.get(userId)
+}
diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/MemcachedSignalFetcherWrapper.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/MemcachedSignalFetcherWrapper.scala
new file mode 100644
index 000000000..4022d9021
--- /dev/null
+++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/MemcachedSignalFetcherWrapper.scala
@@ -0,0 +1,70 @@
+package com.twitter.usersignalservice
+package base
+
+import com.twitter.finagle.memcached.{Client => MemcachedClient}
+import com.twitter.finagle.stats.StatsReceiver
+import com.twitter.hashing.KeyHasher
+import com.twitter.hermit.store.common.ObservedMemcachedReadableStore
+import com.twitter.relevance_platform.common.injection.LZ4Injection
+import com.twitter.relevance_platform.common.injection.SeqObjectInjection
+import com.twitter.storehaus.ReadableStore
+import com.twitter.twistly.common.UserId
+import com.twitter.usersignalservice.thriftscala.Signal
+import com.twitter.util.Duration
+import com.twitter.util.Future
+import com.twitter.util.Timer
+
+/**
+ * Use this wrapper when the latency of the underlying signal fetcher is too high (see
+ * BaseSignalFetcher.Timeout) and its results don't change often (e.g. when the results are
+ * generated by a scalding job that runs once a day).
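+ *
+ * e.g. a cached fetcher, mirroring MemcachedProfileVisitsFetcher in SignalFetcherConfig
+ * (later in this diff):
+ * {{{
+ * val cachedProfileVisits: BaseSignalFetcher = MemcachedSignalFetcherWrapper(
+ *   memcachedClient,
+ *   profileVisitsFetcher,
+ *   ttl = 8.hours,
+ *   statsReceiver,
+ *   keyPrefix = "uss:pv",
+ *   timer)
+ * }}}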
+ * @param memcachedClient
+ * @param baseSignalFetcher
+ * @param ttl
+ * @param stats
+ * @param keyPrefix
+ * @param timer
+ */
+case class MemcachedSignalFetcherWrapper(
+  memcachedClient: MemcachedClient,
+  baseSignalFetcher: BaseSignalFetcher,
+  ttl: Duration,
+  stats: StatsReceiver,
+  keyPrefix: String,
+  timer: Timer)
+    extends BaseSignalFetcher {
+  import MemcachedSignalFetcherWrapper._
+  override type RawSignalType = baseSignalFetcher.RawSignalType
+
+  override val name: String = this.getClass.getCanonicalName
+  override val statsReceiver: StatsReceiver = stats.scope(name).scope(baseSignalFetcher.name)
+
+  val underlyingStore: ReadableStore[UserId, Seq[RawSignalType]] = {
+    val cacheUnderlyingStore = new ReadableStore[UserId, Seq[RawSignalType]] {
+      override def get(userId: UserId): Future[Option[Seq[RawSignalType]]] =
+        baseSignalFetcher.getRawSignals(userId)
+    }
+    ObservedMemcachedReadableStore.fromCacheClient(
+      backingStore = cacheUnderlyingStore,
+      cacheClient = memcachedClient,
+      ttl = ttl)(
+      valueInjection = LZ4Injection.compose(SeqObjectInjection[RawSignalType]()),
+      statsReceiver = statsReceiver,
+      keyToString = { k: UserId =>
+        s"$keyPrefix:${keyHasher.hashKey(k.toString.getBytes)}"
+      }
+    )
+  }
+
+  override def getRawSignals(userId: UserId): Future[Option[Seq[RawSignalType]]] =
+    underlyingStore.get(userId)
+
+  override def process(
+    query: Query,
+    rawSignals: Future[Option[Seq[RawSignalType]]]
+  ): Future[Option[Seq[Signal]]] = baseSignalFetcher.process(query, rawSignals)
+
+}
+
+object MemcachedSignalFetcherWrapper {
+  private val keyHasher: KeyHasher = KeyHasher.FNV1A_64
+}
diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/StratoSignalFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/StratoSignalFetcher.scala
new file mode 100644
index 000000000..2d0de84b6
--- /dev/null
+++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base/StratoSignalFetcher.scala
@@ -0,0 +1,61 @@
+package com.twitter.usersignalservice
+package base
+
+import com.twitter.frigate.common.store.strato.StratoFetchableStore
+import com.twitter.storehaus.ReadableStore
+import com.twitter.strato.client.Client
+import com.twitter.strato.data.Conv
+import com.twitter.twistly.common.UserId
+import com.twitter.util.Future
+
+/**
+ * A Strato signal fetcher extending BaseSignalFetcher to provide an interface for fetching
+ * signals from a Strato column.
+ *
+ * Extend this when the underlying store is a single Strato column.
+ * @tparam StratoKeyType
+ * @tparam StratoViewType
+ * @tparam StratoValueType
+ */
+trait StratoSignalFetcher[StratoKeyType, StratoViewType, StratoValueType]
+  extends BaseSignalFetcher {
+  /*
+   Define the meta info of the strato column
+   */
+  def stratoClient: Client
+  def stratoColumnPath: String
+  def stratoView: StratoViewType
+
+  /**
+   * Override these converters in concrete subclasses; the `implicit` keyword may be dropped in
+   * the overriding definitions.
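+   * e.g., as NegativeEngagedTweetFetcher does later in this diff:
+   * {{{ override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType }}}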
+ * @return + */ + protected implicit def keyConv: Conv[StratoKeyType] + protected implicit def viewConv: Conv[StratoViewType] + protected implicit def valueConv: Conv[StratoValueType] + + /** + * Adapter to transform the userId to the StratoKeyType + * @param userId + * @return StratoKeyType + */ + protected def toStratoKey(userId: UserId): StratoKeyType + + /** + * Adapter to transform the StratoValueType to a Seq of RawSignalType + * @param stratoValue + * @return Seq[RawSignalType] + */ + protected def toRawSignals(stratoValue: StratoValueType): Seq[RawSignalType] + + protected final lazy val underlyingStore: ReadableStore[UserId, Seq[RawSignalType]] = + StratoFetchableStore + .withView[StratoKeyType, StratoViewType, StratoValueType]( + stratoClient, + stratoColumnPath, + stratoView) + .composeKeyMapping(toStratoKey) + .mapValues(toRawSignals) + + override final def getRawSignals(userId: UserId): Future[Option[Seq[RawSignalType]]] = + underlyingStore.get(userId) +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/columns/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/columns/BUILD new file mode 100644 index 000000000..1cb85f732 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/columns/BUILD @@ -0,0 +1,11 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "src/scala/com/twitter/twistly/common", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/service", + "user-signal-service/thrift/src/main/thrift:thrift-scala", + ], +) diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/columns/UserSignalServiceColumn.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/columns/UserSignalServiceColumn.scala new file mode 100644 index 000000000..aea92ecd1 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/columns/UserSignalServiceColumn.scala @@ -0,0 +1,49 @@ +package com.twitter.usersignalservice.columns + +import com.twitter.stitch.NotFound +import com.twitter.stitch.Stitch +import com.twitter.strato.catalog.OpMetadata +import com.twitter.strato.catalog.Ops +import com.twitter.strato.config.Policy +import com.twitter.strato.config.ReadWritePolicy +import com.twitter.strato.data.Conv +import com.twitter.strato.data.Description +import com.twitter.strato.data.Lifecycle +import com.twitter.strato.fed.StratoFed +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.usersignalservice.service.UserSignalService +import com.twitter.usersignalservice.thriftscala.BatchSignalRequest +import com.twitter.usersignalservice.thriftscala.BatchSignalResponse +import javax.inject.Inject + +class UserSignalServiceColumn @Inject() (userSignalService: UserSignalService) + extends StratoFed.Column(UserSignalServiceColumn.Path) + with StratoFed.Fetch.Stitch { + + override val metadata: OpMetadata = OpMetadata( + lifecycle = Some(Lifecycle.Production), + description = Some(Description.PlainText("User Signal Service Federated Column"))) + + override def ops: Ops = super.ops + + override type Key = BatchSignalRequest + override type View = Unit + override type Value = BatchSignalResponse + + override val keyConv: Conv[Key] = ScroogeConv.fromStruct[BatchSignalRequest] + override val viewConv: Conv[View] = 
Conv.ofType + override val valueConv: Conv[Value] = ScroogeConv.fromStruct[BatchSignalResponse] + + override def fetch(key: Key, view: View): Stitch[Result[Value]] = { + userSignalService + .userSignalServiceHandlerStoreStitch(key) + .map(result => found(result)) + .handle { + case NotFound => missing + } + } +} + +object UserSignalServiceColumn { + val Path = "recommendations/user-signal-service/signals" +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config/BUILD new file mode 100644 index 000000000..cca1bf2e0 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config/BUILD @@ -0,0 +1,9 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals", + "user-signal-service/thrift/src/main/thrift:thrift-scala", + ], +) diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config/SignalFetcherConfig.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config/SignalFetcherConfig.scala new file mode 100644 index 000000000..f7238edcc --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config/SignalFetcherConfig.scala @@ -0,0 +1,253 @@ +package com.twitter.usersignalservice.config + +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.memcached.{Client => MemcachedClient} +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.storehaus.ReadableStore +import com.twitter.usersignalservice.base.BaseSignalFetcher +import com.twitter.usersignalservice.base.AggregatedSignalController +import com.twitter.usersignalservice.base.FilteredSignalFetcherController +import com.twitter.usersignalservice.base.MemcachedSignalFetcherWrapper +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.SignalAggregatedInfo +import com.twitter.usersignalservice.signals.AccountBlocksFetcher +import com.twitter.usersignalservice.signals.AccountFollowsFetcher +import com.twitter.usersignalservice.signals.AccountMutesFetcher +import com.twitter.usersignalservice.signals.NotificationOpenAndClickFetcher +import com.twitter.usersignalservice.signals.OriginalTweetsFetcher +import com.twitter.usersignalservice.signals.ProfileVisitsFetcher +import com.twitter.usersignalservice.signals.ProfileClickFetcher +import com.twitter.usersignalservice.signals.RealGraphOonFetcher +import com.twitter.usersignalservice.signals.ReplyTweetsFetcher +import com.twitter.usersignalservice.signals.RetweetsFetcher +import com.twitter.usersignalservice.signals.TweetClickFetcher +import com.twitter.usersignalservice.signals.TweetFavoritesFetcher +import com.twitter.usersignalservice.signals.TweetSharesFetcher +import com.twitter.usersignalservice.signals.VideoTweetsPlayback50Fetcher +import com.twitter.usersignalservice.signals.VideoTweetsQualityViewFetcher +import com.twitter.usersignalservice.signals.NegativeEngagedUserFetcher +import com.twitter.usersignalservice.signals.NegativeEngagedTweetFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + 
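+// SignalFetcherConfig wires every concrete signal fetcher into SignalFetcherMapper, the
+// SignalType -> fetcher routing table consumed by UserSignalHandler. Two composition patterns
+// recur below: MemcachedSignalFetcherWrapper caches fetchers whose results are slow to compute
+// but change infrequently, and FilteredSignalFetcherController subtracts negative signals
+// (blocks, mutes, negative engagements) from positive-engagement fetchers.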
+@Singleton +class SignalFetcherConfig @Inject() ( + notificationOpenAndClickFetcher: NotificationOpenAndClickFetcher, + accountFollowsFetcher: AccountFollowsFetcher, + profileVisitsFetcher: ProfileVisitsFetcher, + tweetFavoritesFetcher: TweetFavoritesFetcher, + retweetsFetcher: RetweetsFetcher, + replyTweetsFetcher: ReplyTweetsFetcher, + originalTweetsFetcher: OriginalTweetsFetcher, + tweetSharesFetcher: TweetSharesFetcher, + memcachedClient: MemcachedClient, + realGraphOonFetcher: RealGraphOonFetcher, + tweetClickFetcher: TweetClickFetcher, + videoTweetsPlayback50Fetcher: VideoTweetsPlayback50Fetcher, + videoTweetsQualityViewFetcher: VideoTweetsQualityViewFetcher, + accountMutesFetcher: AccountMutesFetcher, + accountBlocksFetcher: AccountBlocksFetcher, + profileClickFetcher: ProfileClickFetcher, + negativeEngagedTweetFetcher: NegativeEngagedTweetFetcher, + negativeEngagedUserFetcher: NegativeEngagedUserFetcher, + statsReceiver: StatsReceiver, + timer: Timer) { + + val MemcachedProfileVisitsFetcher: BaseSignalFetcher = + MemcachedSignalFetcherWrapper( + memcachedClient, + profileVisitsFetcher, + ttl = 8.hours, + statsReceiver, + keyPrefix = "uss:pv", + timer) + + val MemcachedAccountFollowsFetcher: BaseSignalFetcher = MemcachedSignalFetcherWrapper( + memcachedClient, + accountFollowsFetcher, + ttl = 5.minute, + statsReceiver, + keyPrefix = "uss:af", + timer) + + val GoodTweetClickDdgFetcher: SignalType => FilteredSignalFetcherController = signalType => + FilteredSignalFetcherController( + tweetClickFetcher, + signalType, + statsReceiver, + timer, + Map(SignalType.NegativeEngagedTweetId -> negativeEngagedTweetFetcher) + ) + + val GoodProfileClickDdgFetcher: SignalType => FilteredSignalFetcherController = signalType => + FilteredSignalFetcherController( + profileClickFetcher, + signalType, + statsReceiver, + timer, + Map(SignalType.NegativeEngagedUserId -> negativeEngagedUserFetcher) + ) + + val GoodProfileClickDdgFetcherWithBlocksMutes: SignalType => FilteredSignalFetcherController = + signalType => + FilteredSignalFetcherController( + profileClickFetcher, + signalType, + statsReceiver, + timer, + Map( + SignalType.NegativeEngagedUserId -> negativeEngagedUserFetcher, + SignalType.AccountMute -> accountMutesFetcher, + SignalType.AccountBlock -> accountBlocksFetcher + ) + ) + + val realGraphOonFilteredFetcher: FilteredSignalFetcherController = + FilteredSignalFetcherController( + realGraphOonFetcher, + SignalType.RealGraphOon, + statsReceiver, + timer, + Map( + SignalType.NegativeEngagedUserId -> negativeEngagedUserFetcher + ) + ) + + val videoTweetsQualityViewFilteredFetcher: FilteredSignalFetcherController = + FilteredSignalFetcherController( + videoTweetsQualityViewFetcher, + SignalType.VideoView90dQualityV1, + statsReceiver, + timer, + Map(SignalType.NegativeEngagedTweetId -> negativeEngagedTweetFetcher) + ) + + val videoTweetsPlayback50FilteredFetcher: FilteredSignalFetcherController = + FilteredSignalFetcherController( + videoTweetsPlayback50Fetcher, + SignalType.VideoView90dPlayback50V1, + statsReceiver, + timer, + Map(SignalType.NegativeEngagedTweetId -> negativeEngagedTweetFetcher) + ) + + val uniformTweetSignalInfo: Seq[SignalAggregatedInfo] = Seq( + SignalAggregatedInfo(SignalType.TweetFavorite, tweetFavoritesFetcher), + SignalAggregatedInfo(SignalType.Retweet, retweetsFetcher), + SignalAggregatedInfo(SignalType.Reply, replyTweetsFetcher), + SignalAggregatedInfo(SignalType.OriginalTweet, originalTweetsFetcher), + SignalAggregatedInfo(SignalType.TweetShareV1, tweetSharesFetcher), + 
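+    // note: the quality-view entry below uses the filtered fetcher, so Tweets with negative
+    // engagements are already excluded before aggregation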
SignalAggregatedInfo(SignalType.VideoView90dQualityV1, videoTweetsQualityViewFilteredFetcher), + ) + + val uniformProducerSignalInfo: Seq[SignalAggregatedInfo] = Seq( + SignalAggregatedInfo(SignalType.AccountFollow, MemcachedAccountFollowsFetcher), + SignalAggregatedInfo( + SignalType.RepeatedProfileVisit90dMinVisit6V1, + MemcachedProfileVisitsFetcher), + ) + + val memcachedAccountBlocksFetcher: MemcachedSignalFetcherWrapper = MemcachedSignalFetcherWrapper( + memcachedClient, + accountBlocksFetcher, + ttl = 5.minutes, + statsReceiver, + keyPrefix = "uss:ab", + timer) + + val memcachedAccountMutesFetcher: MemcachedSignalFetcherWrapper = MemcachedSignalFetcherWrapper( + memcachedClient, + accountMutesFetcher, + ttl = 5.minutes, + statsReceiver, + keyPrefix = "uss:am", + timer) + + val SignalFetcherMapper: Map[SignalType, ReadableStore[Query, Seq[Signal]]] = Map( + /* Raw Signals */ + SignalType.AccountFollow -> accountFollowsFetcher, + SignalType.AccountFollowWithDelay -> MemcachedAccountFollowsFetcher, + SignalType.GoodProfileClick -> GoodProfileClickDdgFetcher(SignalType.GoodProfileClick), + SignalType.GoodProfileClick20s -> GoodProfileClickDdgFetcher(SignalType.GoodProfileClick20s), + SignalType.GoodProfileClick30s -> GoodProfileClickDdgFetcher(SignalType.GoodProfileClick30s), + SignalType.GoodProfileClickFiltered -> GoodProfileClickDdgFetcherWithBlocksMutes( + SignalType.GoodProfileClick), + SignalType.GoodProfileClick20sFiltered -> GoodProfileClickDdgFetcherWithBlocksMutes( + SignalType.GoodProfileClick20s), + SignalType.GoodProfileClick30sFiltered -> GoodProfileClickDdgFetcherWithBlocksMutes( + SignalType.GoodProfileClick30s), + SignalType.GoodTweetClick -> GoodTweetClickDdgFetcher(SignalType.GoodTweetClick), + SignalType.GoodTweetClick5s -> GoodTweetClickDdgFetcher(SignalType.GoodTweetClick5s), + SignalType.GoodTweetClick10s -> GoodTweetClickDdgFetcher(SignalType.GoodTweetClick10s), + SignalType.GoodTweetClick30s -> GoodTweetClickDdgFetcher(SignalType.GoodTweetClick30s), + SignalType.NegativeEngagedTweetId -> negativeEngagedTweetFetcher, + SignalType.NegativeEngagedUserId -> negativeEngagedUserFetcher, + SignalType.NotificationOpenAndClickV1 -> notificationOpenAndClickFetcher, + SignalType.OriginalTweet -> originalTweetsFetcher, + SignalType.OriginalTweet90dV2 -> originalTweetsFetcher, + SignalType.RealGraphOon -> realGraphOonFilteredFetcher, + SignalType.RepeatedProfileVisit14dMinVisit2V1 -> MemcachedProfileVisitsFetcher, + SignalType.RepeatedProfileVisit14dMinVisit2V1NoNegative -> FilteredSignalFetcherController( + MemcachedProfileVisitsFetcher, + SignalType.RepeatedProfileVisit14dMinVisit2V1NoNegative, + statsReceiver, + timer, + Map( + SignalType.AccountMute -> accountMutesFetcher, + SignalType.AccountBlock -> accountBlocksFetcher) + ), + SignalType.RepeatedProfileVisit90dMinVisit6V1 -> MemcachedProfileVisitsFetcher, + SignalType.RepeatedProfileVisit90dMinVisit6V1NoNegative -> FilteredSignalFetcherController( + MemcachedProfileVisitsFetcher, + SignalType.RepeatedProfileVisit90dMinVisit6V1NoNegative, + statsReceiver, + timer, + Map( + SignalType.AccountMute -> accountMutesFetcher, + SignalType.AccountBlock -> accountBlocksFetcher), + ), + SignalType.RepeatedProfileVisit180dMinVisit6V1 -> MemcachedProfileVisitsFetcher, + SignalType.RepeatedProfileVisit180dMinVisit6V1NoNegative -> FilteredSignalFetcherController( + MemcachedProfileVisitsFetcher, + SignalType.RepeatedProfileVisit180dMinVisit6V1NoNegative, + statsReceiver, + timer, + Map( + SignalType.AccountMute -> 
accountMutesFetcher, + SignalType.AccountBlock -> accountBlocksFetcher), + ), + SignalType.Reply -> replyTweetsFetcher, + SignalType.Reply90dV2 -> replyTweetsFetcher, + SignalType.Retweet -> retweetsFetcher, + SignalType.Retweet90dV2 -> retweetsFetcher, + SignalType.TweetFavorite -> tweetFavoritesFetcher, + SignalType.TweetFavorite90dV2 -> tweetFavoritesFetcher, + SignalType.TweetShareV1 -> tweetSharesFetcher, + SignalType.VideoView90dQualityV1 -> videoTweetsQualityViewFilteredFetcher, + SignalType.VideoView90dPlayback50V1 -> videoTweetsPlayback50FilteredFetcher, + /* Aggregated Signals */ + SignalType.ProducerBasedUnifiedEngagementWeightedSignal -> AggregatedSignalController( + uniformProducerSignalInfo, + uniformProducerSignalEngagementAggregation, + statsReceiver, + timer + ), + SignalType.TweetBasedUnifiedEngagementWeightedSignal -> AggregatedSignalController( + uniformTweetSignalInfo, + uniformTweetSignalEngagementAggregation, + statsReceiver, + timer + ), + SignalType.AdFavorite -> tweetFavoritesFetcher, + /* Negative Signals */ + SignalType.AccountBlock -> memcachedAccountBlocksFetcher, + SignalType.AccountMute -> memcachedAccountMutesFetcher, + SignalType.TweetDontLike -> negativeEngagedTweetFetcher, + SignalType.TweetReport -> negativeEngagedTweetFetcher, + SignalType.TweetSeeFewer -> negativeEngagedTweetFetcher, + ) + +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/handler/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/handler/BUILD new file mode 100644 index 000000000..96dbbeeaf --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/handler/BUILD @@ -0,0 +1,14 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "src/scala/com/twitter/twistly/common", + "src/scala/com/twitter/twistly/store", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module", + "user-signal-service/thrift/src/main/thrift:thrift-scala", + ], +) diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/handler/UserSignalHandler.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/handler/UserSignalHandler.scala new file mode 100644 index 000000000..6fea51c4c --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/handler/UserSignalHandler.scala @@ -0,0 +1,71 @@ +package com.twitter.usersignalservice.handler + +import com.twitter.storehaus.ReadableStore +import com.twitter.usersignalservice.thriftscala.BatchSignalRequest +import com.twitter.usersignalservice.thriftscala.BatchSignalResponse +import com.twitter.util.Future +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.util.StatsUtil +import com.twitter.usersignalservice.config.SignalFetcherConfig +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.thriftscala.ClientIdentifier +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Duration +import com.twitter.util.Timer +import com.twitter.util.TimeoutException + +class UserSignalHandler( + signalFetcherConfig: 
SignalFetcherConfig, + timer: Timer, + stats: StatsReceiver) { + import UserSignalHandler._ + + val statsReceiver: StatsReceiver = stats.scope("user-signal-service/service") + + def getBatchSignalsResponse(request: BatchSignalRequest): Future[Option[BatchSignalResponse]] = { + StatsUtil.trackOptionStats(statsReceiver) { + val allSignals = request.signalRequest.map { signalRequest => + signalFetcherConfig + .SignalFetcherMapper(signalRequest.signalType) + .get(Query( + userId = request.userId, + signalType = signalRequest.signalType, + maxResults = signalRequest.maxResults.map(_.toInt), + clientId = request.clientId.getOrElse(ClientIdentifier.Unknown) + )) + .map(signalOpt => (signalRequest.signalType, signalOpt)) + } + + Future.collect(allSignals).map { signalsSeq => + val signalsMap = signalsSeq.map { + case (signalType: SignalType, signalsOpt) => + (signalType, signalsOpt.getOrElse(EmptySeq)) + }.toMap + Some(BatchSignalResponse(signalsMap)) + } + } + } + + def toReadableStore: ReadableStore[BatchSignalRequest, BatchSignalResponse] = { + new ReadableStore[BatchSignalRequest, BatchSignalResponse] { + override def get(request: BatchSignalRequest): Future[Option[BatchSignalResponse]] = { + getBatchSignalsResponse(request).raiseWithin(UserSignalServiceTimeout)(timer).rescue { + case _: TimeoutException => + statsReceiver.counter("endpointGetBatchSignals/failure/timeout").incr() + EmptyResponse + case e => + statsReceiver.counter("endpointGetBatchSignals/failure/" + e.getClass.getName).incr() + EmptyResponse + } + } + } + } +} + +object UserSignalHandler { + val UserSignalServiceTimeout: Duration = 25.milliseconds + + val EmptySeq: Seq[Nothing] = Seq.empty + val EmptyResponse: Future[Option[BatchSignalResponse]] = Future.value(Some(BatchSignalResponse())) +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/BUILD new file mode 100644 index 000000000..d8e1e6a49 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/BUILD @@ -0,0 +1,25 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/authentication", + "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/client", + "finagle/finagle-core/src/main", + "finagle/finagle-stats", + "finagle/finagle-thrift/src/main/scala", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/predicate/socialgraph", + "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/injection", + "servo/service/src/main/scala", + "src/scala/com/twitter/storehaus_internal/manhattan2", + "src/scala/com/twitter/storehaus_internal/memcache", + "src/scala/com/twitter/storehaus_internal/util", + "src/scala/com/twitter/twistly/common", + "src/scala/com/twitter/twistly/store", + "src/thrift/com/twitter/socialgraph:thrift-scala", + "stitch/stitch-storehaus", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + "util/util-core:scala", + "util/util-stats/src/main/scala", + ], +) diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/CacheModule.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/CacheModule.scala new file mode 100644 index 000000000..38427b6ce --- /dev/null +++ 
b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/CacheModule.scala @@ -0,0 +1,34 @@ +package com.twitter.usersignalservice.module + +import com.google.inject.Provides +import javax.inject.Singleton +import com.twitter.finagle.memcached.Client +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.inject.TwitterModule +import com.twitter.conversions.DurationOps._ +import com.twitter.storehaus_internal.memcache.MemcacheStore +import com.twitter.storehaus_internal.util.ZkEndPoint +import com.twitter.storehaus_internal.util.ClientName + +object CacheModule extends TwitterModule { + private val cacheDest = + flag[String](name = "cache_module.dest", help = "Path to memcache service") + private val timeout = + flag[Int](name = "memcache.timeout", help = "Memcache client timeout") + + @Singleton + @Provides + def providesCache( + serviceIdentifier: ServiceIdentifier, + stats: StatsReceiver + ): Client = + MemcacheStore.memcachedClient( + name = ClientName("memcache_user_signal_service"), + dest = ZkEndPoint(cacheDest()), + timeout = timeout().milliseconds, + retries = 0, + statsReceiver = stats.scope("memcache"), + serviceIdentifier = serviceIdentifier + ) +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/MHMtlsParamsModule.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/MHMtlsParamsModule.scala new file mode 100644 index 000000000..1ff1a7c5d --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/MHMtlsParamsModule.scala @@ -0,0 +1,18 @@ +package com.twitter.usersignalservice.module + +import com.twitter.finagle.mtls.authentication.ServiceIdentifier +import com.twitter.inject.TwitterModule +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams +import com.google.inject.Provides +import javax.inject.Singleton + +object MHMtlsParamsModule extends TwitterModule { + + @Singleton + @Provides + def providesManhattanMtlsParams( + serviceIdentifier: ServiceIdentifier + ): ManhattanKVClientMtlsParams = { + ManhattanKVClientMtlsParams(serviceIdentifier) + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/SocialGraphServiceClientModule.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/SocialGraphServiceClientModule.scala new file mode 100644 index 000000000..194730261 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/SocialGraphServiceClientModule.scala @@ -0,0 +1,40 @@ +package com.twitter.usersignalservice.module + +import com.twitter.inject.Injector +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle._ +import com.twitter.finagle.mux.ClientDiscardedRequestException +import com.twitter.finagle.service.ReqRep +import com.twitter.finagle.service.ResponseClass +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.mtls.thriftmux.modules.MtlsClient +import com.twitter.inject.thrift.modules.ThriftMethodBuilderClientModule +import com.twitter.util.Duration +import com.twitter.util.Throw +import com.twitter.socialgraph.thriftscala.SocialGraphService + +object SocialGraphServiceClientModule + extends ThriftMethodBuilderClientModule[ + SocialGraphService.ServicePerEndpoint, + SocialGraphService.MethodPerEndpoint + ] + with MtlsClient { + override val label = "socialgraph" + override val 
dest = "/s/socialgraph/socialgraph" + override val requestTimeout: Duration = 30.milliseconds + + override def configureThriftMuxClient( + injector: Injector, + client: ThriftMux.Client + ): ThriftMux.Client = { + super + .configureThriftMuxClient(injector, client) + .withStatsReceiver(injector.instance[StatsReceiver].scope("clnt")) + .withSessionQualifier + .successRateFailureAccrual(successRate = 0.9, window = 30.seconds) + .withResponseClassifier { + case ReqRep(_, Throw(_: ClientDiscardedRequestException)) => ResponseClass.Ignorable + } + } + +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/TimerModule.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/TimerModule.scala new file mode 100644 index 000000000..ffe26f8c4 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module/TimerModule.scala @@ -0,0 +1,12 @@ +package com.twitter.usersignalservice.module +import com.google.inject.Provides +import com.twitter.finagle.util.DefaultTimer +import com.twitter.inject.TwitterModule +import com.twitter.util.Timer +import javax.inject.Singleton + +object TimerModule extends TwitterModule { + @Singleton + @Provides + def providesTimer: Timer = DefaultTimer +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/service/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/service/BUILD new file mode 100644 index 000000000..d1cd4e3a3 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/service/BUILD @@ -0,0 +1,13 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "stitch/stitch-storehaus", + "strato/src/main/scala/com/twitter/strato/fed", + "strato/src/main/scala/com/twitter/strato/fed/server", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/config", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/handler", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/module", + "user-signal-service/thrift/src/main/thrift:thrift-scala", + ], +) diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/service/UserSignalService.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/service/UserSignalService.scala new file mode 100644 index 000000000..92d956001 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/service/UserSignalService.scala @@ -0,0 +1,26 @@ +package com.twitter.usersignalservice +package service + +import com.google.inject.Inject +import com.google.inject.Singleton +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.stitch.storehaus.StitchOfReadableStore +import com.twitter.usersignalservice.config.SignalFetcherConfig +import com.twitter.usersignalservice.handler.UserSignalHandler +import com.twitter.usersignalservice.thriftscala.BatchSignalRequest +import com.twitter.usersignalservice.thriftscala.BatchSignalResponse +import com.twitter.util.Timer + +@Singleton +class UserSignalService @Inject() ( + signalFetcherConfig: SignalFetcherConfig, + timer: Timer, + stats: StatsReceiver) { + + private val userSignalHandler = + new UserSignalHandler(signalFetcherConfig, timer, stats) + + val userSignalServiceHandlerStoreStitch: BatchSignalRequest => com.twitter.stitch.Stitch[ + BatchSignalResponse + ] = 
StitchOfReadableStore(userSignalHandler.toReadableStore) +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountBlocksFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountBlocksFetcher.scala new file mode 100644 index 000000000..a72348b7b --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountBlocksFetcher.scala @@ -0,0 +1,40 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.socialgraph.thriftscala.RelationshipType +import com.twitter.socialgraph.thriftscala.SocialGraphService +import com.twitter.twistly.common.UserId +import com.twitter.usersignalservice.base.BaseSignalFetcher +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.signals.common.SGSUtils +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class AccountBlocksFetcher @Inject() ( + sgsClient: SocialGraphService.MethodPerEndpoint, + timer: Timer, + stats: StatsReceiver) + extends BaseSignalFetcher { + + override type RawSignalType = Signal + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(this.name) + + override def getRawSignals( + userId: UserId + ): Future[Option[Seq[RawSignalType]]] = { + SGSUtils.getSGSRawSignals(userId, sgsClient, RelationshipType.Blocking, SignalType.AccountBlock) + } + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map(_.map(_.take(query.maxResults.getOrElse(Int.MaxValue)))) + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountFollowsFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountFollowsFetcher.scala new file mode 100644 index 000000000..60cc2bbd7 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountFollowsFetcher.scala @@ -0,0 +1,44 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.socialgraph.thriftscala.RelationshipType +import com.twitter.socialgraph.thriftscala.SocialGraphService +import com.twitter.twistly.common.UserId +import com.twitter.usersignalservice.base.BaseSignalFetcher +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.signals.common.SGSUtils +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class AccountFollowsFetcher @Inject() ( + sgsClient: SocialGraphService.MethodPerEndpoint, + timer: Timer, + stats: StatsReceiver) + extends BaseSignalFetcher { + + override type RawSignalType = Signal + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(this.name) + + override def getRawSignals( + userId: UserId + ): Future[Option[Seq[RawSignalType]]] = { + SGSUtils.getSGSRawSignals( + userId, + sgsClient, + RelationshipType.Following, + 
SignalType.AccountFollow) + } + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map(_.map(_.take(query.maxResults.getOrElse(Int.MaxValue)))) + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountMutesFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountMutesFetcher.scala new file mode 100644 index 000000000..27eb0a36d --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/AccountMutesFetcher.scala @@ -0,0 +1,40 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.socialgraph.thriftscala.RelationshipType +import com.twitter.socialgraph.thriftscala.SocialGraphService +import com.twitter.twistly.common.UserId +import com.twitter.usersignalservice.base.BaseSignalFetcher +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.signals.common.SGSUtils +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class AccountMutesFetcher @Inject() ( + sgsClient: SocialGraphService.MethodPerEndpoint, + timer: Timer, + stats: StatsReceiver) + extends BaseSignalFetcher { + + override type RawSignalType = Signal + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(this.name) + + override def getRawSignals( + userId: UserId + ): Future[Option[Seq[RawSignalType]]] = { + SGSUtils.getSGSRawSignals(userId, sgsClient, RelationshipType.Muting, SignalType.AccountMute) + } + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map(_.map(_.take(query.maxResults.getOrElse(Int.MaxValue)))) + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/BUILD new file mode 100644 index 000000000..50380a581 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/BUILD @@ -0,0 +1,34 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "3rdparty/jvm/com/twitter/bijection:scrooge", + "3rdparty/jvm/javax/inject:javax.inject", + "3rdparty/src/jvm/com/twitter/storehaus:core", + "discovery-ds/src/main/thrift/com/twitter/dds/jobs/repeated_profile_visits:profile_visit-scala", + "flock-client/src/main/thrift:thrift-scala", + "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato", + "hermit/hermit-core/src/main/scala/com/twitter/hermit/predicate/socialgraph", + "src/scala/com/twitter/scalding_internal/job", + "src/scala/com/twitter/simclusters_v2/common", + "src/scala/com/twitter/storehaus_internal/manhattan", + "src/scala/com/twitter/storehaus_internal/manhattan/config", + "src/scala/com/twitter/storehaus_internal/manhattan2", + "src/scala/com/twitter/storehaus_internal/offline", + "src/scala/com/twitter/storehaus_internal/util", + "src/scala/com/twitter/twistly/common", + "src/thrift/com/twitter/experiments/general_metrics:general_metrics-scala", + 
"src/thrift/com/twitter/frigate/data_pipeline:frigate-user-history-thrift-scala", + "src/thrift/com/twitter/onboarding/relevance/tweet_engagement:tweet_engagement-scala", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "src/thrift/com/twitter/socialgraph:thrift-scala", + "src/thrift/com/twitter/traffic_attribution:traffic_attribution-scala", + "strato/src/main/scala/com/twitter/strato/client", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/base", + "user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/common", + "user-signal-service/thrift/src/main/thrift:thrift-scala", + "util/util-core:util-core-util", + "util/util-core/src/main/java/com/twitter/util", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NegativeEngagedTweetFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NegativeEngagedTweetFetcher.scala new file mode 100644 index 000000000..22c0b0852 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NegativeEngagedTweetFetcher.scala @@ -0,0 +1,97 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.twistly.thriftscala.RecentNegativeEngagedTweet +import com.twitter.twistly.thriftscala.TweetNegativeEngagementType +import com.twitter.twistly.thriftscala.UserRecentNegativeEngagedTweets +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.StratoSignalFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class NegativeEngagedTweetFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[(UserId, Long), Unit, UserRecentNegativeEngagedTweets] { + + import NegativeEngagedTweetFetcher._ + + override type RawSignalType = RecentNegativeEngagedTweet + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = stratoPath + override val stratoView: Unit = None + + override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[UserRecentNegativeEngagedTweets] = + ScroogeConv.fromStruct[UserRecentNegativeEngagedTweets] + + override protected def toStratoKey(userId: UserId): (UserId, Long) = (userId, defaultVersion) + + override protected def toRawSignals( + stratoValue: UserRecentNegativeEngagedTweets + ): Seq[RecentNegativeEngagedTweet] = { + stratoValue.recentNegativeEngagedTweets + } + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RecentNegativeEngagedTweet]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { signals => + signals + .filter(signal => negativeEngagedTweetTypeFilter(query.signalType, signal)) + .map { signal => + Signal( + query.signalType, + 
signal.engagedAt, + Some(InternalId.TweetId(signal.tweetId)) + ) + } + .groupBy(_.targetInternalId) // groupBy if there's duplicated authorIds + .mapValues(_.maxBy(_.timestamp)) + .values + .toSeq + .sortBy(-_.timestamp) + .take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + } +} + +object NegativeEngagedTweetFetcher { + + val stratoPath = "recommendations/twistly/userRecentNegativeEngagedTweets" + private val defaultVersion = 0L + + private def negativeEngagedTweetTypeFilter( + signalType: SignalType, + signal: RecentNegativeEngagedTweet + ): Boolean = { + signalType match { + case SignalType.TweetDontLike => + signal.engagementType == TweetNegativeEngagementType.DontLike + case SignalType.TweetSeeFewer => + signal.engagementType == TweetNegativeEngagementType.SeeFewer + case SignalType.TweetReport => + signal.engagementType == TweetNegativeEngagementType.ReportClick + case SignalType.NegativeEngagedTweetId => true + case _ => false + } + } + +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NegativeEngagedUserFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NegativeEngagedUserFetcher.scala new file mode 100644 index 000000000..c07f61f91 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NegativeEngagedUserFetcher.scala @@ -0,0 +1,79 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.twistly.thriftscala.RecentNegativeEngagedTweet +import com.twitter.twistly.thriftscala.UserRecentNegativeEngagedTweets +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.StratoSignalFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class NegativeEngagedUserFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[(UserId, Long), Unit, UserRecentNegativeEngagedTweets] { + + import NegativeEngagedUserFetcher._ + + override type RawSignalType = RecentNegativeEngagedTweet + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = stratoPath + override val stratoView: Unit = None + + override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[UserRecentNegativeEngagedTweets] = + ScroogeConv.fromStruct[UserRecentNegativeEngagedTweets] + + override protected def toStratoKey(userId: UserId): (UserId, Long) = (userId, defaultVersion) + + override protected def toRawSignals( + stratoValue: UserRecentNegativeEngagedTweets + ): Seq[RecentNegativeEngagedTweet] = { + stratoValue.recentNegativeEngagedTweets + } + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RecentNegativeEngagedTweet]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { signals => + signals + .map { e => + Signal( + defaultNegativeSignalType, + 
e.engagedAt, + Some(InternalId.UserId(e.authorId)) + ) + } + .groupBy(_.targetInternalId) // groupBy if there's duplicated authorIds + .mapValues(_.maxBy(_.timestamp)) + .values + .toSeq + .sortBy(-_.timestamp) + } + } + } +} + +object NegativeEngagedUserFetcher { + + val stratoPath = "recommendations/twistly/userRecentNegativeEngagedTweets" + private val defaultVersion = 0L + private val defaultNegativeSignalType = SignalType.NegativeEngagedUserId + +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NotificationOpenAndClickFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NotificationOpenAndClickFetcher.scala new file mode 100644 index 000000000..5c40ec6a8 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/NotificationOpenAndClickFetcher.scala @@ -0,0 +1,145 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.frigate.common.store.strato.StratoFetchableStore +import com.twitter.frigate.data_pipeline.candidate_generation.thriftscala.ClientEngagementEvent +import com.twitter.frigate.data_pipeline.candidate_generation.thriftscala.LatestEvents +import com.twitter.frigate.data_pipeline.candidate_generation.thriftscala.LatestNegativeEngagementEvents +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.storehaus.ReadableStore +import com.twitter.strato.client.Client +import com.twitter.twistly.common.TweetId +import com.twitter.twistly.common.UserId +import com.twitter.usersignalservice.base.BaseSignalFetcher +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class NotificationOpenAndClickFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends BaseSignalFetcher { + import NotificationOpenAndClickFetcher._ + + override type RawSignalType = ClientEngagementEvent + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(this.name) + + private val latestEventsStore: ReadableStore[UserId, LatestEvents] = { + StratoFetchableStore + .withUnitView[UserId, LatestEvents](stratoClient, latestEventStoreColumn) + } + + private val notificationNegativeEngagementStore: ReadableStore[UserId, Seq[ + NotificationNegativeEngagement + ]] = { + StratoFetchableStore + .withUnitView[UserId, LatestNegativeEngagementEvents]( + stratoClient, + labeledPushRecsNegativeEngagementsColumn).mapValues(fromLatestNegativeEngagementEvents) + } + + override def getRawSignals( + userId: UserId + ): Future[Option[Seq[RawSignalType]]] = { + val notificationNegativeEngagementEventsFut = + notificationNegativeEngagementStore.get(userId) + val latestEventsFut = latestEventsStore.get(userId) + + Future + .join(latestEventsFut, notificationNegativeEngagementEventsFut).map { + case (latestEventsOpt, latestNegativeEngagementEventsOpt) => + latestEventsOpt.map { latestEvents => + // Negative Engagement Events Filter + filterNegativeEngagementEvents( + latestEvents.engagementEvents, + latestNegativeEngagementEventsOpt.getOrElse(Seq.empty), + statsReceiver.scope("filterNegativeEngagementEvents")) + } + } + } + + override def process( + query: Query, + rawSignals: 
Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { + _.take(query.maxResults.getOrElse(Int.MaxValue)).map { clientEngagementEvent => + Signal( + SignalType.NotificationOpenAndClickV1, + timestamp = clientEngagementEvent.timestampMillis, + targetInternalId = Some(InternalId.TweetId(clientEngagementEvent.tweetId)) + ) + } + } + } + } +} + +object NotificationOpenAndClickFetcher { + private val latestEventStoreColumn = "frigate/magicrecs/labeledPushRecsAggregated.User" + private val labeledPushRecsNegativeEngagementsColumn = + "frigate/magicrecs/labeledPushRecsNegativeEngagements.User" + + case class NotificationNegativeEngagement( + tweetId: TweetId, + timestampMillis: Long, + isNtabDisliked: Boolean, + isReportTweetClicked: Boolean, + isReportTweetDone: Boolean, + isReportUserClicked: Boolean, + isReportUserDone: Boolean) + + def fromLatestNegativeEngagementEvents( + latestNegativeEngagementEvents: LatestNegativeEngagementEvents + ): Seq[NotificationNegativeEngagement] = { + latestNegativeEngagementEvents.negativeEngagementEvents.map { event => + NotificationNegativeEngagement( + event.tweetId, + event.timestampMillis, + event.isNtabDisliked.getOrElse(false), + event.isReportTweetClicked.getOrElse(false), + event.isReportTweetDone.getOrElse(false), + event.isReportUserClicked.getOrElse(false), + event.isReportUserDone.getOrElse(false) + ) + } + } + + private def filterNegativeEngagementEvents( + engagementEvents: Seq[ClientEngagementEvent], + negativeEvents: Seq[NotificationNegativeEngagement], + statsReceiver: StatsReceiver + ): Seq[ClientEngagementEvent] = { + if (negativeEvents.nonEmpty) { + statsReceiver.counter("filterNegativeEngagementEvents").incr() + statsReceiver.stat("eventSizeBeforeFilter").add(engagementEvents.size) + + val negativeEngagementIdSet = + negativeEvents.collect { + case event + if event.isNtabDisliked || event.isReportTweetClicked || event.isReportTweetDone || event.isReportUserClicked || event.isReportUserDone => + event.tweetId + }.toSet + + // negative event size + statsReceiver.stat("negativeEventsSize").add(negativeEngagementIdSet.size) + + // filter out negative engagement sources + val result = engagementEvents.filterNot { event => + negativeEngagementIdSet.contains(event.tweetId) + } + + statsReceiver.stat("eventSizeAfterFilter").add(result.size) + + result + } else engagementEvents + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/OriginalTweetsFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/OriginalTweetsFetcher.scala new file mode 100644 index 000000000..46d5b8f9c --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/OriginalTweetsFetcher.scala @@ -0,0 +1,70 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.twistly.common.TwistlyProfile +import com.twitter.twistly.thriftscala.EngagementMetadata.OriginalTweetMetadata +import com.twitter.twistly.thriftscala.RecentEngagedTweet +import com.twitter.twistly.thriftscala.UserRecentEngagedTweets +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.StratoSignalFetcher +import 
com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class OriginalTweetsFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[(UserId, Long), Unit, UserRecentEngagedTweets] { + import OriginalTweetsFetcher._ + override type RawSignalType = RecentEngagedTweet + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = + TwistlyProfile.TwistlyProdProfile.userRecentEngagedStorePath + override val stratoView: Unit = None + + override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[UserRecentEngagedTweets] = + ScroogeConv.fromStruct[UserRecentEngagedTweets] + + override protected def toStratoKey(userId: UserId): (UserId, Long) = (userId, DefaultVersion) + + override protected def toRawSignals( + userRecentEngagedTweets: UserRecentEngagedTweets + ): Seq[RawSignalType] = + userRecentEngagedTweets.recentEngagedTweets + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { signals => + val lookBackWindowFilteredSignals = + SignalFilter.lookBackWindow90DayFilter(signals, query.signalType) + lookBackWindowFilteredSignals + .collect { + case RecentEngagedTweet(tweetId, engagedAt, _: OriginalTweetMetadata, _) => + Signal(query.signalType, engagedAt, Some(InternalId.TweetId(tweetId))) + }.take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + } + +} + +object OriginalTweetsFetcher { + // see com.twitter.twistly.store.UserRecentEngagedTweetsStore + private val DefaultVersion = 0 +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ProfileClickFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ProfileClickFetcher.scala new file mode 100644 index 000000000..1b93df59d --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ProfileClickFetcher.scala @@ -0,0 +1,98 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.twistly.thriftscala.RecentProfileClickImpressEvents +import com.twitter.twistly.thriftscala.ProfileClickImpressEvent +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.StratoSignalFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class ProfileClickFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[(UserId, Long), Unit, RecentProfileClickImpressEvents] { + + import ProfileClickFetcher._ + + override type RawSignalType = ProfileClickImpressEvent + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = 
stats.scope(name) + + override val stratoColumnPath: String = stratoPath + override val stratoView: Unit = None + + override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[RecentProfileClickImpressEvents] = + ScroogeConv.fromStruct[RecentProfileClickImpressEvents] + + override protected def toStratoKey(userId: UserId): (UserId, Long) = (userId, defaultVersion) + + override protected def toRawSignals( + stratoValue: RecentProfileClickImpressEvents + ): Seq[ProfileClickImpressEvent] = { + stratoValue.events + } + + override def process( + query: Query, + rawSignals: Future[Option[Seq[ProfileClickImpressEvent]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { events => + events + .map { clicks => + clicks + .filter(dwelltimeFilter(_, query.signalType)) + .map(signalFromProfileClick(_, query.signalType)) + .sortBy(-_.timestamp) + .take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + } +} + +object ProfileClickFetcher { + + val stratoPath = "recommendations/twistly/userRecentProfileClickImpress" + private val defaultVersion = 0L + private val sec2millis: Int => Long = i => i * 1000L + private val minDwellTimeMap: Map[SignalType, Long] = Map( + SignalType.GoodProfileClick -> sec2millis(10), + SignalType.GoodProfileClick20s -> sec2millis(20), + SignalType.GoodProfileClick30s -> sec2millis(30), + SignalType.GoodProfileClickFiltered -> sec2millis(10), + SignalType.GoodProfileClick20sFiltered -> sec2millis(20), + SignalType.GoodProfileClick30sFiltered -> sec2millis(30), + ) + + def signalFromProfileClick( + profileClickImpressEvent: ProfileClickImpressEvent, + signalType: SignalType + ): Signal = { + Signal( + signalType, + profileClickImpressEvent.engagedAt, + Some(InternalId.UserId(profileClickImpressEvent.entityId)) + ) + } + + def dwelltimeFilter( + profileClickImpressEvent: ProfileClickImpressEvent, + signalType: SignalType + ): Boolean = { + val goodClickDwellTime = minDwellTimeMap(signalType) + profileClickImpressEvent.clickImpressEventMetadata.totalDwellTime >= goodClickDwellTime + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ProfileVisitsFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ProfileVisitsFetcher.scala new file mode 100644 index 000000000..1cb27261f --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ProfileVisitsFetcher.scala @@ -0,0 +1,143 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.bijection.Codec +import com.twitter.bijection.scrooge.BinaryScalaCodec +import com.twitter.dds.jobs.repeated_profile_visits.thriftscala.ProfileVisitSet +import com.twitter.dds.jobs.repeated_profile_visits.thriftscala.ProfileVisitorInfo +import com.twitter.experiments.general_metrics.thriftscala.IdType +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams +import com.twitter.storehaus_internal.manhattan.Apollo +import com.twitter.storehaus_internal.manhattan.ManhattanCluster +import com.twitter.twistly.common.UserId +import com.twitter.usersignalservice.base.ManhattanSignalFetcher +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future 
+import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +case class ProfileVisitMetadata( + targetId: Option[Long], + totalTargetVisitsInLast14Days: Option[Int], + totalTargetVisitsInLast90Days: Option[Int], + totalTargetVisitsInLast180Days: Option[Int], + latestTargetVisitTimestampInLast90Days: Option[Long]) + +@Singleton +case class ProfileVisitsFetcher @Inject() ( + manhattanKVClientMtlsParams: ManhattanKVClientMtlsParams, + timer: Timer, + stats: StatsReceiver) + extends ManhattanSignalFetcher[ProfileVisitorInfo, ProfileVisitSet] { + import ProfileVisitsFetcher._ + + override type RawSignalType = ProfileVisitMetadata + + override val manhattanAppId: String = MHAppId + override val manhattanDatasetName: String = MHDatasetName + override val manhattanClusterId: ManhattanCluster = Apollo + override val manhattanKeyCodec: Codec[ProfileVisitorInfo] = BinaryScalaCodec(ProfileVisitorInfo) + override val manhattanRawSignalCodec: Codec[ProfileVisitSet] = BinaryScalaCodec(ProfileVisitSet) + + override protected def toManhattanKey(userId: UserId): ProfileVisitorInfo = + ProfileVisitorInfo(userId, IdType.User) + + override protected def toRawSignals(manhattanValue: ProfileVisitSet): Seq[ProfileVisitMetadata] = + manhattanValue.profileVisitSet + .map { + _.collect { + // only keep the Non-NSFW and not-following profile visits + case profileVisit + if profileVisit.targetId.nonEmpty + // The below check covers 180 days, not only 90 days as the name implies. + // See comment on [[ProfileVisit.latestTargetVisitTimestampInLast90Days]] thrift. + && profileVisit.latestTargetVisitTimestampInLast90Days.nonEmpty + && !profileVisit.isTargetNSFW.getOrElse(false) + && !profileVisit.doesSourceIdFollowTargetId.getOrElse(false) => + ProfileVisitMetadata( + targetId = profileVisit.targetId, + totalTargetVisitsInLast14Days = profileVisit.totalTargetVisitsInLast14Days, + totalTargetVisitsInLast90Days = profileVisit.totalTargetVisitsInLast90Days, + totalTargetVisitsInLast180Days = profileVisit.totalTargetVisitsInLast180Days, + latestTargetVisitTimestampInLast90Days = + profileVisit.latestTargetVisitTimestampInLast90Days + ) + }.toSeq + }.getOrElse(Seq.empty) + + override val name: String = this.getClass.getCanonicalName + + override val statsReceiver: StatsReceiver = stats.scope(name) + + override def process( + query: Query, + rawSignals: Future[Option[Seq[ProfileVisitMetadata]]] + ): Future[Option[Seq[Signal]]] = rawSignals.map { profiles => + profiles + .map { + _.filter(profileVisitMetadata => visitCountFilter(profileVisitMetadata, query.signalType)) + .sortBy(profileVisitMetadata => + -visitCountMap(query.signalType)(profileVisitMetadata).getOrElse(0)) + .map(profileVisitMetadata => + signalFromProfileVisit(profileVisitMetadata, query.signalType)) + .take(query.maxResults.getOrElse(Int.MaxValue)) + } + } +} + +object ProfileVisitsFetcher { + private val MHAppId = "repeated_profile_visits_aggregated" + private val MHDatasetName = "repeated_profile_visits_aggregated" + + private val minVisitCountMap: Map[SignalType, Int] = Map( + SignalType.RepeatedProfileVisit14dMinVisit2V1 -> 2, + SignalType.RepeatedProfileVisit14dMinVisit2V1NoNegative -> 2, + SignalType.RepeatedProfileVisit90dMinVisit6V1 -> 6, + SignalType.RepeatedProfileVisit90dMinVisit6V1NoNegative -> 6, + SignalType.RepeatedProfileVisit180dMinVisit6V1 -> 6, + SignalType.RepeatedProfileVisit180dMinVisit6V1NoNegative -> 6 + ) + + private val visitCountMap: Map[SignalType, ProfileVisitMetadata => Option[Int]] = Map( + 
SignalType.RepeatedProfileVisit14dMinVisit2V1 -> + ((profileVisitMetadata: ProfileVisitMetadata) => + profileVisitMetadata.totalTargetVisitsInLast14Days), + SignalType.RepeatedProfileVisit14dMinVisit2V1NoNegative -> + ((profileVisitMetadata: ProfileVisitMetadata) => + profileVisitMetadata.totalTargetVisitsInLast14Days), + SignalType.RepeatedProfileVisit90dMinVisit6V1 -> + ((profileVisitMetadata: ProfileVisitMetadata) => + profileVisitMetadata.totalTargetVisitsInLast90Days), + SignalType.RepeatedProfileVisit90dMinVisit6V1NoNegative -> + ((profileVisitMetadata: ProfileVisitMetadata) => + profileVisitMetadata.totalTargetVisitsInLast90Days), + SignalType.RepeatedProfileVisit180dMinVisit6V1 -> + ((profileVisitMetadata: ProfileVisitMetadata) => + profileVisitMetadata.totalTargetVisitsInLast180Days), + SignalType.RepeatedProfileVisit180dMinVisit6V1NoNegative -> + ((profileVisitMetadata: ProfileVisitMetadata) => + profileVisitMetadata.totalTargetVisitsInLast180Days) + ) + + def signalFromProfileVisit( + profileVisitMetadata: ProfileVisitMetadata, + signalType: SignalType + ): Signal = { + Signal( + signalType, + profileVisitMetadata.latestTargetVisitTimestampInLast90Days.get, + profileVisitMetadata.targetId.map(targetId => InternalId.UserId(targetId)) + ) + } + + def visitCountFilter( + profileVisitMetadata: ProfileVisitMetadata, + signalType: SignalType + ): Boolean = { + visitCountMap(signalType)(profileVisitMetadata).exists(_ >= minVisitCountMap(signalType)) + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/RealGraphOonFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/RealGraphOonFetcher.scala new file mode 100644 index 000000000..ad5cc4f4b --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/RealGraphOonFetcher.scala @@ -0,0 +1,70 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.usersignalservice.base.Query +import com.twitter.wtf.candidate.thriftscala.CandidateSeq +import com.twitter.wtf.candidate.thriftscala.Candidate +import com.twitter.usersignalservice.base.StratoSignalFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class RealGraphOonFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[UserId, Unit, CandidateSeq] { + import RealGraphOonFetcher._ + override type RawSignalType = Candidate + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = RealGraphOonFetcher.stratoColumnPath + override val stratoView: Unit = None + + override protected val keyConv: Conv[UserId] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[CandidateSeq] = + ScroogeConv.fromStruct[CandidateSeq] + + override protected def toStratoKey(userId: UserId): UserId = userId + + override protected def toRawSignals( + 
realGraphOonCandidates: CandidateSeq + ): Seq[RawSignalType] = realGraphOonCandidates.candidates + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals + .map { + _.map( + _.sortBy(-_.score) + .collect { + case c if c.score >= MinRgScore => + Signal( + SignalType.RealGraphOon, + RealGraphOonFetcher.DefaultTimestamp, + Some(InternalId.UserId(c.userId))) + }.take(query.maxResults.getOrElse(Int.MaxValue))) + } + } +} + +object RealGraphOonFetcher { + val stratoColumnPath = "recommendations/real_graph/realGraphScoresOon.User" + // quality threshold for real graph score + private val MinRgScore = 0.0 + // no timestamp for RealGraph Candidates, set default as 0L + private val DefaultTimestamp = 0L +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ReplyTweetsFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ReplyTweetsFetcher.scala new file mode 100644 index 000000000..7f84f41c9 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/ReplyTweetsFetcher.scala @@ -0,0 +1,70 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.twistly.common.TwistlyProfile +import com.twitter.twistly.thriftscala.EngagementMetadata.ReplyTweetMetadata +import com.twitter.twistly.thriftscala.RecentEngagedTweet +import com.twitter.twistly.thriftscala.UserRecentEngagedTweets +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.StratoSignalFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class ReplyTweetsFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[(UserId, Long), Unit, UserRecentEngagedTweets] { + import ReplyTweetsFetcher._ + override type RawSignalType = RecentEngagedTweet + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = + TwistlyProfile.TwistlyProdProfile.userRecentEngagedStorePath + override val stratoView: Unit = None + + override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[UserRecentEngagedTweets] = + ScroogeConv.fromStruct[UserRecentEngagedTweets] + + override protected def toStratoKey(userId: UserId): (UserId, Long) = (userId, DefaultVersion) + + override protected def toRawSignals( + userRecentEngagedTweets: UserRecentEngagedTweets + ): Seq[RawSignalType] = + userRecentEngagedTweets.recentEngagedTweets + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { signals => + val lookBackWindowFilteredSignals = + SignalFilter.lookBackWindow90DayFilter(signals, query.signalType) + lookBackWindowFilteredSignals + .collect { + case RecentEngagedTweet(tweetId, engagedAt, _: ReplyTweetMetadata, _) => + 
Signal(query.signalType, engagedAt, Some(InternalId.TweetId(tweetId))) + }.take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + } + +} + +object ReplyTweetsFetcher { + // see com.twitter.twistly.store.UserRecentEngagedTweetsStore + private val DefaultVersion = 0 +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/RetweetsFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/RetweetsFetcher.scala new file mode 100644 index 000000000..4b81c8d0b --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/RetweetsFetcher.scala @@ -0,0 +1,74 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.twistly.common.TwistlyProfile +import com.twitter.twistly.thriftscala.EngagementMetadata.RetweetMetadata +import com.twitter.twistly.thriftscala.RecentEngagedTweet +import com.twitter.twistly.thriftscala.UserRecentEngagedTweets +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.StratoSignalFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class RetweetsFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[(UserId, Long), Unit, UserRecentEngagedTweets] { + import RetweetsFetcher._ + override type RawSignalType = RecentEngagedTweet + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = + TwistlyProfile.TwistlyProdProfile.userRecentEngagedStorePath + override val stratoView: Unit = None + + override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[UserRecentEngagedTweets] = + ScroogeConv.fromStruct[UserRecentEngagedTweets] + + override protected def toStratoKey(userId: UserId): (UserId, Long) = (userId, DefaultVersion) + + override protected def toRawSignals( + userRecentEngagedTweets: UserRecentEngagedTweets + ): Seq[RawSignalType] = + userRecentEngagedTweets.recentEngagedTweets + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { signals => + val lookBackWindowFilteredSignals = + SignalFilter.lookBackWindow90DayFilter(signals, query.signalType) + lookBackWindowFilteredSignals + .filter { recentEngagedTweet => + recentEngagedTweet.features.statusCounts + .flatMap(_.favoriteCount).exists(_ >= MinFavCount) + }.collect { + case RecentEngagedTweet(tweetId, engagedAt, _: RetweetMetadata, _) => + Signal(query.signalType, engagedAt, Some(InternalId.TweetId(tweetId))) + }.take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + } + +} + +object RetweetsFetcher { + private val MinFavCount = 10 + // see com.twitter.twistly.store.UserRecentEngagedTweetsStore + private val DefaultVersion = 0 +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/SignalFilter.scala 
b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/SignalFilter.scala new file mode 100644 index 000000000..01be88a26 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/SignalFilter.scala @@ -0,0 +1,48 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.twistly.thriftscala.EngagementMetadata.FavoriteMetadata +import com.twitter.twistly.thriftscala.RecentEngagedTweet +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Time + +// Shared logic for filtering signals across different signal types +object SignalFilter { + + final val LookBackWindow90DayFilterEnabledSignalTypes: Set[SignalType] = Set( + SignalType.TweetFavorite90dV2, + SignalType.Retweet90dV2, + SignalType.OriginalTweet90dV2, + SignalType.Reply90dV2) + + /* Raw signal filter for TweetFavorite, Retweet, Original Tweet and Reply. + * Filters out all raw signals if the most recent {Tweet Favorite + Retweet + Original Tweet + Reply} + * is older than 90 days. + * The filter is shared across the 4 signal types as they are stored in the same physical store + * and thus share the same TTL. + * */ + def lookBackWindow90DayFilter( + signals: Seq[RecentEngagedTweet], + querySignalType: SignalType + ): Seq[RecentEngagedTweet] = { + // headOption guards the empty-history case, where .head would throw + if (LookBackWindow90DayFilterEnabledSignalTypes.contains( + querySignalType) && !signals.headOption.exists(isMostRecentSignalWithin90Days)) { + Seq.empty + } else signals + } + + private def isMostRecentSignalWithin90Days( + signal: RecentEngagedTweet + ): Boolean = { + val diff = Time.now - Time.fromMilliseconds(signal.engagedAt) + diff.inDays <= 90 + } + + def isPromotedTweet(signal: RecentEngagedTweet): Boolean = { + signal match { + case RecentEngagedTweet(_, _, metadata: FavoriteMetadata, _) => + metadata.favoriteMetadata.isAd.getOrElse(false) + case _ => false + } + } + +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetClickFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetClickFetcher.scala new file mode 100644 index 000000000..19462a4e2 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetClickFetcher.scala @@ -0,0 +1,94 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.twistly.thriftscala.RecentTweetClickImpressEvents +import com.twitter.twistly.thriftscala.TweetClickImpressEvent +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.StratoSignalFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class TweetClickFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[(UserId, Long), Unit, RecentTweetClickImpressEvents] { + + import TweetClickFetcher._ + + override type RawSignalType = TweetClickImpressEvent + override val name: String = this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(name) + + override
val stratoColumnPath: String = stratoPath + override val stratoView: Unit = None + + override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[RecentTweetClickImpressEvents] = + ScroogeConv.fromStruct[RecentTweetClickImpressEvents] + + override protected def toStratoKey(userId: UserId): (UserId, Long) = (userId, defaultVersion) + + override protected def toRawSignals( + stratoValue: RecentTweetClickImpressEvents + ): Seq[TweetClickImpressEvent] = { + stratoValue.events + } + + override def process( + query: Query, + rawSignals: Future[Option[Seq[TweetClickImpressEvent]]] + ): Future[Option[Seq[Signal]]] = + rawSignals.map { events => + events.map { clicks => + clicks + .filter(dwelltimeFilter(_, query.signalType)) + .map(signalFromTweetClick(_, query.signalType)) + .sortBy(-_.timestamp) + .take(query.maxResults.getOrElse(Int.MaxValue)) + } + } +} + +object TweetClickFetcher { + + val stratoPath = "recommendations/twistly/userRecentTweetClickImpress" + private val defaultVersion = 0L + + private val minDwellTimeMap: Map[SignalType, Long] = Map( + SignalType.GoodTweetClick -> 2 * 1000L, + SignalType.GoodTweetClick5s -> 5 * 1000L, + SignalType.GoodTweetClick10s -> 10 * 1000L, + SignalType.GoodTweetClick30s -> 30 * 1000L, + ) + + def signalFromTweetClick( + tweetClickImpressEvent: TweetClickImpressEvent, + signalType: SignalType + ): Signal = { + Signal( + signalType, + tweetClickImpressEvent.engagedAt, + Some(InternalId.TweetId(tweetClickImpressEvent.entityId)) + ) + } + + def dwelltimeFilter( + tweetClickImpressEvent: TweetClickImpressEvent, + signalType: SignalType + ): Boolean = { + val goodClickDwellTime = minDwellTimeMap(signalType) + tweetClickImpressEvent.clickImpressEventMetadata.totalDwellTime >= goodClickDwellTime + } +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetFavoritesFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetFavoritesFetcher.scala new file mode 100644 index 000000000..b427f722f --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetFavoritesFetcher.scala @@ -0,0 +1,86 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.simclusters_v2.common.UserId +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.twistly.common.TwistlyProfile +import com.twitter.twistly.thriftscala.EngagementMetadata.FavoriteMetadata +import com.twitter.twistly.thriftscala.RecentEngagedTweet +import com.twitter.twistly.thriftscala.UserRecentEngagedTweets +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.base.StratoSignalFetcher +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class TweetFavoritesFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[(UserId, Long), Unit, UserRecentEngagedTweets] { + import TweetFavoritesFetcher._ + override type RawSignalType = RecentEngagedTweet + override val name: String = 
this.getClass.getCanonicalName + override val statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = + TwistlyProfile.TwistlyProdProfile.userRecentEngagedStorePath + override val stratoView: Unit = None + + override protected val keyConv: Conv[(UserId, Long)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[UserRecentEngagedTweets] = + ScroogeConv.fromStruct[UserRecentEngagedTweets] + + override protected def toStratoKey(userId: UserId): (UserId, Long) = (userId, DefaultVersion) + + override protected def toRawSignals( + userRecentEngagedTweets: UserRecentEngagedTweets + ): Seq[RawSignalType] = + userRecentEngagedTweets.recentEngagedTweets + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RawSignalType]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { signals => + val lookBackWindowFilteredSignals = + SignalFilter.lookBackWindow90DayFilter(signals, query.signalType) + lookBackWindowFilteredSignals + .filter { recentEngagedTweet => + recentEngagedTweet.features.statusCounts + .flatMap(_.favoriteCount).exists(_ >= MinFavCount) + }.filter { recentEngagedTweet => + applySignalTweetTypeFilter(query.signalType, recentEngagedTweet) + }.collect { + case RecentEngagedTweet(tweetId, engagedAt, _: FavoriteMetadata, _) => + Signal(query.signalType, engagedAt, Some(InternalId.TweetId(tweetId))) + }.take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + } + private def applySignalTweetTypeFilter( + signal: SignalType, + recentEngagedTweet: RecentEngagedTweet + ): Boolean = { + // Perform specific filters for particular signal types. + signal match { + case SignalType.AdFavorite => SignalFilter.isPromotedTweet(recentEngagedTweet) + case _ => true + } + } +} + +object TweetFavoritesFetcher { + private val MinFavCount = 10 + // see com.twitter.twistly.store.UserRecentEngagedTweetsStore + private val DefaultVersion = 0 +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetSharesFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetSharesFetcher.scala new file mode 100644 index 000000000..6205e1bc3 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/TweetSharesFetcher.scala @@ -0,0 +1,77 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.bijection.Codec +import com.twitter.bijection.scrooge.BinaryScalaCodec +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.onboarding.relevance.tweet_engagement.thriftscala.EngagementIdentifier +import com.twitter.onboarding.relevance.tweet_engagement.thriftscala.TweetEngagement +import com.twitter.onboarding.relevance.tweet_engagement.thriftscala.TweetEngagements +import com.twitter.scalding_internal.multiformat.format.keyval.KeyValInjection.Long2BigEndian +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams +import com.twitter.storehaus_internal.manhattan.Apollo +import com.twitter.storehaus_internal.manhattan.ManhattanCluster +import com.twitter.twistly.common.UserId +import com.twitter.usersignalservice.base.ManhattanSignalFetcher +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Future +import com.twitter.util.Timer +import 
javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class TweetSharesFetcher @Inject() ( + manhattanKVClientMtlsParams: ManhattanKVClientMtlsParams, + timer: Timer, + stats: StatsReceiver) + extends ManhattanSignalFetcher[Long, TweetEngagements] { + + import TweetSharesFetcher._ + + override type RawSignalType = TweetEngagement + + override def name: String = this.getClass.getCanonicalName + + override def statsReceiver: StatsReceiver = stats.scope(name) + + override protected def manhattanAppId: String = MHAppId + + override protected def manhattanDatasetName: String = MHDatasetName + + override protected def manhattanClusterId: ManhattanCluster = Apollo + + override protected def manhattanKeyCodec: Codec[Long] = Long2BigEndian + + override protected def manhattanRawSignalCodec: Codec[TweetEngagements] = BinaryScalaCodec( + TweetEngagements) + + override protected def toManhattanKey(userId: UserId): Long = userId + + override protected def toRawSignals( + manhattanValue: TweetEngagements + ): Seq[TweetEngagement] = manhattanValue.tweetEngagements + + override def process( + query: Query, + rawSignals: Future[Option[Seq[TweetEngagement]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { + _.collect { + case tweetEngagement if (tweetEngagement.engagementType == EngagementIdentifier.Share) => + Signal( + SignalType.TweetShareV1, + tweetEngagement.timestampMs, + Some(InternalId.TweetId(tweetEngagement.tweetId))) + }.sortBy(-_.timestamp).take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + } +} + +object TweetSharesFetcher { + private val MHAppId = "uss_prod_apollo" + private val MHDatasetName = "tweet_share_engagements" +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/VideoTweetsPlayback50Fetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/VideoTweetsPlayback50Fetcher.scala new file mode 100644 index 000000000..1577b2e99 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/VideoTweetsPlayback50Fetcher.scala @@ -0,0 +1,72 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.twistly.common.UserId +import com.twitter.twistly.thriftscala.UserRecentVideoViewTweets +import com.twitter.twistly.thriftscala.VideoViewEngagementType +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.util.Future +import com.twitter.util.Timer +import com.twitter.twistly.thriftscala.RecentVideoViewTweet +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.usersignalservice.base.StratoSignalFetcher +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class VideoTweetsPlayback50Fetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[ + (UserId, VideoViewEngagementType), + Unit, + UserRecentVideoViewTweets + ] { + import VideoTweetsPlayback50Fetcher._ + + override type RawSignalType = RecentVideoViewTweet + override def name: String = this.getClass.getCanonicalName + override def statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = StratoColumn + override val stratoView: Unit = None 
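+ // Strato columns are addressed by a (key, view) pair; the Conv instances below + // are the wire codecs for that pair and for the stored value. Here the key is + // the (UserId, VideoViewEngagementType) tuple built in toStratoKey, the view is + // empty (Unit), and values decode into the Scrooge-generated + // UserRecentVideoViewTweets struct.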
+ override protected val keyConv: Conv[(UserId, VideoViewEngagementType)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[UserRecentVideoViewTweets] = + ScroogeConv.fromStruct[UserRecentVideoViewTweets] + + override protected def toStratoKey(userId: UserId): (UserId, VideoViewEngagementType) = + (userId, VideoViewEngagementType.VideoPlayback50) + + override protected def toRawSignals( + stratoValue: UserRecentVideoViewTweets + ): Seq[RecentVideoViewTweet] = stratoValue.recentEngagedTweets + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RecentVideoViewTweet]]] + ): Future[Option[Seq[Signal]]] = rawSignals.map { + _.map { + _.filter(videoView => + !videoView.isPromotedTweet && videoView.videoDurationSeconds >= MinVideoDurationSeconds) + .map { rawSignal => + Signal( + SignalType.VideoView90dPlayback50V1, + rawSignal.engagedAt, + Some(InternalId.TweetId(rawSignal.tweetId))) + }.take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + +} + +object VideoTweetsPlayback50Fetcher { + private val StratoColumn = "recommendations/twistly/userRecentVideoViewTweetEngagements" + private val MinVideoDurationSeconds = 10 +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/VideoTweetsQualityViewFetcher.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/VideoTweetsQualityViewFetcher.scala new file mode 100644 index 000000000..d513b978c --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/VideoTweetsQualityViewFetcher.scala @@ -0,0 +1,72 @@ +package com.twitter.usersignalservice.signals + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.twistly.common.UserId +import com.twitter.twistly.thriftscala.UserRecentVideoViewTweets +import com.twitter.twistly.thriftscala.VideoViewEngagementType +import com.twitter.usersignalservice.base.Query +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.util.Future +import com.twitter.util.Timer +import com.twitter.twistly.thriftscala.RecentVideoViewTweet +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.strato.client.Client +import com.twitter.strato.data.Conv +import com.twitter.strato.thrift.ScroogeConv +import com.twitter.usersignalservice.base.StratoSignalFetcher +import javax.inject.Inject +import javax.inject.Singleton + +@Singleton +case class VideoTweetsQualityViewFetcher @Inject() ( + stratoClient: Client, + timer: Timer, + stats: StatsReceiver) + extends StratoSignalFetcher[ + (UserId, VideoViewEngagementType), + Unit, + UserRecentVideoViewTweets + ] { + import VideoTweetsQualityViewFetcher._ + override type RawSignalType = RecentVideoViewTweet + override def name: String = this.getClass.getCanonicalName + override def statsReceiver: StatsReceiver = stats.scope(name) + + override val stratoColumnPath: String = StratoColumn + override val stratoView: Unit = None + override protected val keyConv: Conv[(UserId, VideoViewEngagementType)] = Conv.ofType + override protected val viewConv: Conv[Unit] = Conv.ofType + override protected val valueConv: Conv[UserRecentVideoViewTweets] = + ScroogeConv.fromStruct[UserRecentVideoViewTweets] + + override protected def toStratoKey(userId: UserId): (UserId, VideoViewEngagementType) = + (userId, VideoViewEngagementType.VideoQualityView) + + override protected def toRawSignals( + 
stratoValue: UserRecentVideoViewTweets + ): Seq[RecentVideoViewTweet] = stratoValue.recentEngagedTweets + + override def process( + query: Query, + rawSignals: Future[Option[Seq[RecentVideoViewTweet]]] + ): Future[Option[Seq[Signal]]] = { + rawSignals.map { + _.map { + _.filter(videoView => + !videoView.isPromotedTweet && videoView.videoDurationSeconds >= MinVideoDurationSeconds) + .map { rawSignal => + Signal( + SignalType.VideoView90dQualityV1, + rawSignal.engagedAt, + Some(InternalId.TweetId(rawSignal.tweetId))) + }.take(query.maxResults.getOrElse(Int.MaxValue)) + } + } + } +} + +object VideoTweetsQualityViewFetcher { + private val StratoColumn = "recommendations/twistly/userRecentVideoViewTweetEngagements" + private val MinVideoDurationSeconds = 10 +} diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/common/BUILD b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/common/BUILD new file mode 100644 index 000000000..baca538b0 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/common/BUILD @@ -0,0 +1,15 @@ +scala_library( + compiler_option_sets = ["fatal_warnings"], + tags = ["bazel-compatible"], + dependencies = [ + "hermit/hermit-core/src/main/scala/com/twitter/hermit/predicate/socialgraph", + "src/scala/com/twitter/simclusters_v2/common", + "src/scala/com/twitter/twistly/common", + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala", + "src/thrift/com/twitter/socialgraph:thrift-scala", + "user-signal-service/thrift/src/main/thrift:thrift-scala", + "util/util-core:util-core-util", + "util/util-core/src/main/java/com/twitter/util", + "util/util-stats/src/main/scala/com/twitter/finagle/stats", + ], +) diff --git a/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/common/SGSUtils.scala b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/common/SGSUtils.scala new file mode 100644 index 000000000..01fbd8f38 --- /dev/null +++ b/user-signal-service/server/src/main/scala/com/twitter/usersignalservice/signals/common/SGSUtils.scala @@ -0,0 +1,59 @@ +package com.twitter.usersignalservice.signals +package common + +import com.twitter.simclusters_v2.thriftscala.InternalId +import com.twitter.socialgraph.thriftscala.EdgesRequest +import com.twitter.socialgraph.thriftscala.EdgesResult +import com.twitter.socialgraph.thriftscala.PageRequest +import com.twitter.socialgraph.thriftscala.RelationshipType +import com.twitter.socialgraph.thriftscala.SocialGraphService +import com.twitter.socialgraph.thriftscala.SrcRelationship +import com.twitter.twistly.common.UserId +import com.twitter.usersignalservice.thriftscala.Signal +import com.twitter.usersignalservice.thriftscala.SignalType +import com.twitter.util.Duration +import com.twitter.util.Future +import com.twitter.util.Time + +object SGSUtils { + val MaxNumSocialGraphSignals = 200 + val MaxAge: Duration = Duration.fromDays(90) + + def getSGSRawSignals( + userId: UserId, + sgsClient: SocialGraphService.MethodPerEndpoint, + relationshipType: RelationshipType, + signalType: SignalType, + ): Future[Option[Seq[Signal]]] = { + val edgeRequest = EdgesRequest( + relationship = SrcRelationship(userId, relationshipType), + pageRequest = Some(PageRequest(count = None)) + ) + val now = Time.now.inMilliseconds + + sgsClient + .edges(Seq(edgeRequest)) + .map { sgsEdges => + sgsEdges.flatMap { + case EdgesResult(edges, _, _) => + edges.collect { + case edge if edge.createdAt 
>= now - MaxAge.inMilliseconds => + Signal( + signalType, + timestamp = edge.createdAt, + targetInternalId = Some(InternalId.UserId(edge.target))) + } + } + } + .map { signals => + signals + .take(MaxNumSocialGraphSignals) + .groupBy(_.targetInternalId) + .mapValues(_.maxBy(_.timestamp)) + .values + .toSeq + .sortBy(-_.timestamp) + } + .map(Some(_)) + } +} diff --git a/user-signal-service/thrift/src/main/thrift/BUILD b/user-signal-service/thrift/src/main/thrift/BUILD new file mode 100644 index 000000000..faab4af7e --- /dev/null +++ b/user-signal-service/thrift/src/main/thrift/BUILD @@ -0,0 +1,20 @@ +create_thrift_libraries( + base_name = "thrift", + sources = [ + "client_identifier.thrift", + "service.thrift", + "signal.thrift", + ], + platform = "java8", + tags = ["bazel-compatible"], + dependency_roots = [ + "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift", + ], + generate_languages = [ + "java", + "scala", + "strato", + ], + provides_java_name = "uss-thrift-java", + provides_scala_name = "uss-thrift-scala", +) diff --git a/user-signal-service/thrift/src/main/thrift/client_identifier.thrift b/user-signal-service/thrift/src/main/thrift/client_identifier.thrift new file mode 100644 index 000000000..c953e6b8f --- /dev/null +++ b/user-signal-service/thrift/src/main/thrift/client_identifier.thrift @@ -0,0 +1,22 @@ +namespace java com.twitter.usersignalservice.thriftjava +namespace py gen.twitter.usersignalservice.service +#@namespace scala com.twitter.usersignalservice.thriftscala +#@namespace strato com.twitter.usersignalservice.strato + +# ClientIdentifier should be defined as ServiceId_Product +enum ClientIdentifier { + # reserve 1-10 for CrMixer + CrMixer_Home = 1 + CrMixer_Notifications = 2 + CrMixer_Email = 3 + # reserve 11-20 for RSX + RepresentationScorer_Home = 11 + RepresentationScorer_Notifications = 12 + + # reserve 21-30 for Explore + ExploreRanker = 21 + + # We will throw an exception after we make sure all clients are sending the + # ClientIdentifier in their request. 
+ Unknown = 9999 +} diff --git a/user-signal-service/thrift/src/main/thrift/service.thrift b/user-signal-service/thrift/src/main/thrift/service.thrift new file mode 100644 index 000000000..a10959ea8 --- /dev/null +++ b/user-signal-service/thrift/src/main/thrift/service.thrift @@ -0,0 +1,23 @@ +namespace java com.twitter.usersignalservice.thriftjava +namespace py gen.twitter.usersignalservice.service +#@namespace scala com.twitter.usersignalservice.thriftscala +#@namespace strato com.twitter.usersignalservice.strato + +include "signal.thrift" +include "client_identifier.thrift" + +struct SignalRequest { + 1: optional i64 maxResults + 2: required signal.SignalType signalType +} + +struct BatchSignalRequest { + 1: required i64 userId(personalDataType = "UserId") + 2: required list<SignalRequest> signalRequest + # make sure to populate the clientId, otherwise the service would throw exceptions + 3: optional client_identifier.ClientIdentifier clientId +}(hasPersonalData='true') + +struct BatchSignalResponse { + 1: required map<signal.SignalType, list<signal.Signal>> signalResponse +} diff --git a/user-signal-service/thrift/src/main/thrift/signal.thrift b/user-signal-service/thrift/src/main/thrift/signal.thrift new file mode 100644 index 000000000..e32947be8 --- /dev/null +++ b/user-signal-service/thrift/src/main/thrift/signal.thrift @@ -0,0 +1,113 @@ +namespace java com.twitter.usersignalservice.thriftjava +namespace py gen.twitter.usersignalservice.signal +#@namespace scala com.twitter.usersignalservice.thriftscala +#@namespace strato com.twitter.usersignalservice.strato + +include "com/twitter/simclusters_v2/identifier.thrift" + + +enum SignalType { + /** + Please maintain the key space rule to avoid compatibility issues for the downstream production job + * Prod Key space: 0-1000 + * Devel Key space: 1000+ + **/ + + + /* tweet based signals */ + TweetFavorite = 0, // 540 Days Lookback window + Retweet = 1, // 540 Days Lookback window + TrafficAttribution = 2, + OriginalTweet = 3, // 540 Days Lookback window + Reply = 4, // 540 Days Lookback window + /* Tweets that the user shared (sharer side) + * V1: successful shares (click share icon -> click in-app, or off-platform share option + * or copying link) + * */ + TweetShare_V1 = 5, // 14 Days Lookback window + + TweetFavorite_90D_V2 = 6, // 90 Days Lookback window : tweet fav from user with recent engagement in the past 90 days + Retweet_90D_V2 = 7, // 90 Days Lookback window : retweet from user with recent engagement in the past 90 days + OriginalTweet_90D_V2 = 8, // 90 Days Lookback window : original tweet from user with recent engagement in the past 90 days + Reply_90D_V2 = 9, // 90 Days Lookback window : reply from user with recent engagement in the past 90 days + GoodTweetClick = 10, // GoodTweetClick Signal : Dwell Time Threshold >=2s + + // video tweets that were watched (10s OR 95%) in the past 90 days, are not ads, and have >=10s video + VideoView_90D_Quality_V1 = 11 // 90 Days Lookback window + // video tweets that were watched 50% in the past 90 days, are not ads, and have >=10s video + VideoView_90D_Playback50_V1 = 12 // 90 Days Lookback window + + /* user based signals */ + AccountFollow = 100, // infinite lookback window + RepeatedProfileVisit_14D_MinVisit2_V1 = 101, + RepeatedProfileVisit_90D_MinVisit6_V1 = 102, + RepeatedProfileVisit_180D_MinVisit6_V1 = 109, + RepeatedProfileVisit_14D_MinVisit2_V1_No_Negative = 110, + RepeatedProfileVisit_90D_MinVisit6_V1_No_Negative = 111, + RepeatedProfileVisit_180D_MinVisit6_V1_No_Negative = 112, + RealGraphOon = 104, +
TrafficAttributionProfile_30D_LastVisit = 105, + TrafficAttributionProfile_30D_DecayedVisit = 106, + TrafficAttributionProfile_30D_WeightedEventDecayedVisit = 107, + TrafficAttributionProfile_30D_DecayedVisit_WithoutAgathaFilter = 108, + GoodProfileClick = 120, // GoodProfileClick Signal : Dwell Time Threshold >=10s + AdFavorite = 121, // Favorites filtered to ads; TweetFavorite has both organic and ads Favs + + // AccountFollowWithDelay should only be used by high-traffic clients and has 1 min delay + AccountFollowWithDelay = 122, + + + /* notifications based signals */ + /* V1: notification clicks from past 90 days with negative events (reports, dislikes) being filtered */ + NotificationOpenAndClick_V1 = 200, + + /* + negative signals for filtering + */ + NegativeEngagedTweetId = 901 // tweetId for all negative engagements + NegativeEngagedUserId = 902 // userId for all negative engagements + AccountBlock = 903, + AccountMute = 904, + // skip 905 - 906 for Account report abuse / report spam + // User clicked "don't like" from past 90 Days + TweetDontLike = 907 + // User clicked see fewer on the recommended tweet from past 90 Days + TweetSeeFewer = 908 + // User clicked on the "report tweet" option in the tweet caret dropdown menu from past 90 days + TweetReport = 909 + + /* + devel signals + use nums > 1000 to test out signals under development/ddg + move them back to the correct corresponding Key space (0-1000) before shipping + */ + GoodTweetClick_5s = 1001, // GoodTweetClick Signal : Dwell Time Threshold >=5s + GoodTweetClick_10s = 1002, // GoodTweetClick Signal : Dwell Time Threshold >=10s + GoodTweetClick_30s = 1003, // GoodTweetClick Signal : Dwell Time Threshold >=30s + + GoodProfileClick_20s = 1004, // GoodProfileClick Signal : Dwell Time Threshold >=20s + GoodProfileClick_30s = 1005, // GoodProfileClick Signal : Dwell Time Threshold >=30s + + GoodProfileClick_Filtered = 1006, // GoodProfileClick Signal filtered by blocks and mutes. + GoodProfileClick_20s_Filtered = 1007, // GoodProfileClick Signal : Dwell Time Threshold >=20s, filtered by blocks and mutes. + GoodProfileClick_30s_Filtered = 1008, // GoodProfileClick Signal : Dwell Time Threshold >=30s, filtered by blocks and mutes. + + /* + Unified Signals + These signals aim to unify multiple signal fetches into a single response, + which may be a healthier input for our retrieval layer to run inference on. + */ + TweetBasedUnifiedUniformSignal = 1300 + TweetBasedUnifiedEngagementWeightedSignal = 1301 + TweetBasedUnifiedQualityWeightedSignal = 1302 + ProducerBasedUnifiedUniformSignal = 1303 + ProducerBasedUnifiedEngagementWeightedSignal = 1304 + ProducerBasedUnifiedQualityWeightedSignal = 1305 + +} + +struct Signal { + 1: required SignalType signalType + 2: required i64 timestamp + 3: optional identifier.InternalId targetInternalId +}
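Taken together, these structs define the full client contract for user-signal-service: a caller batches one `SignalRequest` per `SignalType` for a given user, and the response maps each requested type back to the `Signal`s fetched for it. Below is a minimal sketch of a caller built on the Scrooge-generated `thriftscala` bindings; the `fetchSignals` transport stub is hypothetical, since the diff defines the request/response structs but not the service binding itself.

```scala
import com.twitter.usersignalservice.thriftscala.{
  BatchSignalRequest,
  BatchSignalResponse,
  ClientIdentifier,
  Signal,
  SignalRequest,
  SignalType
}
import com.twitter.util.Future

object UssClientSketch {

  // Hypothetical transport stub; in production this would be backed by the
  // service's own Thrift or Strato client, which is not part of this diff.
  def fetchSignals(request: BatchSignalRequest): Future[BatchSignalResponse] = ???

  // Fetch a user's recent favorites and follows, merged newest-first.
  def recentFavsAndFollows(userId: Long): Future[Seq[Signal]] = {
    val request = BatchSignalRequest(
      userId = userId,
      signalRequest = Seq(
        SignalRequest(maxResults = Some(15L), signalType = SignalType.TweetFavorite90dV2),
        SignalRequest(maxResults = Some(15L), signalType = SignalType.AccountFollow)
      ),
      // Per the comment in service.thrift, clientId should always be populated;
      // requests without it are slated to be rejected.
      clientId = Some(ClientIdentifier.CrMixerHome)
    )
    fetchSignals(request).map { response =>
      response.signalResponse.values.flatten.toSeq.sortBy(-_.timestamp)
    }
  }
}
```

Note that `maxResults` is optional by design: the fetchers above apply `query.maxResults.getOrElse(Int.MaxValue)`, so omitting it returns every signal the underlying store still holds for that type.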