Merge branch 'main' into main

TOGEP 2023-04-15 22:26:37 +09:00 committed by GitHub
commit 9cf6bcb2ee
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
391 changed files with 32135 additions and 510 deletions

@ -1,36 +1,52 @@
# Twitter Recommendation Algorithm
# Twitter's Recommendation Algorithm
The Twitter Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the
Home Timeline. For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). The
diagram below illustrates how major services and jobs interconnect.
Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for serving feeds of Tweets and other content across all Twitter product surfaces (e.g. For You Timeline, Search, Explore). For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm).
![](docs/system-diagram.png)
## Architecture
These are the main components of the Recommendation Algorithm included in this repository:
Product surfaces at Twitter are built on a shared set of data, models, and software frameworks. The shared components included in this repository are listed below:
| Type | Component | Description |
|------------|------------|------------|
| Feature | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict likelihood of a Twitter User interacting with another User. |
| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
| Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
| | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. |
| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos) |
| | [follow-recommendation-service](follow-recommendations-service/README.md) (FRS)| Provides Users with recommendations for accounts to follow, and Tweets from those accounts. |
| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light ranker model used by search index (Earlybird) to rank Tweets. |
| | [heavy-ranker](https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/README.md) | Neural network for ranking candidate tweets. One of the main signals used to select timeline Tweets post candidate sourcing. |
| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md) |
| | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. |
| | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. |
| Software framework | [navi](navi/navi/README.md) | High performance, machine learning model serving written in Rust. |
| Data | [unified-user-actions](unified_user_actions/README.md) | Real-time stream of user actions on Twitter. |
| | [user-signal-service](user-signal-service/README.md) | Centralized platform to retrieve explicit (e.g. likes, replies) and implicit (e.g. profile visits, tweet clicks) user signals. |
| Model | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict the likelihood of a Twitter User interacting with another User. |
| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
| | [topic-social-proof](topic-social-proof/README.md) | Identifies topics related to individual Tweets. |
| Software framework | [navi](navi/README.md) | High performance, machine learning model serving written in Rust. |
| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |
We include Bazel BUILD files for most components, but not a top level BUILD or WORKSPACE file.
The product surface currently included in this repository is the For You Timeline.
### For You Timeline
The diagram below illustrates how major services and jobs interconnect to construct a For You Timeline.
![](docs/system-diagram.png)
The core components of the For You Timeline included in this repository are listed below:
| Type | Component | Description |
|------------|------------|------------|
| Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
| | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. |
| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos). |
| | [follow-recommendation-service](follow-recommendations-service/README.md) (FRS)| Provides Users with recommendations for accounts to follow, and Tweets from those accounts. |
| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light Ranker model used by search index (Earlybird) to rank Tweets. |
| | [heavy-ranker](https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/README.md) | Neural network for ranking candidate tweets. One of the main signals used to select timeline Tweets post candidate sourcing. |
| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md). |
| | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. |
| | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. |
## Build and test code
We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file. We plan to add a more complete build and test system in the future.
## Contributing

@ -2,6 +2,6 @@
CR-Mixer is a candidate generation service proposed as part of the Personalization Strategy vision for Twitter. Its aim is to speed up the iteration and development of candidate generation and light ranking. The service acts as a lightweight coordinating layer that delegates candidate generation tasks to underlying compute services. It focuses on Twitter's candidate generation use cases and offers a centralized platform for fetching, mixing, and managing candidate sources and light rankers. The overarching goal is to increase the speed and ease of testing and developing candidate generation pipelines, ultimately delivering more value to Twitter users.
CR-Mixer act as a configurator and delegator, providing abstractions for the challenging parts of candidate generation and handling performance issues. It will offer a 1-stop-shop for fetching and mixing candidate sources, a managed and shared performant platform, a light ranking layer, a common filtering layer, a version control system, a co-owned feature switch set, and peripheral tooling.
CR-Mixer acts as a configurator and delegator, providing abstractions for the challenging parts of candidate generation and handling performance issues. It will offer a 1-stop-shop for fetching and mixing candidate sources, a managed and shared performant platform, a light ranking layer, a common filtering layer, a version control system, a co-owned feature switch set, and peripheral tooling.
CR-Mixer's pipeline consists of 4 steps: source signal extraction, candidate generation, filtering, and ranking. It also provides peripheral tooling like scribing, debugging, and monitoring. The service fetches source signals externally from stores like UserProfileService and RealGraph, calls external candidate generation services, and caches results. Filters are applied for deduping and pre-ranking, and a light ranking step follows.
CR-Mixer's pipeline consists of 4 steps: source signal extraction, candidate generation, filtering, and ranking. It also provides peripheral tooling like scribing, debugging, and monitoring. The service fetches source signals externally from stores like UserProfileService and RealGraph, calls external candidate generation services, and caches results. Filters are applied for deduping and pre-ranking, and a light ranking step follows.
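
The four steps described above (source signal extraction, candidate generation, filtering, ranking) can be pictured with a small, self-contained Rust sketch. This is not CR-Mixer's code: the `Candidate` type, every function name, and the made-up IDs and scores are assumptions chosen only to show how the stages hand data to one another. Caching and the peripheral tooling are left out.

```rust
use std::collections::HashSet;

/// A candidate Tweet with a light-ranking score (both fields are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub tweet_id: u64,
    pub score: f64,
}

/// Step 1: source signal extraction. Here the "signals" are simply the
/// Tweet IDs a user recently engaged with, passed in by the caller.
pub fn extract_signals(recent_engagements: &[u64]) -> Vec<u64> {
    recent_engagements.to_vec()
}

/// Step 2: candidate generation. A stand-in for calls to external services:
/// each signal yields two made-up candidates with made-up scores.
pub fn generate_candidates(signals: &[u64]) -> Vec<Candidate> {
    let mut out = Vec::with_capacity(signals.len() * 2);
    for &s in signals {
        out.push(Candidate { tweet_id: s * 10, score: s as f64 });
        out.push(Candidate { tweet_id: s * 10 + 1, score: s as f64 + 0.5 });
    }
    out
}

/// Step 3: filtering. Drops duplicate Tweet IDs, keeping the first occurrence.
pub fn dedupe(candidates: Vec<Candidate>) -> Vec<Candidate> {
    let mut seen = HashSet::new();
    candidates
        .into_iter()
        .filter(|c| seen.insert(c.tweet_id))
        .collect()
}

/// Step 4: light ranking. Sorts by score, highest first, and keeps the top `limit`.
pub fn rank(mut candidates: Vec<Candidate>, limit: usize) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    candidates.truncate(limit);
    candidates
}

/// Runs the four steps in order.
pub fn run_pipeline(recent_engagements: &[u64], limit: usize) -> Vec<Candidate> {
    let signals = extract_signals(recent_engagements);
    rank(dedupe(generate_candidates(&signals)), limit)
}
```

Calling `run_pipeline(&[1, 2, 1], 3)` generates six candidates, drops the two duplicates produced by the repeated signal, and returns the three highest-scoring Tweets (IDs 21, 20, 11).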

@ -6,8 +6,6 @@ import com.twitter.search.earlybird.thriftscala.EarlybirdService
import com.twitter.search.earlybird.thriftscala.ThriftSearchQuery
import com.twitter.util.Time
import com.twitter.search.common.query.thriftjava.thriftscala.CollectorParams
import com.twitter.search.common.ranking.thriftscala.ThriftAgeDecayRankingParams
import com.twitter.search.common.ranking.thriftscala.ThriftLinearFeatureRankingParams
import com.twitter.search.common.ranking.thriftscala.ThriftRankingParams
import com.twitter.search.common.ranking.thriftscala.ThriftScoringFunctionType
import com.twitter.search.earlybird.thriftscala.ThriftSearchRelevanceOptions
@ -97,7 +95,7 @@ object EarlybirdTensorflowBasedSimilarityEngine {
// Whether to collect conversation IDs. Remove it for now.
// collectConversationId = Gate.True(), // true for Home
rankingMode = ThriftSearchRankingMode.Relevance,
relevanceOptions = Some(getRelevanceOptions(query.useTensorflowRanking)),
relevanceOptions = Some(getRelevanceOptions),
collectorParams = Some(
CollectorParams(
// numResultsToReturn defines how many results each EB shard will return to search root
@ -116,13 +114,11 @@ object EarlybirdTensorflowBasedSimilarityEngine {
// The specific values of recap relevance/reranking options correspond to
// experiment: enable_recap_reranking_2988,timeline_internal_disable_recap_filter
// bucket : enable_rerank,disable_filter
private def getRelevanceOptions(useTensorflowRanking: Boolean): ThriftSearchRelevanceOptions = {
private def getRelevanceOptions: ThriftSearchRelevanceOptions = {
ThriftSearchRelevanceOptions(
proximityScoring = true,
maxConsecutiveSameUser = Some(2),
rankingParams =
if (useTensorflowRanking) Some(getTensorflowBasedRankingParams)
else Some(getLinearRankingParams),
rankingParams = Some(getTensorflowBasedRankingParams),
maxHitsToProcess = Some(500),
maxUserBlendCount = Some(3),
proximityPhraseWeight = 9.0,
@ -131,41 +127,12 @@ object EarlybirdTensorflowBasedSimilarityEngine {
}
private def getTensorflowBasedRankingParams: ThriftRankingParams = {
getLinearRankingParams.copy(
ThriftRankingParams(
`type` = Some(ThriftScoringFunctionType.TensorflowBased),
selectedTensorflowModel = Some("timelines_rectweet_replica"),
minScore = -1.0e100,
applyBoosts = false,
authorSpecificScoreAdjustments = None
)
}
private def getLinearRankingParams: ThriftRankingParams = {
ThriftRankingParams(
`type` = Some(ThriftScoringFunctionType.Linear),
minScore = -1.0e100,
retweetCountParams = Some(ThriftLinearFeatureRankingParams(weight = 20.0)),
replyCountParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
reputationParams = Some(ThriftLinearFeatureRankingParams(weight = 0.2)),
luceneScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
textScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 0.18)),
urlParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
isReplyParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
favCountParams = Some(ThriftLinearFeatureRankingParams(weight = 30.0)),
langEnglishUIBoost = 0.5,
langEnglishTweetBoost = 0.2,
langDefaultBoost = 0.02,
unknownLanguageBoost = 0.05,
offensiveBoost = 0.1,
inTrustedCircleBoost = 3.0,
multipleHashtagsOrTrendsBoost = 0.6,
inDirectFollowBoost = 4.0,
tweetHasTrendBoost = 1.1,
selfTweetBoost = 2.0,
tweetHasImageUrlBoost = 2.0,
tweetHasVideoUrlBoost = 2.0,
useUserLanguageInfo = true,
ageDecayParams = Some(ThriftAgeDecayRankingParams(slope = 0.005, base = 1.0))
)
}
}

@ -160,7 +160,7 @@ object HomeTweetTypePredicates {
("has_gte_1k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 1000))),
(
"has_gte_10k_favs",
_.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 1000))),
_.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 10000))),
(
"has_gte_100k_favs",
_.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 100000))),
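
The hunk above corrects a predicate whose name said "10k" while its threshold was 1000, so a Tweet with 5,000 favorites would have matched `has_gte_10k_favs`. One way to make that kind of mismatch easier to spot is to keep each name next to its threshold in a single table. The Rust sketch below illustrates the idea only: the predicate names are taken from the hunk, while `FAV_THRESHOLDS` and `matching_predicates` are hypothetical and do not appear in the Scala source.

```rust
/// (predicate name, minimum favorite count) pairs.
/// The names come from the hunk above; the table itself is illustrative.
pub static FAV_THRESHOLDS: [(&str, u64); 3] = [
    ("has_gte_1k_favs", 1_000),
    ("has_gte_10k_favs", 10_000),
    ("has_gte_100k_favs", 100_000),
];

/// Returns the names of every predicate that `fav_count` satisfies.
/// `None` models a missing feature, which satisfies no predicate.
pub fn matching_predicates(fav_count: Option<u64>) -> Vec<&'static str> {
    FAV_THRESHOLDS
        .iter()
        .filter(|(_, min)| fav_count.map_or(false, |count| count >= *min))
        .map(|(name, _)| *name)
        .collect()
}
```

With this table, `matching_predicates(Some(5_000))` yields only `has_gte_1k_favs`, and a missing count yields an empty list.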

@ -15,28 +15,6 @@ object RelevanceSearchUtil {
`type` = Some(scr.ThriftScoringFunctionType.TensorflowBased),
selectedTensorflowModel = Some("timelines_rectweet_replica"),
minScore = -1.0e100,
retweetCountParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 20.0)),
replyCountParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 1.0)),
reputationParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 0.2)),
luceneScoreParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 2.0)),
textScoreParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 0.18)),
urlParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 2.0)),
isReplyParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 1.0)),
favCountParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 30.0)),
langEnglishUIBoost = 0.5,
langEnglishTweetBoost = 0.2,
langDefaultBoost = 0.02,
unknownLanguageBoost = 0.05,
offensiveBoost = 0.1,
inTrustedCircleBoost = 3.0,
multipleHashtagsOrTrendsBoost = 0.6,
inDirectFollowBoost = 4.0,
tweetHasTrendBoost = 1.1,
selfTweetBoost = 2.0,
tweetHasImageUrlBoost = 2.0,
tweetHasVideoUrlBoost = 2.0,
useUserLanguageInfo = true,
ageDecayParams = Some(scr.ThriftAgeDecayRankingParams(slope = 0.005, base = 1.0)),
selectedModels = Some(Map("home_mixer_unified_engagement_prod" -> 1.0)),
applyBoosts = false,
)

@ -1,6 +1,6 @@
# Navi: High-Performance Machine Learning Serving Server in Rust
Navi is a high-performance, versatile machine learning serving server implemented in Rust, tailored for production usage. It's designed to efficiently serve within the Twitter tech stack, offering top-notch performance while focusing on core features.
Navi is a high-performance, versatile machine learning serving server implemented in Rust and tailored for production usage. It's designed to efficiently serve within the Twitter tech stack, offering top-notch performance while focusing on core features.
## Key Features
@ -23,12 +23,14 @@ While Navi's features may not be as comprehensive as its open-source counterpart
- `thrift_bpr_adapter`: generated thrift code for BatchPredictionRequest
## Content
We include all *.rs source code that makes up the main navi binaries for you to examine. The test and benchmark code, as well as configuration files are not included due to data security concerns.
We have included all *.rs source code files that make up the main Navi binaries for you to examine. However, we have not included the test and benchmark code, or various configuration files, due to data security concerns.
## Run
in navi/navi you can run. Note you need to create a models directory and create some versions, preferably using epoch time, e.g., 1679693908377
- scripts/run_tf2.sh
- scripts/run_onnx.sh
In navi/navi, you can run the following commands:
- `scripts/run_tf2.sh` for [TensorFlow](https://www.tensorflow.org/)
- `scripts/run_onnx.sh` for [Onnx](https://onnx.ai/)
Do note that you need to create a models directory and create some versions, preferably using epoch time, e.g., `1679693908377`.
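
As a rough illustration of the note above, the following Rust sketch creates a version directory named after the current epoch time in milliseconds, matching the 13-digit example `1679693908377`. The layout `<models_dir>/<version>` and the helper names `version_name` and `create_version_dir` are assumptions for illustration. The README does not prescribe them, and the run scripts may expect a different structure.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Formats a timestamp as milliseconds since the Unix epoch.
/// A clock set before the epoch falls back to "0".
pub fn version_name(now: SystemTime) -> String {
    let millis = now
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_millis())
        .unwrap_or(0);
    millis.to_string()
}

/// Creates `<models_dir>/<epoch_millis>` (and any missing parents)
/// and returns the path of the new version directory.
pub fn create_version_dir(models_dir: &Path, now: SystemTime) -> io::Result<PathBuf> {
    let dir = models_dir.join(version_name(now));
    fs::create_dir_all(&dir)?;
    Ok(dir)
}
```

For example, `create_version_dir(Path::new("models"), SystemTime::now())` would create a fresh, monotonically increasing version directory on each call made at a later millisecond.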
## Build
you can adapt the above scripts to build using Cargo
You can adapt the above scripts to build using Cargo.

@ -44,6 +44,5 @@ pub struct RenamedFeatures {
}
pub fn parse(json_str: &str) -> Result<AllConfig, Error> {
let all_config: AllConfig = serde_json::from_str(json_str)?;
return std::result::Result::Ok(all_config);
serde_json::from_str(json_str)
}
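
The simplification in this hunk relies on a general Rust idiom: when a function's return type already matches the `Result` produced by its last expression, that expression can be returned directly instead of unwrapping with `?` and re-wrapping in `Ok`. The sketch below shows the same pattern with standard-library parsing only. `parse_verbose` and `parse_direct` are hypothetical names, and `str::parse` stands in for `serde_json::from_str`.

```rust
use std::num::ParseIntError;

/// Verbose form: unwrap with `?`, then wrap the value in `Ok` again.
pub fn parse_verbose(s: &str) -> Result<i64, ParseIntError> {
    let value: i64 = s.parse()?;
    return Ok(value);
}

/// Direct form: the tail expression already has the declared return type.
pub fn parse_direct(s: &str) -> Result<i64, ParseIntError> {
    s.parse()
}
```

The two forms agree whenever the error types are identical. The `?` form additionally applies a `From` conversion to the error, so the direct form only compiles when no conversion is needed.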

@ -16,8 +16,7 @@ use segdense::util;
use thrift::protocol::{TBinaryInputProtocol, TSerializable};
use thrift::transport::TBufferChannel;
use crate::{all_config};
use crate::all_config::AllConfig;
use crate::{all_config, all_config::AllConfig};
pub fn log_feature_match(
dr: &DataRecord,
@ -27,26 +26,22 @@ pub fn log_feature_match(
// Note the following algorithm matches features from config using linear search.
// Also the record source is MinDataRecord. This includes only binary and continuous features for now.
for (feature_id, feature_value) in dr.continuous_features.as_ref().unwrap().into_iter() {
for (feature_id, feature_value) in dr.continuous_features.as_ref().unwrap() {
debug!(
"{} - Continuous Datarecord => Feature ID: {}, Feature value: {}",
dr_type, feature_id, feature_value
"{dr_type} - Continuous Datarecord => Feature ID: {feature_id}, Feature value: {feature_value}"
);
for input_feature in &seg_dense_config.cont.input_features {
if input_feature.feature_id == *feature_id {
debug!("Matching input feature: {:?}", input_feature)
debug!("Matching input feature: {input_feature:?}")
}
}
}
for feature_id in dr.binary_features.as_ref().unwrap().into_iter() {
debug!(
"{} - Binary Datarecord => Feature ID: {}",
dr_type, feature_id
);
for feature_id in dr.binary_features.as_ref().unwrap() {
debug!("{dr_type} - Binary Datarecord => Feature ID: {feature_id}");
for input_feature in &seg_dense_config.binary.input_features {
if input_feature.feature_id == *feature_id {
debug!("Found input feature: {:?}", input_feature)
debug!("Found input feature: {input_feature:?}")
}
}
}
@ -96,15 +91,13 @@ impl BatchPredictionRequestToTorchTensorConverter {
reporting_feature_ids: Vec<(i64, &str)>,
register_metric_fn: Option<impl Fn(&HistogramVec)>,
) -> BatchPredictionRequestToTorchTensorConverter {
let all_config_path = format!("{}/{}/all_config.json", model_dir, model_version);
let seg_dense_config_path = format!(
"{}/{}/segdense_transform_spec_home_recap_2022.json",
model_dir, model_version
);
let all_config_path = format!("{model_dir}/{model_version}/all_config.json");
let seg_dense_config_path =
format!("{model_dir}/{model_version}/segdense_transform_spec_home_recap_2022.json");
let seg_dense_config = util::load_config(&seg_dense_config_path);
let all_config = all_config::parse(
&fs::read_to_string(&all_config_path)
.unwrap_or_else(|error| panic!("error loading all_config.json - {}", error)),
.unwrap_or_else(|error| panic!("error loading all_config.json - {error}")),
)
.unwrap();
@ -138,11 +131,11 @@ impl BatchPredictionRequestToTorchTensorConverter {
let (discrete_feature_metrics, continuous_feature_metrics) = METRICS.get_or_init(|| {
let discrete = HistogramVec::new(
HistogramOpts::new(":navi:feature_id:discrete", "Discrete Feature ID values")
.buckets(Vec::from(&[
0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0,
.buckets(Vec::from([
0.0f64, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0,
120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0,
300.0, 500.0, 1000.0, 10000.0, 100000.0,
] as &'static [f64])),
])),
&["feature_id"],
)
.expect("metric cannot be created");
@ -151,18 +144,18 @@ impl BatchPredictionRequestToTorchTensorConverter {
":navi:feature_id:continuous",
"continuous Feature ID values",
)
.buckets(Vec::from(&[
0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0,
130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0, 300.0, 500.0,
1000.0, 10000.0, 100000.0,
] as &'static [f64])),
.buckets(Vec::from([
0.0f64, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0,
120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0, 300.0,
500.0, 1000.0, 10000.0, 100000.0,
])),
&["feature_id"],
)
.expect("metric cannot be created");
register_metric_fn.map(|r| {
if let Some(r) = register_metric_fn {
r(&discrete);
r(&continuous);
});
}
(discrete, continuous)
});
@ -171,16 +164,13 @@ impl BatchPredictionRequestToTorchTensorConverter {
for (feature_id, feature_type) in reporting_feature_ids.iter() {
match *feature_type {
"discrete" => discrete_features_to_report.insert(feature_id.clone()),
"continuous" => continuous_features_to_report.insert(feature_id.clone()),
_ => panic!(
"Invalid feature type {} for reporting metrics!",
feature_type
),
"discrete" => discrete_features_to_report.insert(*feature_id),
"continuous" => continuous_features_to_report.insert(*feature_id),
_ => panic!("Invalid feature type {feature_type} for reporting metrics!"),
};
}
return BatchPredictionRequestToTorchTensorConverter {
BatchPredictionRequestToTorchTensorConverter {
all_config,
seg_dense_config,
all_config_path,
@ -193,7 +183,7 @@ impl BatchPredictionRequestToTorchTensorConverter {
continuous_features_to_report,
discrete_feature_metrics,
continuous_feature_metrics,
};
}
}
fn get_feature_id(feature_name: &str, seg_dense_config: &Root) -> i64 {
@ -203,7 +193,7 @@ impl BatchPredictionRequestToTorchTensorConverter {
return feature.feature_id;
}
}
return -1;
-1
}
fn parse_batch_prediction_request(bytes: Vec<u8>) -> BatchPredictionRequest {
@ -211,7 +201,7 @@ impl BatchPredictionRequestToTorchTensorConverter {
let mut bc = TBufferChannel::with_capacity(bytes.len(), 0);
bc.set_readable_bytes(&bytes);
let mut protocol = TBinaryInputProtocol::new(bc, true);
return BatchPredictionRequest::read_from_in_protocol(&mut protocol).unwrap();
BatchPredictionRequest::read_from_in_protocol(&mut protocol).unwrap()
}
fn get_embedding_tensors(
@ -228,45 +218,43 @@ impl BatchPredictionRequestToTorchTensorConverter {
let mut working_set = vec![0 as f32; total_size];
let mut bpr_start = 0;
for (bpr, &bpr_end) in bprs.iter().zip(batch_size) {
if bpr.common_features.is_some() {
if bpr.common_features.as_ref().unwrap().tensors.is_some() {
if bpr
.common_features
.as_ref()
.unwrap()
.tensors
.as_ref()
.unwrap()
.contains_key(&feature_id)
if bpr.common_features.is_some()
&& bpr.common_features.as_ref().unwrap().tensors.is_some()
&& bpr
.common_features
.as_ref()
.unwrap()
.tensors
.as_ref()
.unwrap()
.contains_key(&feature_id)
{
let source_tensor = bpr
.common_features
.as_ref()
.unwrap()
.tensors
.as_ref()
.unwrap()
.get(&feature_id)
.unwrap();
let tensor = match source_tensor {
GeneralTensor::FloatTensor(float_tensor) =>
//Tensor::of_slice(
{
let source_tensor = bpr
.common_features
.as_ref()
.unwrap()
.tensors
.as_ref()
.unwrap()
.get(&feature_id)
.unwrap();
let tensor = match source_tensor {
GeneralTensor::FloatTensor(float_tensor) =>
//Tensor::of_slice(
{
float_tensor
.floats
.iter()
.map(|x| x.into_inner() as f32)
.collect::<Vec<_>>()
}
_ => vec![0 as f32; cols],
};
float_tensor
.floats
.iter()
.map(|x| x.into_inner() as f32)
.collect::<Vec<_>>()
}
_ => vec![0 as f32; cols],
};
// since the tensor is found in common feature, add it in all batches
for row in bpr_start..bpr_end {
for col in 0..cols {
working_set[row * cols + col] = tensor[col];
}
}
// since the tensor is found in common feature, add it in all batches
for row in bpr_start..bpr_end {
for col in 0..cols {
working_set[row * cols + col] = tensor[col];
}
}
}
@ -300,7 +288,7 @@ impl BatchPredictionRequestToTorchTensorConverter {
}
bpr_start = bpr_end;
}
return Array2::<f32>::from_shape_vec([rows, cols], working_set).unwrap();
Array2::<f32>::from_shape_vec([rows, cols], working_set).unwrap()
}
// Todo : Refactor, create a generic version with different type and field accessors
@ -310,9 +298,9 @@ impl BatchPredictionRequestToTorchTensorConverter {
// (INT64 --> INT64, DataRecord.discrete_feature)
fn get_continuous(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
// These need to be part of model schema
let rows: usize = batch_ends[batch_ends.len() - 1];
let cols: usize = 5293;
let full_size: usize = (rows * cols).try_into().unwrap();
let rows = batch_ends[batch_ends.len() - 1];
let cols = 5293;
let full_size = rows * cols;
let default_val = f32::NAN;
let mut tensor = vec![default_val; full_size];
@ -337,55 +325,48 @@ impl BatchPredictionRequestToTorchTensorConverter {
.unwrap();
for feature in common_features {
match self.feature_mapper.get(feature.0) {
Some(f_info) => {
let idx = f_info.index_within_tensor as usize;
if idx < cols {
// Set value in each row
for r in bpr_start..bpr_end {
let flat_index: usize = (r * cols + idx).try_into().unwrap();
tensor[flat_index] = feature.1.into_inner() as f32;
}
if let Some(f_info) = self.feature_mapper.get(feature.0) {
let idx = f_info.index_within_tensor as usize;
if idx < cols {
// Set value in each row
for r in bpr_start..bpr_end {
let flat_index = r * cols + idx;
tensor[flat_index] = feature.1.into_inner() as f32;
}
}
None => (),
}
if self.continuous_features_to_report.contains(feature.0) {
self.continuous_feature_metrics
.with_label_values(&[feature.0.to_string().as_str()])
.observe(feature.1.into_inner() as f64)
.observe(feature.1.into_inner())
} else if self.discrete_features_to_report.contains(feature.0) {
self.discrete_feature_metrics
.with_label_values(&[feature.0.to_string().as_str()])
.observe(feature.1.into_inner() as f64)
.observe(feature.1.into_inner())
}
}
}
// Process the batch of datarecords
for r in bpr_start..bpr_end {
let dr: &DataRecord =
&bpr.individual_features_list[usize::try_from(r - bpr_start).unwrap()];
let dr: &DataRecord = &bpr.individual_features_list[r - bpr_start];
if dr.continuous_features.is_some() {
for feature in dr.continuous_features.as_ref().unwrap() {
match self.feature_mapper.get(&feature.0) {
Some(f_info) => {
let idx = f_info.index_within_tensor as usize;
let flat_index: usize = (r * cols + idx).try_into().unwrap();
if flat_index < tensor.len() && idx < cols {
tensor[flat_index] = feature.1.into_inner() as f32;
}
if let Some(f_info) = self.feature_mapper.get(feature.0) {
let idx = f_info.index_within_tensor as usize;
let flat_index = r * cols + idx;
if flat_index < tensor.len() && idx < cols {
tensor[flat_index] = feature.1.into_inner() as f32;
}
None => (),
}
if self.continuous_features_to_report.contains(feature.0) {
self.continuous_feature_metrics
.with_label_values(&[feature.0.to_string().as_str()])
.observe(feature.1.into_inner() as f64)
.observe(feature.1.into_inner())
} else if self.discrete_features_to_report.contains(feature.0) {
self.discrete_feature_metrics
.with_label_values(&[feature.0.to_string().as_str()])
.observe(feature.1.into_inner() as f64)
.observe(feature.1.into_inner())
}
}
}
@ -393,22 +374,19 @@ impl BatchPredictionRequestToTorchTensorConverter {
bpr_start = bpr_end;
}
return InputTensor::FloatTensor(
Array2::<f32>::from_shape_vec(
[rows.try_into().unwrap(), cols.try_into().unwrap()],
tensor,
)
.unwrap()
.into_dyn(),
);
InputTensor::FloatTensor(
Array2::<f32>::from_shape_vec([rows, cols], tensor)
.unwrap()
.into_dyn(),
)
}
fn get_binary(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
// These need to be part of model schema
let rows: usize = batch_ends[batch_ends.len() - 1];
let cols: usize = 149;
let full_size: usize = (rows * cols).try_into().unwrap();
let default_val: i64 = 0;
let rows = batch_ends[batch_ends.len() - 1];
let cols = 149;
let full_size = rows * cols;
let default_val = 0;
let mut v = vec![default_val; full_size];
@ -432,55 +410,48 @@ impl BatchPredictionRequestToTorchTensorConverter {
.unwrap();
for feature in common_features {
match self.feature_mapper.get(feature) {
Some(f_info) => {
let idx = f_info.index_within_tensor as usize;
if idx < cols {
// Set value in each row
for r in bpr_start..bpr_end {
let flat_index: usize = (r * cols + idx).try_into().unwrap();
v[flat_index] = 1;
}
if let Some(f_info) = self.feature_mapper.get(feature) {
let idx = f_info.index_within_tensor as usize;
if idx < cols {
// Set value in each row
for r in bpr_start..bpr_end {
let flat_index = r * cols + idx;
v[flat_index] = 1;
}
}
None => (),
}
}
}
// Process the batch of datarecords
for r in bpr_start..bpr_end {
let dr: &DataRecord =
&bpr.individual_features_list[usize::try_from(r - bpr_start).unwrap()];
let dr: &DataRecord = &bpr.individual_features_list[r - bpr_start];
if dr.binary_features.is_some() {
for feature in dr.binary_features.as_ref().unwrap() {
match self.feature_mapper.get(&feature) {
Some(f_info) => {
let idx = f_info.index_within_tensor as usize;
let flat_index: usize = (r * cols + idx).try_into().unwrap();
v[flat_index] = 1;
}
None => (),
if let Some(f_info) = self.feature_mapper.get(feature) {
let idx = f_info.index_within_tensor as usize;
let flat_index = r * cols + idx;
v[flat_index] = 1;
}
}
}
}
bpr_start = bpr_end;
}
return InputTensor::Int64Tensor(
Array2::<i64>::from_shape_vec([rows.try_into().unwrap(), cols.try_into().unwrap()], v)
InputTensor::Int64Tensor(
Array2::<i64>::from_shape_vec([rows, cols], v)
.unwrap()
.into_dyn(),
);
)
}
#[allow(dead_code)]
fn get_discrete(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
// These need to be part of model schema
let rows: usize = batch_ends[batch_ends.len() - 1];
let cols: usize = 320;
let full_size: usize = (rows * cols).try_into().unwrap();
let default_val: i64 = 0;
let rows = batch_ends[batch_ends.len() - 1];
let cols = 320;
let full_size = rows * cols;
let default_val = 0;
let mut v = vec![default_val; full_size];
@ -504,18 +475,15 @@ impl BatchPredictionRequestToTorchTensorConverter {
.unwrap();
for feature in common_features {
match self.feature_mapper.get(feature.0) {
Some(f_info) => {
let idx = f_info.index_within_tensor as usize;
if idx < cols {
// Set value in each row
for r in bpr_start..bpr_end {
let flat_index: usize = (r * cols + idx).try_into().unwrap();
v[flat_index] = *feature.1;
}
if let Some(f_info) = self.feature_mapper.get(feature.0) {
let idx = f_info.index_within_tensor as usize;
if idx < cols {
// Set value in each row
for r in bpr_start..bpr_end {
let flat_index = r * cols + idx;
v[flat_index] = *feature.1;
}
}
None => (),
}
if self.discrete_features_to_report.contains(feature.0) {
self.discrete_feature_metrics
@ -527,18 +495,15 @@ impl BatchPredictionRequestToTorchTensorConverter {
// Process the batch of datarecords
for r in bpr_start..bpr_end {
let dr: &DataRecord = &bpr.individual_features_list[usize::try_from(r).unwrap()];
let dr: &DataRecord = &bpr.individual_features_list[r];
if dr.discrete_features.is_some() {
for feature in dr.discrete_features.as_ref().unwrap() {
match self.feature_mapper.get(&feature.0) {
Some(f_info) => {
let idx = f_info.index_within_tensor as usize;
let flat_index: usize = (r * cols + idx).try_into().unwrap();
if flat_index < v.len() && idx < cols {
v[flat_index] = *feature.1;
}
if let Some(f_info) = self.feature_mapper.get(feature.0) {
let idx = f_info.index_within_tensor as usize;
let flat_index = r * cols + idx;
if flat_index < v.len() && idx < cols {
v[flat_index] = *feature.1;
}
None => (),
}
if self.discrete_features_to_report.contains(feature.0) {
self.discrete_feature_metrics
@ -550,11 +515,11 @@ impl BatchPredictionRequestToTorchTensorConverter {
}
bpr_start = bpr_end;
}
return InputTensor::Int64Tensor(
Array2::<i64>::from_shape_vec([rows.try_into().unwrap(), cols.try_into().unwrap()], v)
InputTensor::Int64Tensor(
Array2::<i64>::from_shape_vec([rows, cols], v)
.unwrap()
.into_dyn(),
);
)
}
fn get_user_embedding(
@ -604,7 +569,7 @@ impl Converter for BatchPredictionRequestToTorchTensorConverter {
.map(|bpr| bpr.individual_features_list.len())
.scan(0usize, |acc, e| {
//running total
*acc = *acc + e;
*acc += e;
Some(*acc)
})
.collect::<Vec<_>>();

View File

@ -9,15 +9,17 @@ use std::{
pub fn load_batch_prediction_request_base64(file_name: &str) -> Vec<Vec<u8>> {
let file = File::open(file_name).expect("could not read file");
let mut result = vec![];
for line in io::BufReader::new(file).lines() {
for (mut line_count, line) in io::BufReader::new(file).lines().enumerate() {
line_count += 1;
match base64::decode(line.unwrap().trim()) {
Ok(payload) => result.push(payload),
Err(err) => println!("error decoding line {}", err),
Err(err) => println!("error decoding line {file_name}:{line_count} - {err}"),
}
}
println!("reslt len: {}", result.len());
return result;
println!("result len: {}", result.len());
result
}
pub fn save_to_npy<T: npyz::Serialize + AutoSerialize>(data: &[T], save_to: String) {
let mut writer = WriteOptions::new()
.default_dtype()

View File

@ -1,13 +1,10 @@
# recos-injector
Recos-Injector is a streaming event processor for building input streams for GraphJet based services.
It is general purpose in that it consumes arbitrary incoming event stream (e.x. Fav, RT, Follow, client_events, etc), applies
filtering, combines and publishes cleaned up events to corresponding GraphJet services.
Each GraphJet based service subscribes to a dedicated Kafka topic. Recos-Injector enables a GraphJet based service to consume any
event it wants
# Recos-Injector
## How to run recos-injector-server tests
Recos-Injector is a streaming event processor used to build input streams for GraphJet-based services. It is a general-purpose tool that consumes arbitrary incoming event streams (e.g., Fav, RT, Follow, client_events, etc.), applies filtering, and combines and publishes cleaned up events to corresponding GraphJet services. Each GraphJet-based service subscribes to a dedicated Kafka topic, and Recos-Injector enables GraphJet-based services to consume any event they want.
Tests can be run by using this command from your project's root directory:
## How to run Recos-Injector server tests
You can run tests by using the following command from your project's root directory:
$ bazel build recos-injector/...
$ bazel test recos-injector/...
@ -28,17 +25,16 @@ terminal:
$ curl -s localhost:9990/admin/ping
pong
Run `curl -s localhost:9990/admin` to see a list of all of the available admin
endpoints.
Run `curl -s localhost:9990/admin` to see a list of all available admin endpoints.
## Querying recos-injector-server from a Scala console
## Querying Recos-Injector server from a Scala console
Recos Injector does not have a thrift endpoint. It reads Event Bus and Kafka queues and writes to recos_injector kafka.
Recos-Injector does not have a Thrift endpoint. Instead, it reads Event Bus and Kafka queues and writes to the Recos-Injector Kafka.
## Generating a package for deployment
To package your service into a zip for deployment:
To package your service into a zip file for deployment, run:
$ bazel bundle recos-injector/server:bin --bundle-jvm-archive=zip
If successful, a file `dist/recos-injector-server.zip` will be created.
If the command is successful, a file named `dist/recos-injector-server.zip` will be created.

View File

@ -15,7 +15,7 @@ SimClusters from the Linear Algebra Perspective discussed the difference between
However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per cluster) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Considering that we need to process over 6,000 RPS, this is hard to support with the existing infrastructure.
## SimClusters Approximate Cosine Similariy Core Algorithm
## SimClusters Approximate Cosine Similarity Core Algorithm
1. Provide a source SimCluster Embedding *SV*, *SV = [(SC1, Score), (SC2, Score), (SC3, Score) …]*
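
For reference, the exact computation that the approximation tries to avoid can be sketched as below. This is a minimal Python sketch, assuming a sparse embedding is held as a `{cluster_id: score}` mapping (mirroring the *[(SC1, Score), (SC2, Score), …]* form above); the function name and data layout are illustrative assumptions, not the repository's implementation.

```python
import math


def sparse_cosine(a, b):
    """Exact cosine similarity between two sparse embeddings ({cluster_id: score})."""
    if not a or not b:
        return 0.0
    # Iterate over the smaller embedding so the dot product touches fewer keys.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(score * large.get(cluster_id, 0.0) for cluster_id, score in small.items())
    norm_a = math.sqrt(sum(s * s for s in a.values()))
    norm_b = math.sqrt(sum(s * s for s in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Worst-case candidate count quoted in the text:
# 6 source tweets * 25 clusters * 100 tweets per cluster.
MAX_CANDIDATES = 6 * 25 * 100
```

Running this for every candidate requires fetching each candidate's embedding first, which is the cost the text identifies as the bottleneck.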

View File

@ -513,12 +513,12 @@ public class BasicIndexingConverter {
Optional<Long> inReplyToUserId = Optional.of(inReplyToUserIdVal).filter(x -> x > 0);
Optional<Long> inReplyToStatusId = Optional.of(inReplyToStatusIdVal).filter(x -> x > 0);
// We have six combinations here. A tweet can be
// We have six combinations here. A Tweet can be
// 1) a reply to another tweet (then it has both in-reply-to-user-id and
// in-reply-to-status-id set),
// 2) directed-at a user (then it only has in-reply-to-user-id set),
// 3) not a reply at all.
// Additionally, it may or may not be a retweet (if it is, then it has retweet-user-id and
// Additionally, it may or may not be a Retweet (if it is, then it has retweet-user-id and
// retweet-status-id set).
//
// We want to set some fields unconditionally, and some fields (reference-author-id and

View File

@ -22,13 +22,13 @@ import static com.twitter.search.modeling.tweet_ranking.TweetScoringFeatures.Fea
/**
* Loads the scoring models for tweets and provides access to them.
*
* This class relies on a list ModelLoader objects to retrieve the objects from them. It will
* This class relies on a list of ModelLoader objects to retrieve the objects from them. It will
* return the first model found according to the order in the list.
*
* For production, we load models from 2 sources: classpath and HDFS. If a model is available
* from HDFS, we return it, otherwise we use the model from the classpath.
*
* The models used in for default requests (i.e. not experiments) MUST be present in the
* The models used for default requests (i.e. not experiments) MUST be present in the
* classpath, this allows us to avoid errors if they can't be loaded from HDFS.
* Models for experiments can live only in HDFS, so we don't need to redeploy Earlybird if we
* want to test them.

View File

@ -3,76 +3,81 @@ from twml.feature_config import FeatureConfigBuilder
def get_feature_config(data_spec_path, label):
return FeatureConfigBuilder(data_spec_path=data_spec_path, debug=True) \
return (
FeatureConfigBuilder(data_spec_path=data_spec_path, debug=True)
.batch_add_features(
[
("ebd.author_specific_score", "A"),
("ebd.has_diff_lang", "A"),
("ebd.has_english_tweet_diff_ui_lang", "A"),
("ebd.has_english_ui_diff_tweet_lang", "A"),
("ebd.is_self_tweet", "A"),
("ebd.tweet_age_in_secs", "A"),
("encoded_tweet_features.favorite_count", "A"),
("encoded_tweet_features.from_verified_account_flag", "A"),
("encoded_tweet_features.has_card_flag", "A"),
# ("encoded_tweet_features.has_consumer_video_flag", "A"),
("encoded_tweet_features.has_image_url_flag", "A"),
("encoded_tweet_features.has_link_flag", "A"),
("encoded_tweet_features.has_multiple_hashtags_or_trends_flag", "A"),
# ("encoded_tweet_features.has_multiple_media_flag", "A"),
("encoded_tweet_features.has_native_image_flag", "A"),
("encoded_tweet_features.has_news_url_flag", "A"),
("encoded_tweet_features.has_periscope_flag", "A"),
("encoded_tweet_features.has_pro_video_flag", "A"),
("encoded_tweet_features.has_quote_flag", "A"),
("encoded_tweet_features.has_trend_flag", "A"),
("encoded_tweet_features.has_video_url_flag", "A"),
("encoded_tweet_features.has_vine_flag", "A"),
("encoded_tweet_features.has_visible_link_flag", "A"),
("encoded_tweet_features.is_offensive_flag", "A"),
("encoded_tweet_features.is_reply_flag", "A"),
("encoded_tweet_features.is_retweet_flag", "A"),
("encoded_tweet_features.is_sensitive_content", "A"),
# ("encoded_tweet_features.is_user_new_flag", "A"),
("encoded_tweet_features.language", "A"),
("encoded_tweet_features.link_language", "A"),
("encoded_tweet_features.num_hashtags", "A"),
("encoded_tweet_features.num_mentions", "A"),
# ("encoded_tweet_features.profile_is_egg_flag", "A"),
("encoded_tweet_features.reply_count", "A"),
("encoded_tweet_features.retweet_count", "A"),
("encoded_tweet_features.text_score", "A"),
("encoded_tweet_features.user_reputation", "A"),
("extended_encoded_tweet_features.embeds_impression_count", "A"),
("extended_encoded_tweet_features.embeds_impression_count_v2", "A"),
("extended_encoded_tweet_features.embeds_url_count", "A"),
("extended_encoded_tweet_features.embeds_url_count_v2", "A"),
("extended_encoded_tweet_features.favorite_count_v2", "A"),
("extended_encoded_tweet_features.label_abusive_hi_rcl_flag", "A"),
("extended_encoded_tweet_features.label_dup_content_flag", "A"),
("extended_encoded_tweet_features.label_nsfw_hi_prc_flag", "A"),
("extended_encoded_tweet_features.label_nsfw_hi_rcl_flag", "A"),
("extended_encoded_tweet_features.label_spam_flag", "A"),
("extended_encoded_tweet_features.label_spam_hi_rcl_flag", "A"),
("extended_encoded_tweet_features.quote_count", "A"),
("extended_encoded_tweet_features.reply_count_v2", "A"),
("extended_encoded_tweet_features.retweet_count_v2", "A"),
("extended_encoded_tweet_features.weighted_favorite_count", "A"),
("extended_encoded_tweet_features.weighted_quote_count", "A"),
("extended_encoded_tweet_features.weighted_reply_count", "A"),
("extended_encoded_tweet_features.weighted_retweet_count", "A"),
]
).add_labels([
label, # Tensor index: 0
"recap.engagement.is_clicked", # Tensor index: 1
"recap.engagement.is_favorited", # Tensor index: 2
"recap.engagement.is_open_linked", # Tensor index: 3
"recap.engagement.is_photo_expanded", # Tensor index: 4
"recap.engagement.is_profile_clicked", # Tensor index: 5
"recap.engagement.is_replied", # Tensor index: 6
"recap.engagement.is_retweeted", # Tensor index: 7
"recap.engagement.is_video_playback_50", # Tensor index: 8
"timelines.earlybird_score", # Tensor index: 9
]) \
.define_weight("meta.record_weight/type=earlybird") \
[
("ebd.author_specific_score", "A"),
("ebd.has_diff_lang", "A"),
("ebd.has_english_tweet_diff_ui_lang", "A"),
("ebd.has_english_ui_diff_tweet_lang", "A"),
("ebd.is_self_tweet", "A"),
("ebd.tweet_age_in_secs", "A"),
("encoded_tweet_features.favorite_count", "A"),
("encoded_tweet_features.from_verified_account_flag", "A"),
("encoded_tweet_features.has_card_flag", "A"),
# ("encoded_tweet_features.has_consumer_video_flag", "A"),
("encoded_tweet_features.has_image_url_flag", "A"),
("encoded_tweet_features.has_link_flag", "A"),
("encoded_tweet_features.has_multiple_hashtags_or_trends_flag", "A"),
# ("encoded_tweet_features.has_multiple_media_flag", "A"),
("encoded_tweet_features.has_native_image_flag", "A"),
("encoded_tweet_features.has_news_url_flag", "A"),
("encoded_tweet_features.has_periscope_flag", "A"),
("encoded_tweet_features.has_pro_video_flag", "A"),
("encoded_tweet_features.has_quote_flag", "A"),
("encoded_tweet_features.has_trend_flag", "A"),
("encoded_tweet_features.has_video_url_flag", "A"),
("encoded_tweet_features.has_vine_flag", "A"),
("encoded_tweet_features.has_visible_link_flag", "A"),
("encoded_tweet_features.is_offensive_flag", "A"),
("encoded_tweet_features.is_reply_flag", "A"),
("encoded_tweet_features.is_retweet_flag", "A"),
("encoded_tweet_features.is_sensitive_content", "A"),
# ("encoded_tweet_features.is_user_new_flag", "A"),
("encoded_tweet_features.language", "A"),
("encoded_tweet_features.link_language", "A"),
("encoded_tweet_features.num_hashtags", "A"),
("encoded_tweet_features.num_mentions", "A"),
# ("encoded_tweet_features.profile_is_egg_flag", "A"),
("encoded_tweet_features.reply_count", "A"),
("encoded_tweet_features.retweet_count", "A"),
("encoded_tweet_features.text_score", "A"),
("encoded_tweet_features.user_reputation", "A"),
("extended_encoded_tweet_features.embeds_impression_count", "A"),
("extended_encoded_tweet_features.embeds_impression_count_v2", "A"),
("extended_encoded_tweet_features.embeds_url_count", "A"),
("extended_encoded_tweet_features.embeds_url_count_v2", "A"),
("extended_encoded_tweet_features.favorite_count_v2", "A"),
("extended_encoded_tweet_features.label_abusive_hi_rcl_flag", "A"),
("extended_encoded_tweet_features.label_dup_content_flag", "A"),
("extended_encoded_tweet_features.label_nsfw_hi_prc_flag", "A"),
("extended_encoded_tweet_features.label_nsfw_hi_rcl_flag", "A"),
("extended_encoded_tweet_features.label_spam_flag", "A"),
("extended_encoded_tweet_features.label_spam_hi_rcl_flag", "A"),
("extended_encoded_tweet_features.quote_count", "A"),
("extended_encoded_tweet_features.reply_count_v2", "A"),
("extended_encoded_tweet_features.retweet_count_v2", "A"),
("extended_encoded_tweet_features.weighted_favorite_count", "A"),
("extended_encoded_tweet_features.weighted_quote_count", "A"),
("extended_encoded_tweet_features.weighted_reply_count", "A"),
("extended_encoded_tweet_features.weighted_retweet_count", "A"),
]
)
.add_labels(
[
label, # Tensor index: 0
"recap.engagement.is_clicked", # Tensor index: 1
"recap.engagement.is_favorited", # Tensor index: 2
"recap.engagement.is_open_linked", # Tensor index: 3
"recap.engagement.is_photo_expanded", # Tensor index: 4
"recap.engagement.is_profile_clicked", # Tensor index: 5
"recap.engagement.is_replied", # Tensor index: 6
"recap.engagement.is_retweeted", # Tensor index: 7
"recap.engagement.is_video_playback_50", # Tensor index: 8
"timelines.earlybird_score", # Tensor index: 9
]
)
.define_weight("meta.record_weight/type=earlybird")
.build()
)

View File

@ -1,3 +1,5 @@
Tweepcred
Tweepcred is a social network analysis tool that calculates the influence of Twitter users based on their interactions with other users. The tool uses the PageRank algorithm to rank users based on their influence.
PageRank Algorithm
@ -70,4 +72,4 @@ The algorithm tests for convergence by calculating the total difference between
This is a helper class called Reputation that contains methods for calculating a user's reputation score. The first method called scaledReputation takes a Double parameter raw which represents the user's page rank, and returns a Byte value that represents the user's reputation on a scale of 0 to 100. This method uses a formula that involves converting the logarithm of the page rank to a number between 0 and 100.
The second method called adjustReputationsPostCalculation takes three parameters: mass (a Double value representing the user's page rank), numFollowers (an Int value representing the number of followers a user has), and numFollowings (an Int value representing the number of users a user is following). This method reduces the page rank of users who have a low number of followers but a high number of followings. It calculates a division factor based on the ratio of followings to followers, and reduces the user's page rank by dividing it by this factor. The method returns the adjusted page rank.
The second method called adjustReputationsPostCalculation takes three parameters: mass (a Double value representing the user's page rank), numFollowers (an Int value representing the number of followers a user has), and numFollowings (an Int value representing the number of users a user is following). This method reduces the page rank of users who have a low number of followers but a high number of followings. It calculates a division factor based on the ratio of followings to followers, and reduces the user's page rank by dividing it by this factor. The method returns the adjusted page rank.
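
The two methods described above can be sketched roughly as follows. This is an illustrative Python sketch only: the README does not give the constants or the exact formula, so `LOG_MIN`, `LOG_MAX`, `MAX_DIVISION_FACTOR`, and the choice of the raw followings-to-followers ratio as the division factor are all assumptions made for the example.

```python
import math

# Hypothetical constants; the actual values are not stated in the README.
LOG_MIN = -9.0              # log10(page rank) mapped to reputation 0
LOG_MAX = -3.0              # log10(page rank) mapped to reputation 100
MAX_DIVISION_FACTOR = 50.0  # cap on how strongly a page rank can be reduced


def scaled_reputation(raw):
    """Map a raw page rank onto a 0..100 scale using its logarithm."""
    if raw <= 0.0:
        return 0
    t = (math.log10(raw) - LOG_MIN) / (LOG_MAX - LOG_MIN)
    return int(round(100 * min(1.0, max(0.0, t))))


def adjust_reputation(mass, num_followers, num_followings):
    """Reduce the page rank of users who follow many more accounts than follow them."""
    ratio = num_followings / max(num_followers, 1)
    if ratio <= 1.0:
        return mass  # followers >= followings: leave the page rank unchanged
    division_factor = min(ratio, MAX_DIVISION_FACTOR)
    return mass / division_factor
```

Under these assumed constants, a user with 10 followers and 100 followings has their page rank divided by 10, while a user with more followers than followings is unaffected.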

View File

@ -1,17 +1,17 @@
# UserTweetEntityGraph (UTEG)
## What is it
User Tweet Entity Graph (UTEG) is a Finalge thrift service built on the GraphJet framework. In maintains a graph of user-tweet relationships and serves user recommendations based on traversals in this graph.
User Tweet Entity Graph (UTEG) is a Finagle Thrift service built on the GraphJet framework. It maintains a graph of user-tweet relationships and serves user recommendations based on traversals in this graph.
## How is it used on Twitter
UTEG generates the "XXX Liked" out-of-network tweets seen on Twitter's Home Timeline.
The core idea behind UTEG is collaborative filtering. UTEG takes a user's weighted follow graph (i.e a list of weighted userIds) as input,
performs efficient traversal & aggregation, and returns the top weighted tweets engaged basd on # of users that engaged the tweet, as well as
The core idea behind UTEG is collaborative filtering. UTEG takes a user's weighted follow graph (i.e., a list of weighted userIds) as input,
performs efficient traversal & aggregation, and returns the top-weighted tweets based on the number of users that engaged with each tweet, as well as
the engaged users' weights.
UTEG is a stateful service and relies on a Kafka stream to ingest & persist states. It maintains an in-memory user engagements over the past
24-48 hours. Older events are dropped and GC'ed.
UTEG is a stateful service and relies on a Kafka stream to ingest & persist states. It maintains in-memory user engagements over the past
24-48 hours. Older events are dropped and GC'ed.
For full details on storage & processing, please check out our open-sourced project GraphJet, a general-purpose high performance in-memory storage engine.
For full details on storage & processing, please check out our open-sourced project GraphJet, a general-purpose high-performance in-memory storage engine.
- https://github.com/twitter/GraphJet
- http://www.vldb.org/pvldb/vol9/p1281-sharma.pdf
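
The collaborative-filtering step described above can be sketched in a few lines of Python. This is a simplified illustration, not the GraphJet implementation: the function name, the in-memory dictionaries, and the tie-breaking order (engaging-user count first, then summed weight) are assumptions made for the example.

```python
from collections import defaultdict


def top_engaged_tweets(weighted_follows, engagements, k=10):
    """Rank tweets engaged by a user's follow graph.

    weighted_follows: {user_id: weight} for the accounts the user follows.
    engagements: {user_id: iterable of tweet_ids that user engaged with}.
    """
    weight_sum = defaultdict(float)
    user_count = defaultdict(int)
    for user_id, weight in weighted_follows.items():
        # set() ensures a user counts once per tweet even with repeated engagements.
        for tweet_id in set(engagements.get(user_id, ())):
            weight_sum[tweet_id] += weight
            user_count[tweet_id] += 1
    ranked = sorted(
        weight_sum,
        key=lambda t: (user_count[t], weight_sum[t]),
        reverse=True,
    )
    return ranked[:k]
```

In the real service the engagement data lives in a bipartite in-memory graph with a 24-48 hour window; here it is a plain dictionary to keep the sketch self-contained.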

View File

@ -78,7 +78,7 @@ sealed trait SimClustersEmbedding extends Equals {
CosineSimilarityUtil.applyNormArray(sortedScores, expScaledNorm)
/**
* The Standard Deviation of a Embedding.
* The Standard Deviation of an Embedding.
*/
lazy val std: Double = {
if (scores.isEmpty) {

View File

@ -831,7 +831,7 @@ struct ThriftSearchResult {
12: optional list<hits.ThriftHits> cardTitleHitHighlights
13: optional list<hits.ThriftHits> cardDescriptionHitHighlights
// Expansion types, if expandResult == False, the expasions set should be ignored.
// Expansion types, if expandResult == False, the expansions set should be ignored.
8: optional bool expandResult = 0
9: optional set<expansions.ThriftTweetExpansionType> expansions
@ -971,7 +971,7 @@ struct ThriftTermStatisticsResults {
// The binIds will correspond to the times of the hits matching the driving search query for this
// term statistics request.
// If there were no hits matching the search query, numBins binIds will be returned, but the
// values of the binIds will not meaninfully correspond to anything related to the query, and
// values of the binIds will not meaningfully correspond to anything related to the query, and
// should not be used. Such cases can be identified by ThriftSearchResults.numHitsProcessed being
// set to 0 in the response, and the response not being early terminated.
3: optional list<i32> binIds
@ -1097,7 +1097,7 @@ struct ThriftSearchResults {
// Superroots' schema merge/choose logic when returning results to clients:
// . pick the schema based on the order of: realtime > protected > archive
// . because of the above ordering, it is possible that archive earlybird schema with a new flush
// verion (with new bit features) might be lost to older realtime earlybird schema; this is
// version (with new bit features) might be lost to older realtime earlybird schema; this is
// considered to be rare and acceptable because one realtime earlybird deploy would fix it
21: optional features.ThriftSearchFeatureSchema featureSchema

View File

@ -70,7 +70,7 @@ struct TweetTopKTweetsWithScore {
/**
* The generic SimClustersEmbedding for online long-term storage and real-time calculation.
* Use SimClustersEmbeddingId as the only identifier.
* Warning: Doesn't include modelversion and embedding type in the value struct.
* Warning: Doesn't include model version and embedding type in the value struct.
**/
struct SimClustersEmbedding {
1: required list<SimClusterWithScore> embedding

View File

@ -50,7 +50,7 @@ struct CandidateTweets {
}(hasPersonalData = 'true')
/**
* An encapuslated collection of reference tweets
* An encapsulated collection of reference tweets
**/
struct ReferenceTweets {
1: required i64 targetUserId(personalDataType = 'UserId')

View File

@ -33,7 +33,7 @@ enum EmbeddingType {
Pop10000RankDecay11Tweet = 31,
OonPop1000RankDecayTweet = 32,
// [Experimental] Offline generated produciton-like LogFavScore-based Tweet Embedding
// [Experimental] Offline generated production-like LogFavScore-based Tweet Embedding
OfflineGeneratedLogFavBasedTweet = 40,
// Reserve 51-59 for Ads Embedding
@ -104,7 +104,7 @@ enum EmbeddingType {
//Reserved 401 - 500 for Space embedding
FavBasedApeSpace = 401 // DEPRECATED
LogFavBasedListenerSpace = 402 // DEPRECATED
LogFavBasedAPESpeakerSpace = 403 // DEPRCATED
LogFavBasedAPESpeakerSpace = 403 // DEPRECATED
LogFavBasedUserInterestedInListenerSpace = 404 // DEPRECATED
// Experimental, internal-only IDs

View File

@ -1,36 +1,13 @@
Overview
========
**TimelineRanker** (TLR) is a legacy service which provides relevance-scored tweets from the Earlybird Search Index and User Tweet Entity Graph (UTEG) service. Despite its name, it no longer does any kind of heavy ranking/model based ranking itself - just uses relevance scores from the Search Index for ranked tweet endpoints.
# TimelineRanker
**TimelineRanker** (TLR) is a legacy service that provides relevance-scored tweets from the Earlybird Search Index and User Tweet Entity Graph (UTEG) service. Despite its name, it no longer performs heavy ranking or model-based ranking itself; it only uses relevance scores from the Search Index for ranked tweet endpoints.
The following is a list of major services that Timeline Ranker interacts with:
**Earlybird-root-superroot (a.k.a Search)**
Timeline Ranker calls the Search Index's super root to fetch a list of Tweets.
**User Tweet Entity Graph (UTEG)**
Timeline Ranker calls UTEG to fetch a list of tweets liked by the users you follow.
**Socialgraph**
Timeline Ranker calls Social Graph Service to obtain follow graph and user states such as blocked, muted, retweets muted, etc.
**TweetyPie**
Timeline Ranker hydrates tweets by calling TweetyPie so that it can post-filter tweets based on certain hydrated fields.
**Manhattan**
Timeline Ranker hydrates some tweet features (eg, user languages) from Manhattan.
**Home Mixer**
Home Mixer calls Timeline Ranker to fetch tweets from the Earlybird Search Index and User Tweet Entity Graph (UTEG) service to power both the For You and Following Home Timelines.
Timeline Ranker does light ranking based on Earlybird tweet candidate scores and truncates to the number of candidates requested by Home Mixer based on these scores
- **Earlybird-root-superroot (a.k.a Search):** Timeline Ranker calls the Search Index's super root to fetch a list of Tweets.
- **User Tweet Entity Graph (UTEG):** Timeline Ranker calls UTEG to fetch a list of tweets liked by the users you follow.
- **Socialgraph:** Timeline Ranker calls Social Graph Service to obtain the follow graph and user states such as blocked, muted, retweets muted, etc.
- **TweetyPie:** Timeline Ranker hydrates tweets by calling TweetyPie to post-filter tweets based on certain hydrated fields.
- **Manhattan:** Timeline Ranker hydrates some tweet features (e.g., user languages) from Manhattan.
**Home Mixer** calls Timeline Ranker to fetch tweets from the Earlybird Search Index and User Tweet Entity Graph (UTEG) service to power both the For You and Following Home Timelines. Timeline Ranker performs light ranking based on Earlybird tweet candidate scores and truncates to the number of candidates requested by Home Mixer based on these scores.
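
The light-ranking step described above amounts to a top-N selection by score. A minimal Python sketch, assuming candidates arrive as `(tweet_id, earlybird_score)` pairs (the function name and tuple layout are hypothetical, not the service's API):

```python
import heapq


def light_rank(candidates, max_count):
    """Keep the max_count highest-scoring candidates, best first.

    candidates: iterable of (tweet_id, earlybird_score) pairs.
    """
    if max_count <= 0:
        return []
    return heapq.nlargest(max_count, candidates, key=lambda c: c[1])
```

`heapq.nlargest` avoids sorting the full candidate list when only a small number of results is requested.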

View File

@ -0,0 +1,8 @@
# Topic Social Proof Service (TSPS)
**Topic Social Proof Service** (TSPS) serves as a centralized source for verifying topics related to Timelines and Notifications. By analyzing a user's topic preferences, such as following or unfollowing, and employing semantic annotations and tweet embeddings from SimClusters or other machine learning models, TSPS delivers highly relevant topics tailored to each user's interests.
For instance, when a tweet discusses Stephen Curry, the service determines if the content falls under topics like "NBA" and/or "Golden State Warriors" while also providing relevance scores based on SimClusters Embedding. Additionally, TSPS evaluates user-specific topic preferences to offer a comprehensive list of available topics, only those the user is currently following, or new topics they have not followed but may find interesting if recommended on specific product surfaces.
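
The three kinds of topic lists mentioned above (all available topics, only followed topics, or only topics not yet followed) can be illustrated with a small Python sketch. The function name, the `mode` values, and the `min_score` cutoff are hypothetical and chosen for the example; relevance scores are taken as given rather than computed from embeddings.

```python
def select_topics(topic_scores, followed_topics, mode="all", min_score=0.0):
    """Filter a tweet's scored topics by the user's follow state.

    topic_scores: {topic_name: relevance_score} for one tweet.
    followed_topics: set of topic names the user currently follows.
    mode: "all", "followed", or "not_followed" (hypothetical names).
    """
    if mode not in ("all", "followed", "not_followed"):
        raise ValueError(f"unknown mode: {mode}")
    selected = []
    for topic, score in topic_scores.items():
        if score < min_score:
            continue
        is_followed = topic in followed_topics
        if mode == "followed" and not is_followed:
            continue
        if mode == "not_followed" and is_followed:
            continue
        selected.append((topic, score))
    # Highest relevance first.
    selected.sort(key=lambda ts: ts[1], reverse=True)
    return selected
```

For the Stephen Curry example, a user who follows only "NBA" would get "NBA" back in `followed` mode and "Golden State Warriors" in `not_followed` mode.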

View File

@ -0,0 +1,24 @@
jvm_binary(
name = "bin",
basename = "topic-social-proof",
main = "com.twitter.tsp.TopicSocialProofStratoFedServerMain",
runtime_platform = "java11",
tags = [
"bazel-compatible",
],
dependencies = [
"strato/src/main/scala/com/twitter/strato/logging/logback",
"topic-social-proof/server/src/main/resources",
"topic-social-proof/server/src/main/scala/com/twitter/tsp",
],
)
# Aurora Workflows build phase convention requires a jvm_app named with ${project-name}-app
jvm_app(
name = "topic-social-proof-app",
archive = "zip",
binary = ":bin",
tags = [
"bazel-compatible",
],
)

View File

@ -0,0 +1,8 @@
resources(
sources = [
"*.xml",
"*.yml",
"config/*.yml",
],
tags = ["bazel-compatible"],
)

View File

@ -0,0 +1,61 @@
# Keys are sorted in alphabetical order
enable_topic_social_proof_score:
comment : "Enable the calculation of <topic, tweet> cosine similarity score in TopicSocialProofStore. 0 means do not calculate the score and use a random rank to generate topic social proof"
default_availability: 0
enable_tweet_health_score:
comment: "Enable the calculation for health scores in tweetInfo. By enabling this decider, we will compute TweetHealthModelScore"
default_availability: 0
enable_user_agatha_score:
comment: "Enable the calculation for health scores in tweetInfo. By enabling this decider, we will compute UserHealthModelScore"
default_availability: 0
enable_loadshedding_HomeTimeline:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_HomeTimelineTopicTweets:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_HomeTimelineRecommendTopicTweets:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_MagicRecsRecommendTopicTweets:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_TopicLandingPage:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_HomeTimelineFeatures:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_HomeTimelineTopicTweetsMetrics:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_HomeTimelineUTEGTopicTweets:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_HomeTimelineSimClusters:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_ExploreTopicTweets:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_MagicRecsTopicTweets:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0
enable_loadshedding_Search:
comment: "Enable loadshedding (from 0% to 100%). Requests that have been shed will return an empty response"
default_availability: 0

View File

@ -0,0 +1,155 @@
<configuration>
<shutdownHook class="ch.qos.logback.core.hook.DelayingShutdownHook"/>
<property name="async_queue_size" value="${queue.size:-50000}"/>
<property name="async_max_flush_time" value="${max.flush.time:-0}"/>
<!-- ===================================================== -->
<!-- Structured Logging -->
<!-- ===================================================== -->
<!-- Only sample 0.1% of the requests -->
<property name="splunk_sampling_rate" value="${splunk_sampling_rate:-0.001}"/>
<include resource="structured-logger-logback.xml"/>
<!-- ===================================================== -->
<!-- Service Config -->
<!-- ===================================================== -->
<property name="DEFAULT_SERVICE_PATTERN"
value="%-16X{transactionId} %logger %msg"/>
<!-- ===================================================== -->
<!-- Common Config -->
<!-- ===================================================== -->
<!-- JUL/JDK14 to Logback bridge -->
<contextListener class="ch.qos.logback.classic.jul.LevelChangePropagator">
<resetJUL>true</resetJUL>
</contextListener>
<!-- Service Log (Rollover every 50MB, max 11 logs) -->
<appender name="SERVICE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${log.service.output}</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${log.service.output}.%i</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>10</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>50MB</maxFileSize>
</triggeringPolicy>
<encoder>
<pattern>%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n</pattern>
</encoder>
</appender>
<!-- Strato package only log (Rollover every 50MB, max 11 logs) -->
<appender name="STRATO-ONLY" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${log.strato_only.output}</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${log.strato_only.output}.%i</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>10</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>50MB</maxFileSize>
</triggeringPolicy>
<encoder>
<pattern>%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n</pattern>
</encoder>
</appender>
<!-- LogLens -->
<appender name="LOGLENS" class="com.twitter.loglens.logback.LoglensAppender">
<mdcAdditionalContext>true</mdcAdditionalContext>
<category>loglens</category>
<index>${log.lens.index}</index>
<tag>${log.lens.tag}/service</tag>
<encoder>
<pattern>%msg%n</pattern>
</encoder>
<turboFilter class="ch.qos.logback.classic.turbo.DuplicateMessageFilter">
<cacheSize>500</cacheSize>
<allowedRepetitions>50</allowedRepetitions>
</turboFilter>
<filter class="com.twitter.strato.logging.logback.RegexFilter">
<forLogger>manhattan-client</forLogger>
<excludeRegex>.*InvalidRequest.*</excludeRegex>
</filter>
</appender>
<!-- ===================================================== -->
<!-- Primary Async Appenders -->
<!-- ===================================================== -->
<appender name="ASYNC-SERVICE" class="ch.qos.logback.classic.AsyncAppender">
<queueSize>${async_queue_size}</queueSize>
<maxFlushTime>${async_max_flush_time}</maxFlushTime>
<appender-ref ref="SERVICE"/>
</appender>
<appender name="ASYNC-STRATO-ONLY" class="ch.qos.logback.classic.AsyncAppender">
<queueSize>${async_queue_size}</queueSize>
<maxFlushTime>${async_max_flush_time}</maxFlushTime>
<appender-ref ref="STRATO-ONLY"/>
</appender>
<appender name="ASYNC-LOGLENS" class="ch.qos.logback.classic.AsyncAppender">
<queueSize>${async_queue_size}</queueSize>
<maxFlushTime>${async_max_flush_time}</maxFlushTime>
<appender-ref ref="LOGLENS"/>
</appender>
<!-- ===================================================== -->
<!-- Package Config -->
<!-- ===================================================== -->
<!-- Per-Package Config (shared) -->
<logger name="com.twitter" level="info"/>
<!--
By default, we leave the strato package at INFO level.
However, this line allows us to set the entire strato package, or a subset of it, to
a specific level. The value of strato_log_package is appended verbatim to "com.twitter.strato",
so a sub-package must be given with a leading dot. For example, if you pass
-Dstrato_log_package=.streaming -Dstrato_log_level=DEBUG, only loggers under
com.twitter.strato.streaming.* will be set to DEBUG level. Passing only
-Dstrato_log_level will set all of com.twitter.strato.* to the specified level.
-->
<logger name="com.twitter.strato${strato_log_package:-}" level="${strato_log_level:-INFO}"/>
<logger name="com.twitter.wilyns" level="warn"/>
<logger name="com.twitter.finagle.mux" level="warn"/>
<logger name="com.twitter.finagle.serverset2" level="warn"/>
<logger name="com.twitter.logging.ScribeHandler" level="warn"/>
<logger name="com.twitter.zookeeper.client.internal" level="warn"/>
<logger name="com.twitter.decider.StoreDecider" level="warn"/>
<!-- Per-Package Config (Strato) -->
<logger name="com.twitter.distributedlog.client" level="warn"/>
<logger name="com.twitter.finagle.mtls.authorization.config.AccessControlListConfiguration" level="warn"/>
<logger name="com.twitter.finatra.kafka.common.kerberoshelpers" level="warn"/>
<logger name="com.twitter.finatra.kafka.utils.BootstrapServerUtils" level="warn"/>
<logger name="com.twitter.server.coordinate" level="error"/>
<logger name="com.twitter.zookeeper.client" level="info"/>
<logger name="org.apache.zookeeper" level="error"/>
<logger name="org.apache.zookeeper.ClientCnxn" level="warn"/>
<logger name="ZkSession" level="info"/>
<logger name="OptimisticLockingCache" level="off"/>
<logger name="manhattan-client" level="warn"/>
<logger name="strato.op" level="warn"/>
<logger name="org.apache.kafka.clients.NetworkClient" level="error"/>
<logger name="org.apache.kafka.clients.consumer.internals" level="error"/>
<logger name="org.apache.kafka.clients.producer.internals" level="error"/>
<!-- This package produces many messages like: Building client authenticator with server name kafka -->
<logger name="org.apache.kafka.common.network" level="warn"/>
<!-- Root Config -->
<root level="${log_level:-INFO}">
<appender-ref ref="ASYNC-SERVICE"/>
<appender-ref ref="ASYNC-LOGLENS"/>
</root>
<!-- Strato package only logging -->
<logger name="com.twitter.strato"
level="info"
additivity="true">
<appender-ref ref="ASYNC-STRATO-ONLY" />
</logger>
</configuration>
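
The `${name:-default}` placeholders in the config above fall back to the default when the property is unset, and the resolved value is spliced into the surrounding string as-is. The sketch below is a simplified stand-in for that substitution, not logback's implementation: `LogbackSubstitutionSketch` and `resolve` are hypothetical names, only the flat `${key}` and `${key:-default}` forms are handled, and an unset key without a default resolves to the empty string here, which differs from logback. It illustrates why a sub-package passed through `strato_log_package` needs a leading dot.

```scala
import scala.util.matching.Regex

// Hypothetical helper for illustration only; not part of logback or this repository.
object LogbackSubstitutionSketch {
  // Matches ${key} or ${key:-default}; nested placeholders are out of scope.
  private val Placeholder: Regex = """\$\{([^}:]+)(?::-([^}]*))?\}""".r

  def resolve(template: String, props: Map[String, String]): String =
    Placeholder.replaceAllIn(
      template,
      m => {
        // group(2) is null when the placeholder has no ":-default" part.
        val default = Option(m.group(2)).getOrElse("")
        Regex.quoteReplacement(props.getOrElse(m.group(1), default))
      }
    )

  def main(args: Array[String]): Unit = {
    val loggerName = "com.twitter.strato${strato_log_package:-}"
    // No property set: the logger covers the whole strato package.
    println(resolve(loggerName, Map.empty))
    // Leading dot: the logger covers the streaming sub-package.
    println(resolve(loggerName, Map("strato_log_package" -> ".streaming")))
    // No leading dot: the value is glued directly onto "strato".
    println(resolve(loggerName, Map("strato_log_package" -> "streaming")))
  }
}
```

Under these assumptions the third call yields a logger name with no dot between `strato` and `streaming`, which names a different package from `com.twitter.strato.streaming`.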

@ -0,0 +1,12 @@
scala_library(
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
],
dependencies = [
"finatra/inject/inject-thrift-client",
"strato/src/main/scala/com/twitter/strato/fed",
"strato/src/main/scala/com/twitter/strato/fed/server",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/columns",
],
)

@ -0,0 +1,56 @@
package com.twitter.tsp
import com.google.inject.Module
import com.twitter.strato.fed._
import com.twitter.strato.fed.server._
import com.twitter.strato.warmup.Warmer
import com.twitter.tsp.columns.TopicSocialProofColumn
import com.twitter.tsp.columns.TopicSocialProofBatchColumn
import com.twitter.tsp.handlers.UttChildrenWarmupHandler
import com.twitter.tsp.modules.RepresentationScorerStoreModule
import com.twitter.tsp.modules.GizmoduckUserModule
import com.twitter.tsp.modules.TSPClientIdModule
import com.twitter.tsp.modules.TopicListingModule
import com.twitter.tsp.modules.TopicSocialProofStoreModule
import com.twitter.tsp.modules.TopicTweetCosineSimilarityAggregateStoreModule
import com.twitter.tsp.modules.TweetInfoStoreModule
import com.twitter.tsp.modules.TweetyPieClientModule
import com.twitter.tsp.modules.UttClientModule
import com.twitter.tsp.modules.UttLocalizationModule
import com.twitter.util.Future
object TopicSocialProofStratoFedServerMain extends TopicSocialProofStratoFedServer
trait TopicSocialProofStratoFedServer extends StratoFedServer {
override def dest: String = "/s/topic-social-proof/topic-social-proof"
override val modules: Seq[Module] =
Seq(
GizmoduckUserModule,
RepresentationScorerStoreModule,
TopicSocialProofStoreModule,
TopicListingModule,
TopicTweetCosineSimilarityAggregateStoreModule,
TSPClientIdModule,
TweetInfoStoreModule,
TweetyPieClientModule,
UttClientModule,
UttLocalizationModule
)
override def columns: Seq[Class[_ <: StratoFed.Column]] =
Seq(
classOf[TopicSocialProofColumn],
classOf[TopicSocialProofBatchColumn]
)
override def configureWarmer(warmer: Warmer): Unit = {
warmer.add(
"uttChildrenWarmupHandler",
() => {
handle[UttChildrenWarmupHandler]()
Future.Unit
}
)
}
}

@ -0,0 +1,12 @@
scala_library(
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
],
dependencies = [
"stitch/stitch-storehaus",
"strato/src/main/scala/com/twitter/strato/fed",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/service",
"topic-social-proof/server/src/main/thrift:thrift-scala",
],
)

@ -0,0 +1,84 @@
package com.twitter.tsp.columns
import com.twitter.stitch.SeqGroup
import com.twitter.stitch.Stitch
import com.twitter.strato.catalog.Fetch
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.config._
import com.twitter.strato.config.AllowAll
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.Policy
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.data.Lifecycle.Production
import com.twitter.strato.fed.StratoFed
import com.twitter.strato.thrift.ScroogeConv
import com.twitter.tsp.thriftscala.TopicSocialProofRequest
import com.twitter.tsp.thriftscala.TopicSocialProofOptions
import com.twitter.tsp.service.TopicSocialProofService
import com.twitter.tsp.thriftscala.TopicWithScore
import com.twitter.util.Future
import com.twitter.util.Try
import javax.inject.Inject
class TopicSocialProofBatchColumn @Inject() (
topicSocialProofService: TopicSocialProofService)
extends StratoFed.Column(TopicSocialProofBatchColumn.Path)
with StratoFed.Fetch.Stitch {
override val policy: Policy =
ReadWritePolicy(
readPolicy = AllowAll,
writePolicy = AllowKeyAuthenticatedTwitterUserId
)
override type Key = Long
override type View = TopicSocialProofOptions
override type Value = Seq[TopicWithScore]
override val keyConv: Conv[Key] = Conv.ofType
override val viewConv: Conv[View] = ScroogeConv.fromStruct[TopicSocialProofOptions]
override val valueConv: Conv[Value] = Conv.seq(ScroogeConv.fromStruct[TopicWithScore])
override val metadata: OpMetadata =
OpMetadata(
lifecycle = Some(Production),
Some(PlainText("Topic Social Proof Batched Federated Column")))
case class TspsGroup(view: View) extends SeqGroup[Long, Fetch.Result[Value]] {
override protected def run(keys: Seq[Long]): Future[Seq[Try[Result[Seq[TopicWithScore]]]]] = {
val request = TopicSocialProofRequest(
userId = view.userId,
tweetIds = keys.toSet,
displayLocation = view.displayLocation,
topicListingSetting = view.topicListingSetting,
context = view.context,
bypassModes = view.bypassModes,
tags = view.tags
)
val response = topicSocialProofService
.topicSocialProofHandlerStoreStitch(request)
.map(_.socialProofs)
Stitch
.run(response).map(r =>
keys.map(key => {
Try {
val v = r.get(key)
if (v.nonEmpty && v.get.nonEmpty) {
found(v.get)
} else {
missing
}
}
}))
}
}
override def fetch(key: Key, view: View): Stitch[Result[Value]] = {
Stitch.call(key, TspsGroup(view))
}
}
object TopicSocialProofBatchColumn {
val Path = "topic-signals/tsp/topic-social-proof-batched"
}
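
The batched column above gathers many tweet-id keys into a single request and then answers each key individually, treating an absent or empty entry as missing. A minimal sketch of that fan-out step follows, with plain `Option` standing in for Strato's `found` and `missing` results; `BatchFanOutSketch`, `fanOut` and the sample data are hypothetical and not part of this repository.

```scala
// Hypothetical helper for illustration only.
object BatchFanOutSketch {
  // Splits one batched response back into per-key results, in key order.
  // None stands in for `missing`, Some(values) for `found(values)`.
  def fanOut[K, V](keys: Seq[K], batched: Map[K, Seq[V]]): Seq[Option[Seq[V]]] =
    keys.map(key => batched.get(key).filter(_.nonEmpty))

  def main(args: Array[String]): Unit = {
    // Tweet 1 has a topic, tweet 2 has an empty list, tweet 3 is absent.
    val batched = Map(1L -> Seq("sports"), 2L -> Seq.empty[String])
    println(fanOut(Seq(1L, 2L, 3L), batched))
  }
}
```

Only the first key resolves to a value in this example; the empty list and the absent key both come back as `None`, mirroring the `v.nonEmpty && v.get.nonEmpty` check in the column.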

@ -0,0 +1,47 @@
package com.twitter.tsp.columns
import com.twitter.stitch
import com.twitter.stitch.Stitch
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.config._
import com.twitter.strato.config.AllowAll
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.Policy
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.data.Lifecycle.Production
import com.twitter.strato.fed.StratoFed
import com.twitter.strato.thrift.ScroogeConv
import com.twitter.tsp.thriftscala.TopicSocialProofRequest
import com.twitter.tsp.thriftscala.TopicSocialProofResponse
import com.twitter.tsp.service.TopicSocialProofService
import javax.inject.Inject
class TopicSocialProofColumn @Inject() (
topicSocialProofService: TopicSocialProofService)
extends StratoFed.Column(TopicSocialProofColumn.Path)
with StratoFed.Fetch.Stitch {
override type Key = TopicSocialProofRequest
override type View = Unit
override type Value = TopicSocialProofResponse
override val keyConv: Conv[Key] = ScroogeConv.fromStruct[TopicSocialProofRequest]
override val viewConv: Conv[View] = Conv.ofType
override val valueConv: Conv[Value] = ScroogeConv.fromStruct[TopicSocialProofResponse]
override val metadata: OpMetadata =
OpMetadata(lifecycle = Some(Production), Some(PlainText("Topic Social Proof Federated Column")))
override def fetch(key: Key, view: View): Stitch[Result[Value]] = {
topicSocialProofService
.topicSocialProofHandlerStoreStitch(key)
.map { result => found(result) }
.handle {
case stitch.NotFound => missing
}
}
}
object TopicSocialProofColumn {
val Path = "topic-signals/tsp/topic-social-proof"
}

@ -0,0 +1,23 @@
scala_library(
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
],
dependencies = [
"configapi/configapi-abdecider",
"configapi/configapi-core",
"content-recommender/thrift/src/main/thrift:thrift-scala",
"decider/src/main/scala",
"discovery-common/src/main/scala/com/twitter/discovery/common/configapi",
"featureswitches/featureswitches-core",
"finatra/inject/inject-core/src/main/scala",
"frigate/frigate-common:base",
"frigate/frigate-common:util",
"frigate/frigate-common/src/main/scala/com/twitter/frigate/common/candidate",
"interests-service/thrift/src/main/thrift:thrift-scala",
"src/scala/com/twitter/simclusters_v2/common",
"src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
"stitch/stitch-storehaus",
"topic-social-proof/server/src/main/thrift:thrift-scala",
],
)

@ -0,0 +1,19 @@
package com.twitter.tsp.common
import com.twitter.servo.decider.DeciderKeyEnum
object DeciderConstants {
val enableTopicSocialProofScore = "enable_topic_social_proof_score"
val enableHealthSignalsScoreDeciderKey = "enable_tweet_health_score"
val enableUserAgathaScoreDeciderKey = "enable_user_agatha_score"
}
object DeciderKey extends DeciderKeyEnum {
val enableHealthSignalsScoreDeciderKey: Value = Value(
DeciderConstants.enableHealthSignalsScoreDeciderKey
)
val enableUserAgathaScoreDeciderKey: Value = Value(
DeciderConstants.enableUserAgathaScoreDeciderKey
)
}

@ -0,0 +1,34 @@
package com.twitter.tsp.common
import com.twitter.abdecider.LoggingABDecider
import com.twitter.featureswitches.v2.FeatureSwitches
import com.twitter.featureswitches.v2.builder.{FeatureSwitchesBuilder => FsBuilder}
import com.twitter.featureswitches.v2.experimentation.NullBucketImpressor
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.util.Duration
case class FeatureSwitchesBuilder(
statsReceiver: StatsReceiver,
abDecider: LoggingABDecider,
featuresDirectory: String,
addServiceDetailsFromAurora: Boolean,
configRepoDirectory: String = "/usr/local/config",
fastRefresh: Boolean = false,
impressExperiments: Boolean = true) {
def build(): FeatureSwitches = {
val featureSwitches = FsBuilder()
.abDecider(abDecider)
.statsReceiver(statsReceiver)
.configRepoAbsPath(configRepoDirectory)
.featuresDirectory(featuresDirectory)
.limitToReferencedExperiments(shouldLimit = true)
.experimentImpressionStatsEnabled(true)
if (!impressExperiments) featureSwitches.experimentBucketImpressor(NullBucketImpressor)
if (addServiceDetailsFromAurora) featureSwitches.serviceDetailsFromAurora()
if (fastRefresh) featureSwitches.refreshPeriod(Duration.fromSeconds(10))
featureSwitches.build()
}
}

@ -0,0 +1,44 @@
package com.twitter.tsp.common
import com.twitter.decider.Decider
import com.twitter.decider.RandomRecipient
import com.twitter.util.Future
import javax.inject.Inject
import scala.util.control.NoStackTrace
/*
Provides decider-controlled load shedding for a given displayLocation.
The format of the decider keys is:
enable_loadshedding_<display location>
E.g.:
enable_loadshedding_HomeTimeline
Deciders are fractional, so a value of 50.00 will shed 50% of requests. If a decider key is not
defined for a particular displayLocation, those requests will always be served.
We should therefore aim to define keys for the locations we care most about in decider.yml,
so that we can control them during incidents.
*/
class LoadShedder @Inject() (decider: Decider) {
import LoadShedder._
// Fall back to False for any undefined key
private val deciderWithFalseFallback: Decider = decider.orElse(Decider.False)
private val keyPrefix = "enable_loadshedding"
def apply[T](typeString: String)(serve: => Future[T]): Future[T] = {
/*
Per-typeString level load shedding: enable_loadshedding_HomeTimeline
Checks if per-typeString load shedding is enabled
*/
val keyTyped = s"${keyPrefix}_$typeString"
if (deciderWithFalseFallback.isAvailable(keyTyped, recipient = Some(RandomRecipient)))
Future.exception(LoadSheddingException)
else serve
}
}
object LoadShedder {
object LoadSheddingException extends Exception with NoStackTrace
}
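
The key scheme and fallback described in the `LoadShedder` comment can be sketched without the Decider library. In this sketch a plain `Map[String, Double]` of percentages stands in for decider.yml, an undefined key behaves like 0%, and a random draw decides each request. `LoadSheddingSketch`, `shedRates` and `shouldShed` are hypothetical names, and the real `Decider.isAvailable` may sample differently.

```scala
import scala.util.Random

// Hypothetical stand-in for illustration only; it does not use the Decider library.
object LoadSheddingSketch {
  private val KeyPrefix = "enable_loadshedding"

  // Decider key for a display location, e.g. enable_loadshedding_HomeTimeline.
  def keyFor(displayLocation: String): String = s"${KeyPrefix}_$displayLocation"

  // shedRates maps a decider key to a percentage in [0, 100].
  // Undefined keys fall back to 0, so those requests are always served.
  def shouldShed(
    displayLocation: String,
    shedRates: Map[String, Double],
    random: Random
  ): Boolean = {
    val percent = shedRates.getOrElse(keyFor(displayLocation), 0.0)
    random.nextDouble() * 100.0 < percent
  }

  // Returns Left for a shed request and Right(serve) otherwise.
  // `serve` is by-name, so no work is done for a shed request.
  def apply[T](displayLocation: String, shedRates: Map[String, Double], random: Random)(
    serve: => T
  ): Either[String, T] =
    if (shouldShed(displayLocation, shedRates, random))
      Left(s"shed by ${keyFor(displayLocation)}")
    else Right(serve)

  def main(args: Array[String]): Unit = {
    val rates = Map(keyFor("HomeTimeline") -> 100.0)
    // HomeTimeline is fully shed; Search has no key and is always served.
    println(LoadSheddingSketch("HomeTimeline", rates, new Random(0))("served"))
    println(LoadSheddingSketch("Search", rates, new Random(0))("served"))
  }
}
```

Passing the `Random` in explicitly keeps the sketch testable; the class above instead relies on `RandomRecipient` to get the fractional behaviour.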

@ -0,0 +1,98 @@
package com.twitter.tsp.common
import com.twitter.abdecider.LoggingABDecider
import com.twitter.abdecider.UserRecipient
import com.twitter.contentrecommender.thriftscala.DisplayLocation
import com.twitter.discovery.common.configapi.FeatureContextBuilder
import com.twitter.featureswitches.FSRecipient
import com.twitter.featureswitches.Recipient
import com.twitter.featureswitches.UserAgent
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.interests.thriftscala.TopicListingViewerContext
import com.twitter.timelines.configapi
import com.twitter.timelines.configapi.Params
import com.twitter.timelines.configapi.RequestContext
import com.twitter.timelines.configapi.abdecider.LoggingABDeciderExperimentContext
case class ParamsBuilder(
featureContextBuilder: FeatureContextBuilder,
abDecider: LoggingABDecider,
overridesConfig: configapi.Config,
statsReceiver: StatsReceiver) {
def buildFromTopicListingViewerContext(
topicListingViewerContext: Option[TopicListingViewerContext],
displayLocation: DisplayLocation,
userRoleOverride: Option[Set[String]] = None
): Params = {
topicListingViewerContext.flatMap(_.userId) match {
case Some(userId) =>
val userRecipient = ParamsBuilder.toFeatureSwitchRecipientWithTopicContext(
userId,
userRoleOverride,
topicListingViewerContext,
Some(displayLocation)
)
overridesConfig(
requestContext = RequestContext(
userId = Some(userId),
experimentContext = LoggingABDeciderExperimentContext(
abDecider,
Some(UserRecipient(userId, Some(userId)))),
featureContext = featureContextBuilder(
Some(userId),
Some(userRecipient)
)
),
statsReceiver
)
case _ =>
throw new IllegalArgumentException(
s"${this.getClass.getSimpleName} tried to build Param for a request without a userId"
)
}
}
}
object ParamsBuilder {
def toFeatureSwitchRecipientWithTopicContext(
userId: Long,
userRolesOverride: Option[Set[String]],
context: Option[TopicListingViewerContext],
displayLocationOpt: Option[DisplayLocation]
): Recipient = {
val userRoles = userRolesOverride match {
case Some(overrides) => Some(overrides)
case _ => context.flatMap(_.userRoles.map(_.toSet))
}
val recipient = FSRecipient(
userId = Some(userId),
userRoles = userRoles,
deviceId = context.flatMap(_.deviceId),
guestId = context.flatMap(_.guestId),
languageCode = context.flatMap(_.languageCode),
countryCode = context.flatMap(_.countryCode),
userAgent = context.flatMap(_.userAgent).flatMap(UserAgent(_)),
isVerified = None,
isTwoffice = None,
tooClient = None,
highWaterMark = None
)
displayLocationOpt match {
case Some(displayLocation) =>
recipient.withCustomFields(displayLocationCustomFieldMap(displayLocation))
case None =>
recipient
}
}
private val DisplayLocationCustomField = "display_location"
def displayLocationCustomFieldMap(displayLocation: DisplayLocation): (String, String) =
DisplayLocationCustomField -> displayLocation.toString
}

@ -0,0 +1,65 @@
package com.twitter.tsp.common
import com.twitter.abdecider.LoggingABDecider
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.base.TargetUser
import com.twitter.frigate.common.candidate.TargetABDecider
import com.twitter.frigate.common.util.ABDeciderWithOverride
import com.twitter.gizmoduck.thriftscala.User
import com.twitter.simclusters_v2.common.UserId
import com.twitter.storehaus.ReadableStore
import com.twitter.timelines.configapi.Params
import com.twitter.tsp.thriftscala.TopicSocialProofRequest
import com.twitter.util.Future
case class DefaultRecTopicSocialProofTarget(
topicSocialProofRequest: TopicSocialProofRequest,
targetId: UserId,
user: Option[User],
abDecider: ABDeciderWithOverride,
params: Params
)(
implicit statsReceiver: StatsReceiver)
extends TargetUser
with TopicSocialProofRecRequest
with TargetABDecider {
override def globalStats: StatsReceiver = statsReceiver
override val targetUser: Future[Option[User]] = Future.value(user)
}
trait TopicSocialProofRecRequest {
tuc: TargetUser =>
val topicSocialProofRequest: TopicSocialProofRequest
}
case class RecTargetFactory(
abDecider: LoggingABDecider,
userStore: ReadableStore[UserId, User],
paramBuilder: ParamsBuilder,
statsReceiver: StatsReceiver) {
type RecTopicSocialProofTarget = DefaultRecTopicSocialProofTarget
def buildRecTopicSocialProofTarget(
request: TopicSocialProofRequest
): Future[RecTopicSocialProofTarget] = {
val userId = request.userId
userStore.get(userId).map { userOpt =>
val userRoles = userOpt.flatMap(_.roles.map(_.roles.toSet))
val context = request.context.copy(userId = Some(request.userId)) // ensure the context carries the request's userId
val params = paramBuilder
.buildFromTopicListingViewerContext(Some(context), request.displayLocation, userRoles)
DefaultRecTopicSocialProofTarget(
request,
userId,
userOpt,
ABDeciderWithOverride(abDecider, None)(statsReceiver),
params
)(statsReceiver)
}
}
}

@ -0,0 +1,26 @@
package com.twitter.tsp
package common
import com.twitter.decider.Decider
import com.twitter.decider.RandomRecipient
import com.twitter.decider.Recipient
import com.twitter.simclusters_v2.common.DeciderGateBuilderWithIdHashing
import javax.inject.Inject
case class TopicSocialProofDecider @Inject() (decider: Decider) {
def isAvailable(feature: String, recipient: Option[Recipient]): Boolean = {
decider.isAvailable(feature, recipient)
}
lazy val deciderGateBuilder = new DeciderGateBuilderWithIdHashing(decider)
/**
* When useRandomRecipient is set to false, the decider is either completely on or off.
* When useRandomRecipient is set to true, the decider is on for the specified % of traffic.
*/
def isAvailable(feature: String, useRandomRecipient: Boolean = true): Boolean = {
if (useRandomRecipient) isAvailable(feature, Some(RandomRecipient))
else isAvailable(feature, None)
}
}

@ -0,0 +1,104 @@
package com.twitter.tsp.common
import com.twitter.finagle.stats.NullStatsReceiver
import com.twitter.logging.Logger
import com.twitter.timelines.configapi.BaseConfig
import com.twitter.timelines.configapi.BaseConfigBuilder
import com.twitter.timelines.configapi.FSBoundedParam
import com.twitter.timelines.configapi.FSParam
import com.twitter.timelines.configapi.FeatureSwitchOverrideUtil
object TopicSocialProofParams {
object TopicTweetsSemanticCoreVersionId
extends FSBoundedParam[Long](
name = "topic_tweets_semantic_core_annotation_version_id",
default = 1433487161551032320L,
min = 0L,
max = Long.MaxValue
)
object TopicTweetsSemanticCoreVersionIdsSet
extends FSParam[Set[Long]](
name = "topic_tweets_semantic_core_annotation_version_id_allowed_set",
default = Set(TopicTweetsSemanticCoreVersionId.default))
/**
* Controls the Topic Social Proof cosine similarity threshold for the Topic Tweets.
*/
object TweetToTopicCosineSimilarityThreshold
extends FSBoundedParam[Double](
name = "topic_tweets_cosine_similarity_threshold_tsp",
default = 0.0,
min = 0.0,
max = 1.0
)
object EnablePersonalizedContextTopics // master feature switch to enable backfill
extends FSParam[Boolean](
name = "topic_tweets_personalized_contexts_enable_personalized_contexts",
default = false
)
object EnableYouMightLikeTopic
extends FSParam[Boolean](
name = "topic_tweets_personalized_contexts_enable_you_might_like",
default = false
)
object EnableRecentEngagementsTopic
extends FSParam[Boolean](
name = "topic_tweets_personalized_contexts_enable_recent_engagements",
default = false
)
object EnableTopicTweetHealthFilterPersonalizedContexts
extends FSParam[Boolean](
name = "topic_tweets_personalized_contexts_health_switch",
default = true
)
object EnableTweetToTopicScoreRanking
extends FSParam[Boolean](
name = "topic_tweets_enable_tweet_to_topic_score_ranking",
default = true
)
}
object FeatureSwitchConfig {
private val enumFeatureSwitchOverrides = FeatureSwitchOverrideUtil
.getEnumFSOverrides(
NullStatsReceiver,
Logger(getClass),
)
private val intFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getBoundedIntFSOverrides()
private val longFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getBoundedLongFSOverrides(
TopicSocialProofParams.TopicTweetsSemanticCoreVersionId
)
private val doubleFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getBoundedDoubleFSOverrides(
TopicSocialProofParams.TweetToTopicCosineSimilarityThreshold,
)
private val longSetFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getLongSetFSOverrides(
TopicSocialProofParams.TopicTweetsSemanticCoreVersionIdsSet,
)
private val booleanFeatureSwitchOverrides = FeatureSwitchOverrideUtil.getBooleanFSOverrides(
TopicSocialProofParams.EnablePersonalizedContextTopics,
TopicSocialProofParams.EnableYouMightLikeTopic,
TopicSocialProofParams.EnableRecentEngagementsTopic,
TopicSocialProofParams.EnableTopicTweetHealthFilterPersonalizedContexts,
TopicSocialProofParams.EnableTweetToTopicScoreRanking,
)
val config: BaseConfig = BaseConfigBuilder()
.set(enumFeatureSwitchOverrides: _*)
.set(intFeatureSwitchOverrides: _*)
.set(longFeatureSwitchOverrides: _*)
.set(doubleFeatureSwitchOverrides: _*)
.set(longSetFeatureSwitchOverrides: _*)
.set(booleanFeatureSwitchOverrides: _*)
.build()
}

@ -0,0 +1,14 @@
scala_library(
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
],
dependencies = [
"src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
"stitch/stitch-storehaus",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/common",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/stores",
"topic-social-proof/server/src/main/thrift:thrift-scala",
"topiclisting/topiclisting-core/src/main/scala/com/twitter/topiclisting",
],
)

@ -0,0 +1,587 @@
package com.twitter.tsp.handlers
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.mux.ClientDiscardedRequestException
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.util.StatsUtil
import com.twitter.simclusters_v2.common.SemanticCoreEntityId
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.strato.response.Err
import com.twitter.storehaus.ReadableStore
import com.twitter.timelines.configapi.Params
import com.twitter.topic_recos.common.Configs.ConsumerTopicEmbeddingType
import com.twitter.topic_recos.common.Configs.DefaultModelVersion
import com.twitter.topic_recos.common.Configs.ProducerTopicEmbeddingType
import com.twitter.topic_recos.common.Configs.TweetEmbeddingType
import com.twitter.topiclisting.TopicListingViewerContext
import com.twitter.topic_recos.common.LocaleUtil
import com.twitter.topiclisting.AnnotationRuleProvider
import com.twitter.tsp.common.DeciderConstants
import com.twitter.tsp.common.LoadShedder
import com.twitter.tsp.common.RecTargetFactory
import com.twitter.tsp.common.TopicSocialProofDecider
import com.twitter.tsp.common.TopicSocialProofParams
import com.twitter.tsp.stores.TopicSocialProofStore
import com.twitter.tsp.stores.TopicSocialProofStore.TopicSocialProof
import com.twitter.tsp.stores.UttTopicFilterStore
import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore.ScoreKey
import com.twitter.tsp.thriftscala.MetricTag
import com.twitter.tsp.thriftscala.TopicFollowType
import com.twitter.tsp.thriftscala.TopicListingSetting
import com.twitter.tsp.thriftscala.TopicSocialProofRequest
import com.twitter.tsp.thriftscala.TopicSocialProofResponse
import com.twitter.tsp.thriftscala.TopicWithScore
import com.twitter.tsp.thriftscala.TspTweetInfo
import com.twitter.tsp.utils.HealthSignalsUtils
import com.twitter.util.Future
import com.twitter.util.Timer
import com.twitter.util.Duration
import com.twitter.util.TimeoutException
import scala.util.Random
class TopicSocialProofHandler(
topicSocialProofStore: ReadableStore[TopicSocialProofStore.Query, Seq[TopicSocialProof]],
tweetInfoStore: ReadableStore[TweetId, TspTweetInfo],
uttTopicFilterStore: UttTopicFilterStore,
recTargetFactory: RecTargetFactory,
decider: TopicSocialProofDecider,
statsReceiver: StatsReceiver,
loadShedder: LoadShedder,
timer: Timer) {
import TopicSocialProofHandler._
def getTopicSocialProofResponse(
request: TopicSocialProofRequest
): Future[TopicSocialProofResponse] = {
val scopedStats = statsReceiver.scope(request.displayLocation.toString)
scopedStats.counter("fanoutRequests").incr(request.tweetIds.size)
scopedStats.stat("numTweetsPerRequest").add(request.tweetIds.size)
StatsUtil.trackBlockStats(scopedStats) {
recTargetFactory
.buildRecTopicSocialProofTarget(request).flatMap { target =>
val enableCosineSimilarityScoreCalculation =
decider.isAvailable(DeciderConstants.enableTopicSocialProofScore)
val semanticCoreVersionId =
target.params(TopicSocialProofParams.TopicTweetsSemanticCoreVersionId)
val semanticCoreVersionIdsSet =
target.params(TopicSocialProofParams.TopicTweetsSemanticCoreVersionIdsSet)
val allowListWithTopicFollowTypeFut = uttTopicFilterStore
.getAllowListTopicsForUser(
request.userId,
request.topicListingSetting,
TopicListingViewerContext
.fromThrift(request.context).copy(languageCode =
LocaleUtil.getStandardLanguageCode(request.context.languageCode)),
request.bypassModes.map(_.toSet)
).rescue {
case _ =>
scopedStats.counter("uttTopicFilterStoreFailure").incr()
Future.value(Map.empty[SemanticCoreEntityId, Option[TopicFollowType]])
}
val tweetInfoMapFut: Future[Map[TweetId, Option[TspTweetInfo]]] = Future
.collect(
tweetInfoStore.multiGet(request.tweetIds.toSet)
).raiseWithin(TweetInfoStoreTimeout)(timer).rescue {
case _: TimeoutException =>
scopedStats.counter("tweetInfoStoreTimeout").incr()
Future.value(Map.empty[TweetId, Option[TspTweetInfo]])
case _ =>
scopedStats.counter("tweetInfoStoreFailure").incr()
Future.value(Map.empty[TweetId, Option[TspTweetInfo]])
}
val definedTweetInfoMapFut =
keepTweetsWithTweetInfoAndLanguage(tweetInfoMapFut, request.displayLocation.toString)
Future
.join(definedTweetInfoMapFut, allowListWithTopicFollowTypeFut).map {
case (tweetInfoMap, allowListWithTopicFollowType) =>
val tweetIdsToQuery = tweetInfoMap.keys.toSet
val topicProofQueries =
tweetIdsToQuery.map { tweetId =>
TopicSocialProofStore.Query(
TopicSocialProofStore.CacheableQuery(
tweetId = tweetId,
tweetLanguage = LocaleUtil.getSupportedStandardLanguageCodeWithDefault(
tweetInfoMap.getOrElse(tweetId, None).flatMap {
_.language
}),
enableCosineSimilarityScoreCalculation =
enableCosineSimilarityScoreCalculation
),
allowedSemanticCoreVersionIds = semanticCoreVersionIdsSet
)
}
val topicSocialProofsFut: Future[Map[TweetId, Seq[TopicSocialProof]]] = {
Future
.collect(topicSocialProofStore.multiGet(topicProofQueries)).map(_.map {
case (query, results) =>
query.cacheableQuery.tweetId -> results.toSeq.flatten.filter(
_.semanticCoreVersionId == semanticCoreVersionId)
})
}.raiseWithin(TopicSocialProofStoreTimeout)(timer).rescue {
case _: TimeoutException =>
scopedStats.counter("topicSocialProofStoreTimeout").incr()
Future(Map.empty[TweetId, Seq[TopicSocialProof]])
case _ =>
scopedStats.counter("topicSocialProofStoreFailure").incr()
Future(Map.empty[TweetId, Seq[TopicSocialProof]])
}
val random = new Random(seed = request.userId.toInt)
topicSocialProofsFut.map { topicSocialProofs =>
val filteredTopicSocialProofs = filterByAllowedList(
topicSocialProofs,
request.topicListingSetting,
allowListWithTopicFollowType.keySet
)
val filteredTopicSocialProofsEmptyCount: Int =
filteredTopicSocialProofs.count {
case (_, topicSocialProofs: Seq[TopicSocialProof]) =>
topicSocialProofs.isEmpty
}
scopedStats
.counter("filteredTopicSocialProofsCount").incr(filteredTopicSocialProofs.size)
scopedStats
.counter("filteredTopicSocialProofsEmptyCount").incr(
filteredTopicSocialProofsEmptyCount)
if (isCrTopicTweets(request)) {
val socialProofs = filteredTopicSocialProofs.mapValues(_.flatMap { topicProof =>
val topicWithScores = buildTopicWithRandomScore(
topicProof,
allowListWithTopicFollowType,
random
)
topicWithScores
})
TopicSocialProofResponse(socialProofs)
} else {
val socialProofs = filteredTopicSocialProofs.mapValues(_.flatMap { topicProof =>
getTopicProofScore(
topicProof = topicProof,
allowListWithTopicFollowType = allowListWithTopicFollowType,
params = target.params,
random = random,
statsReceiver = statsReceiver
)
}.sortBy(-_.score).take(MaxCandidates))
val personalizedContextSocialProofs =
if (target.params(TopicSocialProofParams.EnablePersonalizedContextTopics)) {
val personalizedContextEligibility =
checkPersonalizedContextsEligibility(
target.params,
allowListWithTopicFollowType)
val filteredTweets =
filterPersonalizedContexts(socialProofs, tweetInfoMap, target.params)
backfillPersonalizedContexts(
allowListWithTopicFollowType,
filteredTweets,
request.tags.getOrElse(Map.empty),
personalizedContextEligibility)
} else {
Map.empty[TweetId, Seq[TopicWithScore]]
}
val mergedSocialProofs = socialProofs.map {
case (tweetId, proofs) =>
(
tweetId,
proofs
++ personalizedContextSocialProofs.getOrElse(tweetId, Seq.empty))
}
// Note that we will NOT filter out tweets with no TSP in either case
TopicSocialProofResponse(mergedSocialProofs)
}
}
}
}.flatten.raiseWithin(Timeout)(timer).rescue {
case _: ClientDiscardedRequestException =>
scopedStats.counter("ClientDiscardedRequestException").incr()
Future.value(DefaultResponse)
case err: Err if err.code == Err.Cancelled =>
scopedStats.counter("CancelledErr").incr()
Future.value(DefaultResponse)
case _ =>
scopedStats.counter("FailedRequests").incr()
Future.value(DefaultResponse)
}
}
}
/**
* Fetch the Score for each Topic Social Proof
*/
private def getTopicProofScore(
topicProof: TopicSocialProof,
allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]],
params: Params,
random: Random,
statsReceiver: StatsReceiver
): Option[TopicWithScore] = {
val scopedStats = statsReceiver.scope("getTopicProofScores")
val enableTweetToTopicScoreRanking =
params(TopicSocialProofParams.EnableTweetToTopicScoreRanking)
val minTweetToTopicCosineSimilarityThreshold =
params(TopicSocialProofParams.TweetToTopicCosineSimilarityThreshold)
val topicWithScore =
if (enableTweetToTopicScoreRanking) {
scopedStats.counter("enableTweetToTopicScoreRanking").incr()
buildTopicWithValidScore(
topicProof,
TweetEmbeddingType,
Some(ConsumerTopicEmbeddingType),
Some(ProducerTopicEmbeddingType),
allowListWithTopicFollowType,
DefaultModelVersion,
minTweetToTopicCosineSimilarityThreshold
)
} else {
scopedStats.counter("buildTopicWithRandomScore").incr()
buildTopicWithRandomScore(
topicProof,
allowListWithTopicFollowType,
random
)
}
topicWithScore
}
private[handlers] def isCrTopicTweets(
request: TopicSocialProofRequest
): Boolean = {
// CrTopic (across a variety of DisplayLocations) is the only use case with TopicListingSetting.All
request.topicListingSetting == TopicListingSetting.All
}
/**
* Consolidates the logic that decides whether Personalized Contexts backfilling should be
* enabled for each personalized context topic (recent fav, recent retweet, You Might Like)
*/
private[handlers] def checkPersonalizedContextsEligibility(
params: Params,
allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]]
): PersonalizedContextEligibility = {
val scopedStats = statsReceiver.scope("checkPersonalizedContextsEligibility")
val isRecentFavInAllowlist = allowListWithTopicFollowType
.contains(AnnotationRuleProvider.recentFavTopicId)
val isRecentFavEligible =
isRecentFavInAllowlist && params(TopicSocialProofParams.EnableRecentEngagementsTopic)
if (isRecentFavEligible)
scopedStats.counter("isRecentFavEligible").incr()
val isRecentRetweetInAllowlist = allowListWithTopicFollowType
.contains(AnnotationRuleProvider.recentRetweetTopicId)
val isRecentRetweetEligible =
isRecentRetweetInAllowlist && params(TopicSocialProofParams.EnableRecentEngagementsTopic)
if (isRecentRetweetEligible)
scopedStats.counter("isRecentRetweetEligible").incr()
val isYMLInAllowlist = allowListWithTopicFollowType
.contains(AnnotationRuleProvider.youMightLikeTopicId)
val isYMLEligible =
isYMLInAllowlist && params(TopicSocialProofParams.EnableYouMightLikeTopic)
if (isYMLEligible)
scopedStats.counter("isYMLEligible").incr()
PersonalizedContextEligibility(isRecentFavEligible, isRecentRetweetEligible, isYMLEligible)
}
private[handlers] def filterPersonalizedContexts(
socialProofs: Map[TweetId, Seq[TopicWithScore]],
tweetInfoMap: Map[TweetId, Option[TspTweetInfo]],
params: Params
): Map[TweetId, Seq[TopicWithScore]] = {
val filters: Seq[(Option[TspTweetInfo], Params) => Boolean] = Seq(
healthSignalsFilter,
tweetLanguageFilter
)
applyFilters(socialProofs, tweetInfoMap, params, filters)
}
/**
* Keep only tweets that have both a defined tweetInfo and a defined language
*/
private def keepTweetsWithTweetInfoAndLanguage(
tweetInfoMapFut: Future[Map[TweetId, Option[TspTweetInfo]]],
displayLocation: String
): Future[Map[TweetId, Option[TspTweetInfo]]] = {
val scopedStats = statsReceiver.scope(displayLocation)
tweetInfoMapFut.map { tweetInfoMap =>
val filteredTweetInfoMap = tweetInfoMap.filter {
case (_, optTweetInfo: Option[TspTweetInfo]) =>
if (optTweetInfo.isEmpty) {
scopedStats.counter("undefinedTweetInfoCount").incr()
}
optTweetInfo.exists { tweetInfo: TspTweetInfo =>
{
if (tweetInfo.language.isEmpty) {
scopedStats.counter("undefinedLanguageCount").incr()
}
tweetInfo.language.isDefined
}
}
}
val undefinedTweetInfoOrLangCount = tweetInfoMap.size - filteredTweetInfoMap.size
scopedStats.counter("undefinedTweetInfoOrLangCount").incr(undefinedTweetInfoOrLangCount)
scopedStats.counter("TweetInfoCount").incr(tweetInfoMap.size)
filteredTweetInfoMap
}
}
/**
* Apply the filters (health signal scores and tweet language) only to tweets that have NO
* evergreen topic social proofs, i.e., tweets that could be converted into Personalized Context
* topic tweets. Tweets that already have topic social proofs are always kept.
* TBD: whether we are going to apply the filters to all topic tweet candidates
*/
private def applyFilters(
socialProofs: Map[TweetId, Seq[TopicWithScore]],
tweetInfoMap: Map[TweetId, Option[TspTweetInfo]],
params: Params,
filters: Seq[(Option[TspTweetInfo], Params) => Boolean]
): Map[TweetId, Seq[TopicWithScore]] = {
socialProofs.collect {
case (tweetId, socialProofs) if socialProofs.nonEmpty || filters.forall { filter =>
filter(tweetInfoMap.getOrElse(tweetId, None), params)
} =>
tweetId -> socialProofs
}
}
private def healthSignalsFilter(
tweetInfoOpt: Option[TspTweetInfo],
params: Params
): Boolean = {
!params(
TopicSocialProofParams.EnableTopicTweetHealthFilterPersonalizedContexts) || HealthSignalsUtils
.isHealthyTweet(tweetInfoOpt)
}
private def tweetLanguageFilter(
tweetInfoOpt: Option[TspTweetInfo],
params: Params
): Boolean = {
PersonalizedContextTopicsAllowedLanguageSet
.contains(tweetInfoOpt.flatMap(_.language).getOrElse(LocaleUtil.DefaultLanguage))
}
private[handlers] def backfillPersonalizedContexts(
allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]],
socialProofs: Map[TweetId, Seq[TopicWithScore]],
metricTagsMap: scala.collection.Map[TweetId, scala.collection.Set[MetricTag]],
personalizedContextEligibility: PersonalizedContextEligibility
): Map[TweetId, Seq[TopicWithScore]] = {
val scopedStats = statsReceiver.scope("backfillPersonalizedContexts")
socialProofs.map {
case (tweetId, topicWithScores) =>
if (topicWithScores.nonEmpty) {
// Tweets that already have topic social proofs get no backfilled topics
tweetId -> Seq.empty
} else {
val metricTagContainsTweetFav = metricTagsMap
.getOrElse(tweetId, Set.empty[MetricTag]).contains(MetricTag.TweetFavorite)
val backfillRecentFav =
personalizedContextEligibility.isRecentFavEligible && metricTagContainsTweetFav
if (metricTagContainsTweetFav)
scopedStats.counter("MetricTag.TweetFavorite").incr()
if (backfillRecentFav)
scopedStats.counter("backfillRecentFav").incr()
val metricTagContainsRetweet = metricTagsMap
.getOrElse(tweetId, Set.empty[MetricTag]).contains(MetricTag.Retweet)
val backfillRecentRetweet =
personalizedContextEligibility.isRecentRetweetEligible && metricTagContainsRetweet
if (metricTagContainsRetweet)
scopedStats.counter("MetricTag.Retweet").incr()
if (backfillRecentRetweet)
scopedStats.counter("backfillRecentRetweet").incr()
val metricTagContainsRecentSearches = metricTagsMap
.getOrElse(tweetId, Set.empty[MetricTag]).contains(
MetricTag.InterestsRankerRecentSearches)
val backfillYML = personalizedContextEligibility.isYMLEligible
if (backfillYML)
scopedStats.counter("backfillYML").incr()
tweetId -> buildBackfillTopics(
allowListWithTopicFollowType,
backfillRecentFav,
backfillRecentRetweet,
backfillYML)
}
}
}
private def buildBackfillTopics(
allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]],
backfillRecentFav: Boolean,
backfillRecentRetweet: Boolean,
backfillYML: Boolean
): Seq[TopicWithScore] = {
Seq(
if (backfillRecentFav) {
Some(
TopicWithScore(
topicId = AnnotationRuleProvider.recentFavTopicId,
score = 1.0,
topicFollowType = allowListWithTopicFollowType
.getOrElse(AnnotationRuleProvider.recentFavTopicId, None)
))
} else { None },
if (backfillRecentRetweet) {
Some(
TopicWithScore(
topicId = AnnotationRuleProvider.recentRetweetTopicId,
score = 1.0,
topicFollowType = allowListWithTopicFollowType
.getOrElse(AnnotationRuleProvider.recentRetweetTopicId, None)
))
} else { None },
if (backfillYML) {
Some(
TopicWithScore(
topicId = AnnotationRuleProvider.youMightLikeTopicId,
score = 1.0,
topicFollowType = allowListWithTopicFollowType
.getOrElse(AnnotationRuleProvider.youMightLikeTopicId, None)
))
} else { None }
).flatten
}
def toReadableStore: ReadableStore[TopicSocialProofRequest, TopicSocialProofResponse] = {
new ReadableStore[TopicSocialProofRequest, TopicSocialProofResponse] {
override def get(k: TopicSocialProofRequest): Future[Option[TopicSocialProofResponse]] = {
val displayLocation = k.displayLocation.toString
loadShedder(displayLocation) {
getTopicSocialProofResponse(k).map(Some(_))
}.rescue {
case LoadShedder.LoadSheddingException =>
statsReceiver.scope(displayLocation).counter("LoadSheddingException").incr()
Future.None
case _ =>
statsReceiver.scope(displayLocation).counter("Exception").incr()
Future.None
}
}
}
}
}
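// Illustrative sketch (not part of the original change): the keep/drop rule that
// applyFilters implements above, restated with simplified stand-in types.
// Assumptions: `TweetInfoSketch`, `FilterSketch`, both example filters, the two-language
// allow list and the "en" default are hypothetical and exist only for this example; the
// real code uses TspTweetInfo, Params, TopicWithScore and LocaleUtil.DefaultLanguage.
// A tweet is kept if it already has topic social proofs OR if it passes every filter,
// so the filters only ever affect tweets that are candidates for backfilling.
object ApplyFiltersSketch {
  final case class TweetInfoSketch(language: Option[String], isHealthy: Boolean)
  type FilterSketch = Option[TweetInfoSketch] => Boolean

  // Hypothetical health filter: a missing tweet info is treated as unhealthy here.
  val healthFilter: FilterSketch = _.exists(_.isHealthy)

  // Hypothetical language filter: falls back to "en" when no language is known.
  val languageFilter: FilterSketch =
    info => Set("en", "ja").contains(info.flatMap(_.language).getOrElse("en"))

  def applyFilters(
    socialProofs: Map[Long, Seq[String]],
    tweetInfoMap: Map[Long, Option[TweetInfoSketch]],
    filters: Seq[FilterSketch]
  ): Map[Long, Seq[String]] =
    socialProofs.filter {
      case (tweetId, proofs) =>
        proofs.nonEmpty || filters.forall(f => f(tweetInfoMap.getOrElse(tweetId, None)))
    }
}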
object TopicSocialProofHandler {
private val MaxCandidates = 10
// The allowed languages for Personalized Context topics are currently hardcoded
private val PersonalizedContextTopicsAllowedLanguageSet: Set[String] =
Set("pt", "ko", "es", "ja", "tr", "id", "en", "hi", "ar", "fr", "ru")
private val Timeout: Duration = 200.milliseconds
private val TopicSocialProofStoreTimeout: Duration = 40.milliseconds
private val TweetInfoStoreTimeout: Duration = 60.milliseconds
private val DefaultResponse: TopicSocialProofResponse = TopicSocialProofResponse(Map.empty)
case class PersonalizedContextEligibility(
isRecentFavEligible: Boolean,
isRecentRetweetEligible: Boolean,
isYMLEligible: Boolean)
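/**
* Calculate the Topic Score for a (tweet, topic) pair as the sum of the consumer and producer
* scores, and filter out topic proofs whose combined score does not pass the minimum threshold
* (unless the proof is marked ignoreSimClusterFiltering)
*/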
private[handlers] def buildTopicWithValidScore(
topicProof: TopicSocialProof,
tweetEmbeddingType: EmbeddingType,
maybeConsumerEmbeddingType: Option[EmbeddingType],
maybeProducerEmbeddingType: Option[EmbeddingType],
allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]],
simClustersModelVersion: ModelVersion,
minTweetToTopicCosineSimilarityThreshold: Double
): Option[TopicWithScore] = {
val consumerScore = maybeConsumerEmbeddingType
.flatMap { consumerEmbeddingType =>
topicProof.scores.get(
ScoreKey(consumerEmbeddingType, tweetEmbeddingType, simClustersModelVersion))
}.getOrElse(0.0)
val producerScore = maybeProducerEmbeddingType
.flatMap { producerEmbeddingType =>
topicProof.scores.get(
ScoreKey(producerEmbeddingType, tweetEmbeddingType, simClustersModelVersion))
}.getOrElse(0.0)
val combinedScore = consumerScore + producerScore
if (combinedScore > minTweetToTopicCosineSimilarityThreshold || topicProof.ignoreSimClusterFiltering) {
Some(
TopicWithScore(
topicId = topicProof.topicId.entityId,
score = combinedScore,
topicFollowType =
allowListWithTopicFollowType.getOrElse(topicProof.topicId.entityId, None)))
} else {
None
}
}
private[handlers] def buildTopicWithRandomScore(
topicSocialProof: TopicSocialProof,
allowListWithTopicFollowType: Map[SemanticCoreEntityId, Option[TopicFollowType]],
random: Random
): Option[TopicWithScore] = {
Some(
TopicWithScore(
topicId = topicSocialProof.topicId.entityId,
score = random.nextDouble(),
topicFollowType =
allowListWithTopicFollowType.getOrElse(topicSocialProof.topicId.entityId, None)
))
}
/**
* Filter out the Topic Social Proofs whose topics are not in the allow list
* (no filtering is applied for TopicListingSetting.All)
*/
private[handlers] def filterByAllowedList(
topicProofs: Map[TweetId, Seq[TopicSocialProof]],
setting: TopicListingSetting,
allowList: Set[SemanticCoreEntityId]
): Map[TweetId, Seq[TopicSocialProof]] = {
setting match {
case TopicListingSetting.All =>
// Return all the topics
topicProofs
case _ =>
topicProofs.mapValues(
_.filter(topicProof => allowList.contains(topicProof.topicId.entityId)))
}
}
}
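// Illustrative sketch (not part of the original change): the scoring rule used by
// buildTopicWithValidScore and the ranking step `.sortBy(-_.score).take(MaxCandidates)`,
// restated with plain types. Assumptions: `ScoredTopicSketch`, `scoreTopic` and
// `topCandidates` are hypothetical names for this example only; the real code looks the
// scores up by ScoreKey and returns TopicWithScore.
object TopicScoreSketch {
  final case class ScoredTopicSketch(topicId: Long, score: Double)

  // A missing consumer or producer score counts as 0.0. The topic is kept when the
  // combined score is strictly above the threshold, or when filtering is bypassed.
  def scoreTopic(
    topicId: Long,
    consumerScore: Option[Double],
    producerScore: Option[Double],
    threshold: Double,
    ignoreFiltering: Boolean
  ): Option[ScoredTopicSketch] = {
    val combined = consumerScore.getOrElse(0.0) + producerScore.getOrElse(0.0)
    if (combined > threshold || ignoreFiltering) Some(ScoredTopicSketch(topicId, combined))
    else None
  }

  // Highest scores first, truncated to at most `max` candidates.
  def topCandidates(topics: Seq[ScoredTopicSketch], max: Int): Seq[ScoredTopicSketch] =
    topics.sortBy(-_.score).take(max)
}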

@ -0,0 +1,40 @@
package com.twitter.tsp.handlers
import com.twitter.inject.utils.Handler
import com.twitter.topiclisting.FollowableTopicProductId
import com.twitter.topiclisting.ProductId
import com.twitter.topiclisting.TopicListingViewerContext
import com.twitter.topiclisting.utt.UttLocalization
import com.twitter.util.logging.Logging
import javax.inject.Inject
import javax.inject.Singleton
/**
* We configure this warmer to raise the cache hit rate reported under
* `CachedUttClient/get_utt_taxonomy/cache_hit_rate`.
* uttLocalization.getRecommendableTopics fetches all topics that exist in UTT, but in doing so it
* actually fetches the complete UTT tree structure (by calling getUttChildren recursively), which
* can take about 1 second. Once fetched, the topics are stored in an in-memory cache, and the
* cache hit rate is > 99%.
*/
@Singleton
class UttChildrenWarmupHandler @Inject() (uttLocalization: UttLocalization)
extends Handler
with Logging {
/** Executes the function of this handler. */
override def handle(): Unit = {
uttLocalization
.getRecommendableTopics(
productId = ProductId.Followable,
viewerContext = TopicListingViewerContext(languageCode = Some("en")),
enableInternationalTopics = true,
followableTopicProductId = FollowableTopicProductId.AllFollowable
)
.onSuccess { result =>
logger.info(s"successfully warmed up UttChildren. TopicId length = ${result.size}")
}
.onFailure { throwable =>
// Log failures at warn level so they stand out from the success message
logger.warn(s"failed to warm up UttChildren. Throwable = $throwable")
}
}
}

@ -0,0 +1,30 @@
scala_library(
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
],
dependencies = [
"3rdparty/jvm/com/twitter/bijection:scrooge",
"3rdparty/jvm/com/twitter/storehaus:memcache",
"escherbird/src/scala/com/twitter/escherbird/util/uttclient",
"escherbird/src/thrift/com/twitter/escherbird/utt:strato-columns-scala",
"finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/authentication",
"finatra-internal/mtls-thriftmux/src/main/scala",
"finatra/inject/inject-core/src/main/scala",
"finatra/inject/inject-thrift-client",
"frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato",
"hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
"src/scala/com/twitter/storehaus_internal/memcache",
"src/scala/com/twitter/storehaus_internal/util",
"src/thrift/com/twitter/gizmoduck:thrift-scala",
"src/thrift/com/twitter/gizmoduck:user-thrift-scala",
"stitch/stitch-storehaus",
"stitch/stitch-tweetypie/src/main/scala",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/common",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/stores",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/utils",
"topic-social-proof/server/src/main/thrift:thrift-scala",
"topiclisting/common/src/main/scala/com/twitter/topiclisting/clients",
"topiclisting/topiclisting-utt/src/main/scala/com/twitter/topiclisting/utt",
],
)

@ -0,0 +1,35 @@
package com.twitter.tsp.modules
import com.google.inject.Module
import com.twitter.finagle.ThriftMux
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.mtls.client.MtlsStackClient._
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.finagle.thrift.ClientId
import com.twitter.finatra.mtls.thriftmux.modules.MtlsClient
import com.twitter.gizmoduck.thriftscala.UserService
import com.twitter.inject.Injector
import com.twitter.inject.thrift.modules.ThriftMethodBuilderClientModule
object GizmoduckUserModule
extends ThriftMethodBuilderClientModule[
UserService.ServicePerEndpoint,
UserService.MethodPerEndpoint
]
with MtlsClient {
override val label: String = "gizmoduck"
override val dest: String = "/s/gizmoduck/gizmoduck"
override val modules: Seq[Module] = Seq(TSPClientIdModule)
override def configureThriftMuxClient(
injector: Injector,
client: ThriftMux.Client
): ThriftMux.Client = {
super
.configureThriftMuxClient(injector, client)
.withMutualTls(injector.instance[ServiceIdentifier])
.withClientId(injector.instance[ClientId])
.withStatsReceiver(injector.instance[StatsReceiver].scope("giz"))
}
}

@ -0,0 +1,47 @@
package com.twitter.tsp.modules
import com.google.inject.Module
import com.google.inject.Provides
import com.google.inject.Singleton
import com.twitter.app.Flag
import com.twitter.bijection.scrooge.BinaryScalaCodec
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.memcached.{Client => MemClient}
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.hermit.store.common.ObservedMemcachedReadableStore
import com.twitter.inject.TwitterModule
import com.twitter.simclusters_v2.thriftscala.Score
import com.twitter.simclusters_v2.thriftscala.ScoreId
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.{Client => StratoClient}
import com.twitter.tsp.stores.RepresentationScorerStore
object RepresentationScorerStoreModule extends TwitterModule {
override def modules: Seq[Module] = Seq(UnifiedCacheClient)
private val tspRepresentationScoringColumnPath: Flag[String] = flag[String](
name = "tsp.representationScoringColumnPath",
default = "recommendations/representation_scorer/score",
help = "Strato column path for Representation Scorer Store"
)
@Provides
@Singleton
def providesRepresentationScorerStore(
statsReceiver: StatsReceiver,
stratoClient: StratoClient,
tspUnifiedCacheClient: MemClient
): ReadableStore[ScoreId, Score] = {
val underlyingStore =
RepresentationScorerStore(stratoClient, tspRepresentationScoringColumnPath(), statsReceiver)
ObservedMemcachedReadableStore.fromCacheClient(
backingStore = underlyingStore,
cacheClient = tspUnifiedCacheClient,
ttl = 2.hours
)(
valueInjection = BinaryScalaCodec(Score),
statsReceiver = statsReceiver.scope("RepresentationScorerStore"),
keyToString = { k: ScoreId => s"rsx/$k" }
)
}
}

@ -0,0 +1,14 @@
package com.twitter.tsp.modules
import com.google.inject.Provides
import com.twitter.finagle.thrift.ClientId
import com.twitter.inject.TwitterModule
import javax.inject.Singleton
object TSPClientIdModule extends TwitterModule {
private val clientIdFlag = flag("thrift.clientId", "topic-social-proof.prod", "Thrift client id")
@Provides
@Singleton
def providesClientId: ClientId = ClientId(clientIdFlag())
}

@ -0,0 +1,17 @@
package com.twitter.tsp.modules
import com.google.inject.Provides
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.inject.TwitterModule
import com.twitter.topiclisting.TopicListing
import com.twitter.topiclisting.TopicListingBuilder
import javax.inject.Singleton
object TopicListingModule extends TwitterModule {
@Provides
@Singleton
def providesTopicListing(statsReceiver: StatsReceiver): TopicListing = {
new TopicListingBuilder(statsReceiver.scope(namespace = "TopicListingBuilder")).build
}
}

@ -0,0 +1,68 @@
package com.twitter.tsp.modules
import com.google.inject.Module
import com.google.inject.Provides
import com.google.inject.Singleton
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.memcached.{Client => MemClient}
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.hermit.store.common.ObservedCachedReadableStore
import com.twitter.hermit.store.common.ObservedMemcachedReadableStore
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.inject.TwitterModule
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.thriftscala.Score
import com.twitter.simclusters_v2.thriftscala.ScoreId
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.{Client => StratoClient}
import com.twitter.tsp.stores.SemanticCoreAnnotationStore
import com.twitter.tsp.stores.TopicSocialProofStore
import com.twitter.tsp.stores.TopicSocialProofStore.TopicSocialProof
import com.twitter.tsp.utils.LZ4Injection
import com.twitter.tsp.utils.SeqObjectInjection
object TopicSocialProofStoreModule extends TwitterModule {
override def modules: Seq[Module] = Seq(UnifiedCacheClient)
@Provides
@Singleton
def providesTopicSocialProofStore(
representationScorerStore: ReadableStore[ScoreId, Score],
statsReceiver: StatsReceiver,
stratoClient: StratoClient,
tspUnifiedCacheClient: MemClient,
): ReadableStore[TopicSocialProofStore.Query, Seq[TopicSocialProof]] = {
val semanticCoreAnnotationStore: ReadableStore[TweetId, Seq[
SemanticCoreAnnotationStore.TopicAnnotation
]] = ObservedReadableStore(
SemanticCoreAnnotationStore(SemanticCoreAnnotationStore.getStratoStore(stratoClient))
)(statsReceiver.scope("SemanticCoreAnnotationStore"))
val underlyingStore = TopicSocialProofStore(
representationScorerStore,
semanticCoreAnnotationStore
)(statsReceiver.scope("TopicSocialProofStore"))
val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient(
backingStore = underlyingStore,
cacheClient = tspUnifiedCacheClient,
ttl = 15.minutes,
asyncUpdate = true
)(
valueInjection = LZ4Injection.compose(SeqObjectInjection[TopicSocialProof]()),
statsReceiver = statsReceiver.scope("memCachedTopicSocialProofStore"),
keyToString = { k: TopicSocialProofStore.Query => s"tsps/${k.cacheableQuery}" }
)
val inMemoryCachedStore =
ObservedCachedReadableStore.from[TopicSocialProofStore.Query, Seq[TopicSocialProof]](
memcachedStore,
ttl = 10.minutes,
maxKeys = 16777215, // ~ avg 160B, < 3000MB
cacheName = "topic_social_proof_cache",
windowSize = 10000L
)(statsReceiver.scope("InMemoryCachedTopicSocialProofStore"))
inMemoryCachedStore
}
}
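// Illustrative sketch (not part of the original change): the read path that the module
// above wires together, i.e. an in-memory cache in front of a remote cache in front of
// the backing store. Assumptions: `LayeredStore` and its members are hypothetical names
// for this example only; it is synchronous and ignores the TTLs, asyncUpdate, size limits
// and stats that the real ObservedCachedReadableStore / ObservedMemcachedReadableStore
// wrappers are configured with.
object LayeredCacheSketch {
  import scala.collection.mutable

  final class LayeredStore[K, V](backing: K => Option[V]) {
    private val inMemory = mutable.Map.empty[K, V]
    private val remote = mutable.Map.empty[K, V] // stands in for memcached
    var backingReads = 0

    // Check the in-memory tier, then the remote tier, then the backing store,
    // filling each faster tier on the way back. Misses (None) are not cached.
    def get(key: K): Option[V] =
      inMemory.get(key).orElse {
        val value = remote.get(key).orElse {
          backingReads += 1
          val loaded = backing(key)
          loaded.foreach(v => remote.update(key, v))
          loaded
        }
        value.foreach(v => inMemory.update(key, v))
        value
      }
  }
}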

@ -0,0 +1,26 @@
package com.twitter.tsp.modules
import com.google.inject.Provides
import com.google.inject.Singleton
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.inject.TwitterModule
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.thriftscala.Score
import com.twitter.simclusters_v2.thriftscala.ScoreId
import com.twitter.simclusters_v2.thriftscala.TopicId
import com.twitter.storehaus.ReadableStore
import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore
import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore.ScoreKey
object TopicTweetCosineSimilarityAggregateStoreModule extends TwitterModule {
@Provides
@Singleton
def providesTopicTweetCosineSimilarityAggregateStore(
representationScorerStore: ReadableStore[ScoreId, Score],
statsReceiver: StatsReceiver,
): ReadableStore[(TopicId, TweetId, Seq[ScoreKey]), Map[ScoreKey, Double]] = {
TopicTweetsCosineSimilarityAggregateStore(representationScorerStore)(
statsReceiver.scope("topicTweetsCosineSimilarityAggregateStore"))
}
}

@ -0,0 +1,130 @@
package com.twitter.tsp.modules
import com.google.inject.Module
import com.google.inject.Provides
import com.google.inject.Singleton
import com.twitter.bijection.scrooge.BinaryScalaCodec
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.memcached.{Client => MemClient}
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.store.health.TweetHealthModelStore
import com.twitter.frigate.common.store.health.TweetHealthModelStore.TweetHealthModelStoreConfig
import com.twitter.frigate.common.store.health.UserHealthModelStore
import com.twitter.frigate.common.store.interests.UserId
import com.twitter.frigate.thriftscala.TweetHealthScores
import com.twitter.frigate.thriftscala.UserAgathaScores
import com.twitter.hermit.store.common.DeciderableReadableStore
import com.twitter.hermit.store.common.ObservedCachedReadableStore
import com.twitter.hermit.store.common.ObservedMemcachedReadableStore
import com.twitter.inject.TwitterModule
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.stitch.tweetypie.TweetyPie
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.{Client => StratoClient}
import com.twitter.tsp.common.DeciderKey
import com.twitter.tsp.common.TopicSocialProofDecider
import com.twitter.tsp.stores.TweetInfoStore
import com.twitter.tsp.stores.TweetyPieFieldsStore
import com.twitter.tweetypie.thriftscala.TweetService
import com.twitter.tsp.thriftscala.TspTweetInfo
import com.twitter.util.JavaTimer
import com.twitter.util.Timer
object TweetInfoStoreModule extends TwitterModule {
override def modules: Seq[Module] = Seq(UnifiedCacheClient)
implicit val timer: Timer = new JavaTimer(true)
@Provides
@Singleton
def providesTweetInfoStore(
decider: TopicSocialProofDecider,
serviceIdentifier: ServiceIdentifier,
statsReceiver: StatsReceiver,
stratoClient: StratoClient,
tspUnifiedCacheClient: MemClient,
tweetyPieService: TweetService.MethodPerEndpoint
): ReadableStore[TweetId, TspTweetInfo] = {
val tweetHealthModelStore: ReadableStore[TweetId, TweetHealthScores] = {
val underlyingStore = TweetHealthModelStore.buildReadableStore(
stratoClient,
Some(
TweetHealthModelStoreConfig(
enablePBlock = true,
enableToxicity = true,
enablePSpammy = true,
enablePReported = true,
enableSpammyTweetContent = true,
enablePNegMultimodal = false))
)(statsReceiver.scope("UnderlyingTweetHealthModelStore"))
DeciderableReadableStore(
ObservedMemcachedReadableStore.fromCacheClient(
backingStore = underlyingStore,
cacheClient = tspUnifiedCacheClient,
ttl = 2.hours
)(
valueInjection = BinaryScalaCodec(TweetHealthScores),
statsReceiver = statsReceiver.scope("TweetHealthModelStore"),
keyToString = { k: TweetId => s"tHMS/$k" }
),
decider.deciderGateBuilder.idGate(DeciderKey.enableHealthSignalsScoreDeciderKey),
statsReceiver.scope("TweetHealthModelStore")
)
}
val userHealthModelStore: ReadableStore[UserId, UserAgathaScores] = {
val underlyingStore =
UserHealthModelStore.buildReadableStore(stratoClient)(
statsReceiver.scope("UnderlyingUserHealthModelStore"))
DeciderableReadableStore(
ObservedMemcachedReadableStore.fromCacheClient(
backingStore = underlyingStore,
cacheClient = tspUnifiedCacheClient,
ttl = 18.hours
)(
valueInjection = BinaryScalaCodec(UserAgathaScores),
statsReceiver = statsReceiver.scope("UserHealthModelStore"),
keyToString = { k: UserId => s"uHMS/$k" }
),
decider.deciderGateBuilder.idGate(DeciderKey.enableUserAgathaScoreDeciderKey),
statsReceiver.scope("UserHealthModelStore")
)
}
val tweetInfoStore: ReadableStore[TweetId, TspTweetInfo] = {
val underlyingStore = TweetInfoStore(
TweetyPieFieldsStore.getStoreFromTweetyPie(TweetyPie(tweetyPieService, statsReceiver)),
tweetHealthModelStore: ReadableStore[TweetId, TweetHealthScores],
userHealthModelStore: ReadableStore[UserId, UserAgathaScores],
timer: Timer
)(statsReceiver.scope("tweetInfoStore"))
val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient(
backingStore = underlyingStore,
cacheClient = tspUnifiedCacheClient,
ttl = 15.minutes,
// Hydrating tweetInfo is now a required step for all candidates,
// hence we needed to tune these thresholds.
asyncUpdate = serviceIdentifier.environment == "prod"
)(
valueInjection = BinaryScalaCodec(TspTweetInfo),
statsReceiver = statsReceiver.scope("memCachedTweetInfoStore"),
keyToString = { k: TweetId => s"tIS/$k" }
)
val inMemoryStore = ObservedCachedReadableStore.from(
memcachedStore,
ttl = 15.minutes,
maxKeys = 8388607, // See the TspTweetInfo definition: ~92 bytes per entry, around 736 MB in total
windowSize = 10000L,
cacheName = "tweet_info_cache",
maxMultiGetSize = 20
)(statsReceiver.scope("inMemoryCachedTweetInfoStore"))
inMemoryStore
}
tweetInfoStore
}
}
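// Illustrative sketch (not part of the original change): the arithmetic behind the
// `maxKeys` sizing comments in these modules. Assumptions: `estimatedMiB` is a
// hypothetical helper for this example only, and it counts value bytes alone, ignoring
// keys and JVM object overhead, so real memory use will be higher.
object CacheSizingSketch {
  def estimatedMiB(maxKeys: Long, avgEntryBytes: Long): Double =
    maxKeys * avgEntryBytes / (1024.0 * 1024.0)

  // Values taken from the cache configurations in these modules.
  val tweetInfoCacheMiB: Double = estimatedMiB(8388607L, 92L)
  val topicSocialProofCacheMiB: Double = estimatedMiB(16777215L, 160L)
}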

@ -0,0 +1,63 @@
package com.twitter.tsp
package modules
import com.google.inject.Module
import com.google.inject.Provides
import com.twitter.conversions.DurationOps.richDurationFromInt
import com.twitter.finagle.ThriftMux
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.mtls.client.MtlsStackClient.MtlsThriftMuxClientSyntax
import com.twitter.finagle.mux.ClientDiscardedRequestException
import com.twitter.finagle.service.ReqRep
import com.twitter.finagle.service.ResponseClass
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.finagle.thrift.ClientId
import com.twitter.inject.Injector
import com.twitter.inject.thrift.modules.ThriftMethodBuilderClientModule
import com.twitter.tweetypie.thriftscala.TweetService
import com.twitter.util.Duration
import com.twitter.util.Throw
import com.twitter.stitch.tweetypie.{TweetyPie => STweetyPie}
import com.twitter.finatra.mtls.thriftmux.modules.MtlsClient
import javax.inject.Singleton
object TweetyPieClientModule
extends ThriftMethodBuilderClientModule[
TweetService.ServicePerEndpoint,
TweetService.MethodPerEndpoint
]
with MtlsClient {
override val label = "tweetypie"
override val dest = "/s/tweetypie/tweetypie"
override val requestTimeout: Duration = 450.milliseconds
override val modules: Seq[Module] = Seq(TSPClientIdModule)
// We bump the success rate from the default of 0.8 to 0.9 since we're dropping the
// consecutive failures part of the default policy.
override def configureThriftMuxClient(
injector: Injector,
client: ThriftMux.Client
): ThriftMux.Client =
super
.configureThriftMuxClient(injector, client)
.withMutualTls(injector.instance[ServiceIdentifier])
.withStatsReceiver(injector.instance[StatsReceiver].scope("clnt"))
.withClientId(injector.instance[ClientId])
.withResponseClassifier {
case ReqRep(_, Throw(_: ClientDiscardedRequestException)) => ResponseClass.Ignorable
}
.withSessionQualifier
.successRateFailureAccrual(successRate = 0.9, window = 30.seconds)
@Provides
@Singleton
def providesTweetyPie(
tweetyPieService: TweetService.MethodPerEndpoint
): STweetyPie = {
STweetyPie(tweetyPieService)
}
}

@ -0,0 +1,33 @@
package com.twitter.tsp.modules
import com.google.inject.Provides
import com.google.inject.Singleton
import com.twitter.app.Flag
import com.twitter.finagle.memcached.Client
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.inject.TwitterModule
import com.twitter.storehaus_internal.memcache.MemcacheStore
import com.twitter.storehaus_internal.util.ClientName
import com.twitter.storehaus_internal.util.ZkEndPoint
object UnifiedCacheClient extends TwitterModule {
val tspUnifiedCacheDest: Flag[String] = flag[String](
name = "tsp.unifiedCacheDest",
default = "/srv#/prod/local/cache/topic_social_proof_unified",
help = "Wily path to topic social proof unified cache"
)
@Provides
@Singleton
def provideUnifiedCacheClient(
serviceIdentifier: ServiceIdentifier,
statsReceiver: StatsReceiver,
): Client =
MemcacheStore.memcachedClient(
name = ClientName("topic-social-proof-unified-memcache"),
dest = ZkEndPoint(tspUnifiedCacheDest()),
statsReceiver = statsReceiver.scope("cache_client"),
serviceIdentifier = serviceIdentifier
)
}

@ -0,0 +1,41 @@
package com.twitter.tsp.modules
import com.google.inject.Provides
import com.twitter.escherbird.util.uttclient.CacheConfigV2
import com.twitter.escherbird.util.uttclient.CachedUttClientV2
import com.twitter.escherbird.util.uttclient.UttClientCacheConfigsV2
import com.twitter.escherbird.utt.strato.thriftscala.Environment
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.inject.TwitterModule
import com.twitter.strato.client.Client
import com.twitter.topiclisting.clients.utt.UttClient
import javax.inject.Singleton
object UttClientModule extends TwitterModule {
@Provides
@Singleton
def providesUttClient(
stratoClient: Client,
statsReceiver: StatsReceiver
): UttClient = {
// Cache up to 2^18 - 1 UTT entries, which is expected to give a ~100% cache hit rate
lazy val defaultCacheConfigV2: CacheConfigV2 = CacheConfigV2(262143)
lazy val uttClientCacheConfigsV2: UttClientCacheConfigsV2 = UttClientCacheConfigsV2(
getTaxonomyConfig = defaultCacheConfigV2,
getUttTaxonomyConfig = defaultCacheConfigV2,
getLeafIds = defaultCacheConfigV2,
getLeafUttEntities = defaultCacheConfigV2
)
// CachedUttClient to use StratoClient
lazy val cachedUttClientV2: CachedUttClientV2 = new CachedUttClientV2(
stratoClient = stratoClient,
env = Environment.Prod,
cacheConfigs = uttClientCacheConfigsV2,
statsReceiver = statsReceiver.scope("CachedUttClient")
)
new UttClient(cachedUttClientV2, statsReceiver)
}
}

@ -0,0 +1,27 @@
package com.twitter.tsp.modules
import com.google.inject.Provides
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.inject.TwitterModule
import com.twitter.topiclisting.TopicListing
import com.twitter.topiclisting.clients.utt.UttClient
import com.twitter.topiclisting.utt.UttLocalization
import com.twitter.topiclisting.utt.UttLocalizationImpl
import javax.inject.Singleton
object UttLocalizationModule extends TwitterModule {
@Provides
@Singleton
def providesUttLocalization(
topicListing: TopicListing,
uttClient: UttClient,
statsReceiver: StatsReceiver
): UttLocalization = {
new UttLocalizationImpl(
topicListing,
uttClient,
statsReceiver
)
}
}

View File

@ -0,0 +1,23 @@
scala_library(
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
],
dependencies = [
"3rdparty/jvm/javax/inject:javax.inject",
"abdecider/src/main/scala",
"content-recommender/thrift/src/main/thrift:thrift-scala",
"hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
"hermit/hermit-core/src/main/scala/com/twitter/hermit/store/gizmoduck",
"src/scala/com/twitter/topic_recos/stores",
"src/thrift/com/twitter/gizmoduck:thrift-scala",
"src/thrift/com/twitter/gizmoduck:user-thrift-scala",
"src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
"stitch/stitch-storehaus",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/common",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/handlers",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/modules",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/stores",
"topic-social-proof/server/src/main/thrift:thrift-scala",
],
)

View File

@ -0,0 +1,182 @@
package com.twitter.tsp.service
import com.twitter.abdecider.ABDeciderFactory
import com.twitter.abdecider.LoggingABDecider
import com.twitter.tsp.thriftscala.TspTweetInfo
import com.twitter.discovery.common.configapi.FeatureContextBuilder
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.gizmoduck.thriftscala.LookupContext
import com.twitter.gizmoduck.thriftscala.QueryFields
import com.twitter.gizmoduck.thriftscala.User
import com.twitter.gizmoduck.thriftscala.UserService
import com.twitter.hermit.store.gizmoduck.GizmoduckUserStore
import com.twitter.logging.Logger
import com.twitter.simclusters_v2.common.SemanticCoreEntityId
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.common.UserId
import com.twitter.spam.rtf.thriftscala.SafetyLevel
import com.twitter.stitch.storehaus.StitchOfReadableStore
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.{Client => StratoClient}
import com.twitter.timelines.configapi
import com.twitter.timelines.configapi.CompositeConfig
import com.twitter.tsp.common.FeatureSwitchConfig
import com.twitter.tsp.common.FeatureSwitchesBuilder
import com.twitter.tsp.common.LoadShedder
import com.twitter.tsp.common.ParamsBuilder
import com.twitter.tsp.common.RecTargetFactory
import com.twitter.tsp.common.TopicSocialProofDecider
import com.twitter.tsp.handlers.TopicSocialProofHandler
import com.twitter.tsp.stores.LocalizedUttRecommendableTopicsStore
import com.twitter.tsp.stores.LocalizedUttTopicNameRequest
import com.twitter.tsp.stores.TopicResponses
import com.twitter.tsp.stores.TopicSocialProofStore
import com.twitter.tsp.stores.TopicSocialProofStore.TopicSocialProof
import com.twitter.tsp.stores.TopicStore
import com.twitter.tsp.stores.UttTopicFilterStore
import com.twitter.tsp.thriftscala.TopicSocialProofRequest
import com.twitter.tsp.thriftscala.TopicSocialProofResponse
import com.twitter.util.JavaTimer
import com.twitter.util.Timer
import javax.inject.Inject
import javax.inject.Singleton
import com.twitter.topiclisting.TopicListing
import com.twitter.topiclisting.utt.UttLocalization
@Singleton
class TopicSocialProofService @Inject() (
topicSocialProofStore: ReadableStore[TopicSocialProofStore.Query, Seq[TopicSocialProof]],
tweetInfoStore: ReadableStore[TweetId, TspTweetInfo],
serviceIdentifier: ServiceIdentifier,
stratoClient: StratoClient,
gizmoduck: UserService.MethodPerEndpoint,
topicListing: TopicListing,
uttLocalization: UttLocalization,
decider: TopicSocialProofDecider,
loadShedder: LoadShedder,
stats: StatsReceiver) {
import TopicSocialProofService._
private val statsReceiver = stats.scope("topic-social-proof-management")
private val isProd: Boolean = serviceIdentifier.environment == "prod"
private val optOutStratoStorePath: String =
if (isProd) "interests/optOutInterests" else "interests/staging/optOutInterests"
private val notInterestedInStorePath: String =
if (isProd) "interests/notInterestedTopicsGetter"
else "interests/staging/notInterestedTopicsGetter"
private val userOptOutTopicsStore: ReadableStore[UserId, TopicResponses] =
TopicStore.userOptOutTopicStore(stratoClient, optOutStratoStorePath)(
statsReceiver.scope("ints_interests_opt_out_store"))
private val explicitFollowingTopicsStore: ReadableStore[UserId, TopicResponses] =
TopicStore.explicitFollowingTopicStore(stratoClient)(
statsReceiver.scope("ints_explicit_following_interests_store"))
private val userNotInterestedInTopicsStore: ReadableStore[UserId, TopicResponses] =
TopicStore.notInterestedInTopicsStore(stratoClient, notInterestedInStorePath)(
statsReceiver.scope("ints_not_interested_in_store"))
private lazy val localizedUttRecommendableTopicsStore: ReadableStore[
LocalizedUttTopicNameRequest,
Set[
SemanticCoreEntityId
]
] = new LocalizedUttRecommendableTopicsStore(uttLocalization)
implicit val timer: Timer = new JavaTimer(true)
private lazy val uttTopicFilterStore = new UttTopicFilterStore(
topicListing = topicListing,
userOptOutTopicsStore = userOptOutTopicsStore,
explicitFollowingTopicsStore = explicitFollowingTopicsStore,
notInterestedTopicsStore = userNotInterestedInTopicsStore,
localizedUttRecommendableTopicsStore = localizedUttRecommendableTopicsStore,
timer = timer,
stats = statsReceiver.scope("UttTopicFilterStore")
)
private lazy val scribeLogger: Option[Logger] = Some(Logger.get("client_event"))
private lazy val abDecider: LoggingABDecider =
ABDeciderFactory(
abDeciderYmlPath = configRepoDirectory + "/abdecider/abdecider.yml",
scribeLogger = scribeLogger,
decider = None,
environment = Some("production"),
).buildWithLogging()
private val builder: FeatureSwitchesBuilder = FeatureSwitchesBuilder(
statsReceiver = statsReceiver.scope("featureswitches-v2"),
abDecider = abDecider,
featuresDirectory = "features/topic-social-proof/main",
configRepoDirectory = configRepoDirectory,
addServiceDetailsFromAurora = !serviceIdentifier.isLocal,
fastRefresh = !isProd
)
private lazy val overridesConfig: configapi.Config = {
new CompositeConfig(
Seq(
FeatureSwitchConfig.config
)
)
}
private val featureContextBuilder: FeatureContextBuilder = FeatureContextBuilder(builder.build())
private val paramsBuilder: ParamsBuilder = ParamsBuilder(
featureContextBuilder,
abDecider,
overridesConfig,
statsReceiver.scope("params")
)
private val userStore: ReadableStore[UserId, User] = {
val queryFields: Set[QueryFields] = Set(
QueryFields.Profile,
QueryFields.Account,
QueryFields.Roles,
QueryFields.Discoverability,
QueryFields.Safety,
QueryFields.Takedowns
)
val context: LookupContext = LookupContext(safetyLevel = Some(SafetyLevel.Recommendations))
GizmoduckUserStore(
client = gizmoduck,
queryFields = queryFields,
context = context,
statsReceiver = statsReceiver.scope("gizmoduck")
)
}
private val recTargetFactory: RecTargetFactory = RecTargetFactory(
abDecider,
userStore,
paramsBuilder,
statsReceiver
)
private val topicSocialProofHandler =
new TopicSocialProofHandler(
topicSocialProofStore,
tweetInfoStore,
uttTopicFilterStore,
recTargetFactory,
decider,
statsReceiver.scope("TopicSocialProofHandler"),
loadShedder,
timer)
val topicSocialProofHandlerStoreStitch: TopicSocialProofRequest => com.twitter.stitch.Stitch[
TopicSocialProofResponse
] = StitchOfReadableStore(topicSocialProofHandler.toReadableStore)
}
object TopicSocialProofService {
private val configRepoDirectory = "/usr/local/config"
}

View File

@ -0,0 +1,32 @@
scala_library(
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
],
dependencies = [
"3rdparty/jvm/com/twitter/storehaus:core",
"content-recommender/thrift/src/main/thrift:thrift-scala",
"escherbird/src/thrift/com/twitter/escherbird/topicannotation:topicannotation-thrift-scala",
"frigate/frigate-common:util",
"frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/health",
"frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/interests",
"frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato",
"hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
"mediaservices/commons/src/main/thrift:thrift-scala",
"src/scala/com/twitter/simclusters_v2/common",
"src/scala/com/twitter/simclusters_v2/score",
"src/scala/com/twitter/topic_recos/common",
"src/scala/com/twitter/topic_recos/stores",
"src/thrift/com/twitter/frigate:frigate-common-thrift-scala",
"src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
"src/thrift/com/twitter/spam/rtf:safety-level-scala",
"src/thrift/com/twitter/tweetypie:service-scala",
"src/thrift/com/twitter/tweetypie:tweet-scala",
"stitch/stitch-storehaus",
"stitch/stitch-tweetypie/src/main/scala",
"strato/src/main/scala/com/twitter/strato/client",
"topic-social-proof/server/src/main/scala/com/twitter/tsp/utils",
"topic-social-proof/server/src/main/thrift:thrift-scala",
"topiclisting/topiclisting-core/src/main/scala/com/twitter/topiclisting",
],
)

View File

@ -0,0 +1,30 @@
package com.twitter.tsp.stores
import com.twitter.storehaus.ReadableStore
import com.twitter.topiclisting.FollowableTopicProductId
import com.twitter.topiclisting.ProductId
import com.twitter.topiclisting.SemanticCoreEntityId
import com.twitter.topiclisting.TopicListingViewerContext
import com.twitter.topiclisting.utt.UttLocalization
import com.twitter.util.Future
case class LocalizedUttTopicNameRequest(
productId: ProductId.Value,
viewerContext: TopicListingViewerContext,
enableInternationalTopics: Boolean)
class LocalizedUttRecommendableTopicsStore(uttLocalization: UttLocalization)
extends ReadableStore[LocalizedUttTopicNameRequest, Set[SemanticCoreEntityId]] {
override def get(
request: LocalizedUttTopicNameRequest
): Future[Option[Set[SemanticCoreEntityId]]] = {
uttLocalization
.getRecommendableTopics(
productId = request.productId,
viewerContext = request.viewerContext,
enableInternationalTopics = request.enableInternationalTopics,
followableTopicProductId = FollowableTopicProductId.AllFollowable
).map { response => Some(response) }
}
}

View File

@ -0,0 +1,31 @@
package com.twitter.tsp.stores
import com.twitter.contentrecommender.thriftscala.ScoringResponse
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.store.strato.StratoFetchableStore
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.simclusters_v2.thriftscala.Score
import com.twitter.simclusters_v2.thriftscala.ScoreId
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.Client
import com.twitter.strato.thrift.ScroogeConvImplicits._
import com.twitter.tsp.utils.ReadableStoreWithMapOptionValues
object RepresentationScorerStore {
def apply(
stratoClient: Client,
scoringColumnPath: String,
stats: StatsReceiver
): ReadableStore[ScoreId, Score] = {
val stratoFetchableStore = StratoFetchableStore
.withUnitView[ScoreId, ScoringResponse](stratoClient, scoringColumnPath)
val enrichedStore = new ReadableStoreWithMapOptionValues[ScoreId, ScoringResponse, Score](
stratoFetchableStore).mapOptionValues(_.score)
ObservedReadableStore(
enrichedStore
)(stats.scope("representation_scorer_store"))
}
}

View File

@ -0,0 +1,64 @@
package com.twitter.tsp.stores
import com.twitter.escherbird.topicannotation.strato.thriftscala.TopicAnnotationValue
import com.twitter.escherbird.topicannotation.strato.thriftscala.TopicAnnotationView
import com.twitter.frigate.common.store.strato.StratoFetchableStore
import com.twitter.simclusters_v2.common.TopicId
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.Client
import com.twitter.strato.thrift.ScroogeConvImplicits._
import com.twitter.util.Future
/**
* This is copied from `src/scala/com/twitter/topic_recos/stores/SemanticCoreAnnotationStore.scala`
* Unfortunately their version assumes (incorrectly) that there is no View, which causes warnings.
* While these warnings may not cause any problems in practice, better safe than sorry.
*/
object SemanticCoreAnnotationStore {
private val column = "semanticCore/topicannotation/topicAnnotation.Tweet"
def getStratoStore(stratoClient: Client): ReadableStore[TweetId, TopicAnnotationValue] = {
StratoFetchableStore
.withView[TweetId, TopicAnnotationView, TopicAnnotationValue](
stratoClient,
column,
TopicAnnotationView())
}
case class TopicAnnotation(
topicId: TopicId,
ignoreSimClustersFilter: Boolean,
modelVersionId: Long)
}
/**
* Given a tweet Id, return the list of annotations defined by the TSIG team.
*/
case class SemanticCoreAnnotationStore(stratoStore: ReadableStore[TweetId, TopicAnnotationValue])
extends ReadableStore[TweetId, Seq[SemanticCoreAnnotationStore.TopicAnnotation]] {
import SemanticCoreAnnotationStore._
override def multiGet[K1 <: TweetId](
ks: Set[K1]
): Map[K1, Future[Option[Seq[TopicAnnotation]]]] = {
stratoStore
.multiGet(ks)
.mapValues(_.map(_.map { topicAnnotationValue =>
topicAnnotationValue.annotationsPerModel match {
case Some(annotationWithVersions) =>
annotationWithVersions.flatMap { annotations =>
annotations.annotations.map { annotation =>
TopicAnnotation(
annotation.entityId,
annotation.ignoreQualityFilter.getOrElse(false),
annotations.modelVersionId
)
}
}
case _ =>
Nil
}
}))
}
}
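
The flattening done in `multiGet` above can be illustrated in isolation. This is a minimal sketch that assumes simplified stand-in types: `RawAnnotation`, `AnnotationsWithVersion` and the `flatten` helper are hypothetical names for illustration, not the real Thrift structures. It shows how per-model annotation groups become one flat list, with a missing `ignoreQualityFilter` defaulting to `false`.

```scala
object AnnotationFlattenSketch {
  // Hypothetical stand-ins for the Thrift annotation types used by the store.
  final case class RawAnnotation(entityId: Long, ignoreQualityFilter: Option[Boolean])
  final case class AnnotationsWithVersion(modelVersionId: Long, annotations: Seq[RawAnnotation])
  final case class TopicAnnotation(
    topicId: Long,
    ignoreSimClustersFilter: Boolean,
    modelVersionId: Long)

  // Flatten the optional per-model groups into one sequence of TopicAnnotation.
  // Each annotation carries the model version id of the group it came from.
  def flatten(perModel: Option[Seq[AnnotationsWithVersion]]): Seq[TopicAnnotation] =
    perModel match {
      case Some(groups) =>
        groups.flatMap { group =>
          group.annotations.map { annotation =>
            TopicAnnotation(
              annotation.entityId,
              annotation.ignoreQualityFilter.getOrElse(false),
              group.modelVersionId)
          }
        }
      case None => Nil
    }
}
```

A value with no `annotationsPerModel` yields an empty list rather than a missing entry, so callers of this sketch cannot tell "no annotations" apart from "field absent".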

View File

@ -0,0 +1,127 @@
package com.twitter.tsp.stores
import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore.ScoreKey
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.util.StatsUtil
import com.twitter.simclusters_v2.thriftscala._
import com.twitter.storehaus.ReadableStore
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.tsp.stores.SemanticCoreAnnotationStore._
import com.twitter.tsp.stores.TopicSocialProofStore.TopicSocialProof
import com.twitter.util.Future
/**
* Provides session-less Topic Social Proof information that does not rely on any user info.
* This store is used by the Memcache and in-memory caches to achieve higher performance.
* One consumer embedding and one producer embedding are used to calculate the raw score.
*/
case class TopicSocialProofStore(
representationScorerStore: ReadableStore[ScoreId, Score],
semanticCoreAnnotationStore: ReadableStore[TweetId, Seq[TopicAnnotation]]
)(
statsReceiver: StatsReceiver)
extends ReadableStore[TopicSocialProofStore.Query, Seq[TopicSocialProof]] {
import TopicSocialProofStore._
// Fetches the tweet's topic annotations from SemanticCore's Annotation API
override def get(query: TopicSocialProofStore.Query): Future[Option[Seq[TopicSocialProof]]] = {
StatsUtil.trackOptionStats(statsReceiver) {
for {
annotations <-
StatsUtil.trackItemsStats(statsReceiver.scope("semanticCoreAnnotationStore")) {
semanticCoreAnnotationStore.get(query.cacheableQuery.tweetId).map(_.getOrElse(Nil))
}
filteredAnnotations = filterAnnotationsByAllowList(annotations, query)
scoredTopics <-
StatsUtil.trackItemMapStats(statsReceiver.scope("scoreTopicTweetsTweetLanguage")) {
// de-dup identical topicIds
val uniqueTopicIds = filteredAnnotations.map { annotation =>
TopicId(annotation.topicId, Some(query.cacheableQuery.tweetLanguage), country = None)
}.toSet
if (query.cacheableQuery.enableCosineSimilarityScoreCalculation) {
scoreTopicTweets(query.cacheableQuery.tweetId, uniqueTopicIds)
} else {
Future.value(uniqueTopicIds.map(id => id -> Map.empty[ScoreKey, Double]).toMap)
}
}
} yield {
if (scoredTopics.nonEmpty) {
val versionedTopicProofs = filteredAnnotations.map { annotation =>
val topicId =
TopicId(annotation.topicId, Some(query.cacheableQuery.tweetLanguage), country = None)
TopicSocialProof(
topicId,
scores = scoredTopics.getOrElse(topicId, Map.empty),
annotation.ignoreSimClustersFilter,
annotation.modelVersionId
)
}
Some(versionedTopicProofs)
} else {
None
}
}
}
}
/***
* When the allow list is non-empty (e.g., TSP handler call, CrTopic handler call),
* the filter is enabled and only annotations whose modelVersionId is in the input
* allowedSemanticCoreVersionIds set are kept.
* When the allow list is empty (e.g., some debugger calls),
* nothing is filtered out and all annotations pass.
* At most K = MaxNumberVersionIds version ids from the allow list are considered.
*/
private def filterAnnotationsByAllowList(
annotations: Seq[TopicAnnotation],
query: TopicSocialProofStore.Query
): Seq[TopicAnnotation] = {
val trimmedVersionIds = query.allowedSemanticCoreVersionIds.take(MaxNumberVersionIds)
annotations.filter { annotation =>
trimmedVersionIds.isEmpty || trimmedVersionIds.contains(annotation.modelVersionId)
}
}
private def scoreTopicTweets(
tweetId: TweetId,
topicIds: Set[TopicId]
): Future[Map[TopicId, Map[ScoreKey, Double]]] = {
Future.collect {
topicIds.map { topicId =>
val scoresFut = TopicTweetsCosineSimilarityAggregateStore.getRawScoresMap(
topicId,
tweetId,
TopicTweetsCosineSimilarityAggregateStore.DefaultScoreKeys,
representationScorerStore
)
topicId -> scoresFut
}.toMap
}
}
}
object TopicSocialProofStore {
private val MaxNumberVersionIds = 9
case class Query(
cacheableQuery: CacheableQuery,
allowedSemanticCoreVersionIds: Set[Long] = Set.empty) // overridden by FS
case class CacheableQuery(
tweetId: TweetId,
tweetLanguage: String,
enableCosineSimilarityScoreCalculation: Boolean = true)
case class TopicSocialProof(
topicId: TopicId,
scores: Map[ScoreKey, Double],
ignoreSimClusterFiltering: Boolean,
semanticCoreVersionId: Long)
}
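
The allow-list behaviour described for `filterAnnotationsByAllowList` can be sketched on its own. This is a simplified illustration, not the service code: `Annotation` and `filterByAllowList` are hypothetical names, and the cap of 9 mirrors `MaxNumberVersionIds` from the file above.

```scala
object AllowListFilterSketch {
  // Hypothetical, simplified stand-in for TopicAnnotation.
  final case class Annotation(topicId: Long, modelVersionId: Long)

  // Mirrors MaxNumberVersionIds in TopicSocialProofStore.
  val MaxNumberVersionIds = 9

  // An empty allow list disables filtering. Otherwise only annotations whose
  // model version id is in the (trimmed) allow list are kept.
  def filterByAllowList(annotations: Seq[Annotation], allowed: Set[Long]): Seq[Annotation] = {
    // Set.take keeps the first N elements in iteration order, so for a generic
    // Set with more than N ids, which ids survive the trim is not specified.
    val trimmed = allowed.take(MaxNumberVersionIds)
    annotations.filter(a => trimmed.isEmpty || trimmed.contains(a.modelVersionId))
  }
}
```

One consequence worth noting: because `take` runs on an unordered `Set`, callers that pass more than nine version ids cannot rely on any particular id being honoured.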

View File

@ -0,0 +1,135 @@
package com.twitter.tsp.stores
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.store.InterestedInInterestsFetchKey
import com.twitter.frigate.common.store.strato.StratoFetchableStore
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.interests.thriftscala.InterestId
import com.twitter.interests.thriftscala.InterestLabel
import com.twitter.interests.thriftscala.InterestRelationship
import com.twitter.interests.thriftscala.InterestRelationshipV1
import com.twitter.interests.thriftscala.InterestedInInterestLookupContext
import com.twitter.interests.thriftscala.InterestedInInterestModel
import com.twitter.interests.thriftscala.OptOutInterestLookupContext
import com.twitter.interests.thriftscala.UserInterest
import com.twitter.interests.thriftscala.UserInterestData
import com.twitter.interests.thriftscala.UserInterestsResponse
import com.twitter.simclusters_v2.common.UserId
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.Client
import com.twitter.strato.thrift.ScroogeConvImplicits._
case class TopicResponse(
entityId: Long,
interestedInData: Seq[InterestedInInterestModel],
scoreOverride: Option[Double] = None,
notInterestedInTimestamp: Option[Long] = None,
topicFollowTimestamp: Option[Long] = None)
case class TopicResponses(responses: Seq[TopicResponse])
object TopicStore {
private val InterestedInInterestsColumn = "interests/interestedInInterests"
private lazy val ExplicitInterestsContext: InterestedInInterestLookupContext =
InterestedInInterestLookupContext(
explicitContext = None,
inferredContext = None,
disableImplicit = Some(true)
)
private def userInterestsResponseToTopicResponse(
userInterestsResponse: UserInterestsResponse
): TopicResponses = {
val responses = userInterestsResponse.interests.interests.toSeq.flatMap { userInterests =>
userInterests.collect {
case UserInterest(
InterestId.SemanticCore(semanticCoreEntity),
Some(UserInterestData.InterestedIn(data))) =>
val topicFollowingTimestampOpt = data.collect {
case InterestedInInterestModel.ExplicitModel(
InterestRelationship.V1(interestRelationshipV1)) =>
interestRelationshipV1.timestampMs
}.lastOption
TopicResponse(semanticCoreEntity.id, data, None, None, topicFollowingTimestampOpt)
}
}
TopicResponses(responses)
}
def explicitFollowingTopicStore(
stratoClient: Client
)(
implicit statsReceiver: StatsReceiver
): ReadableStore[UserId, TopicResponses] = {
val stratoStore =
StratoFetchableStore
.withUnitView[InterestedInInterestsFetchKey, UserInterestsResponse](
stratoClient,
InterestedInInterestsColumn)
.composeKeyMapping[UserId](uid =>
InterestedInInterestsFetchKey(
userId = uid,
labels = None,
lookupContext = Some(ExplicitInterestsContext)
))
.mapValues(userInterestsResponseToTopicResponse)
ObservedReadableStore(stratoStore)
}
def userOptOutTopicStore(
stratoClient: Client,
optOutStratoStorePath: String
)(
implicit statsReceiver: StatsReceiver
): ReadableStore[UserId, TopicResponses] = {
val stratoStore =
StratoFetchableStore
.withUnitView[
(Long, Option[Seq[InterestLabel]], Option[OptOutInterestLookupContext]),
UserInterestsResponse](stratoClient, optOutStratoStorePath)
.composeKeyMapping[UserId](uid => (uid, None, None))
.mapValues { userInterestsResponse =>
val responses = userInterestsResponse.interests.interests.toSeq.flatMap { userInterests =>
userInterests.collect {
case UserInterest(
InterestId.SemanticCore(semanticCoreEntity),
Some(UserInterestData.InterestedIn(data))) =>
TopicResponse(semanticCoreEntity.id, data, None)
}
}
TopicResponses(responses)
}
ObservedReadableStore(stratoStore)
}
def notInterestedInTopicsStore(
stratoClient: Client,
notInterestedInStorePath: String
)(
implicit statsReceiver: StatsReceiver
): ReadableStore[UserId, TopicResponses] = {
val stratoStore =
StratoFetchableStore
.withUnitView[Long, Seq[UserInterest]](stratoClient, notInterestedInStorePath)
.composeKeyMapping[UserId](identity)
.mapValues { notInterestedInInterests =>
val responses = notInterestedInInterests.collect {
case UserInterest(
InterestId.SemanticCore(semanticCoreEntity),
Some(UserInterestData.NotInterested(notInterestedInData))) =>
val notInterestedInTimestampOpt = notInterestedInData.collect {
case InterestRelationship.V1(interestRelationshipV1: InterestRelationshipV1) =>
interestRelationshipV1.timestampMs
}.lastOption
TopicResponse(semanticCoreEntity.id, Seq.empty, None, notInterestedInTimestampOpt)
}
TopicResponses(responses)
}
ObservedReadableStore(stratoStore)
}
}
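
Both `userInterestsResponseToTopicResponse` and `notInterestedInTopicsStore` pick a timestamp with `collect { ... }.lastOption`. The sketch below isolates that pattern under simplified assumptions: `InterestModel`, `ExplicitModel`, `InferredModel` and `lastExplicitTimestamp` are hypothetical names, not the interests Thrift types.

```scala
object FollowTimestampSketch {
  // Hypothetical stand-ins for the interest model union.
  sealed trait InterestModel
  final case class ExplicitModel(timestampMs: Long) extends InterestModel
  final case class InferredModel(score: Double) extends InterestModel

  // Keep only explicit relationships and take the last one in sequence order.
  // This is positional: it returns the last matching element, not the maximum.
  def lastExplicitTimestamp(models: Seq[InterestModel]): Option[Long] =
    models.collect { case ExplicitModel(timestampMs) => timestampMs }.lastOption
}
```

If the upstream data is not ordered by time, the result of this pattern is the last listed timestamp rather than the most recent one.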

View File

@ -0,0 +1,99 @@
package com.twitter.tsp.stores
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.simclusters_v2.thriftscala.ScoreInternalId
import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.{
SimClustersEmbeddingPairScoreId => ThriftSimClustersEmbeddingPairScoreId
}
import com.twitter.simclusters_v2.thriftscala.TopicId
import com.twitter.simclusters_v2.thriftscala.{Score => ThriftScore}
import com.twitter.simclusters_v2.thriftscala.{ScoreId => ThriftScoreId}
import com.twitter.storehaus.ReadableStore
import com.twitter.topic_recos.common._
import com.twitter.topic_recos.common.Configs.DefaultModelVersion
import com.twitter.tsp.stores.TopicTweetsCosineSimilarityAggregateStore.ScoreKey
import com.twitter.util.Future
object TopicTweetsCosineSimilarityAggregateStore {
val TopicEmbeddingTypes: Seq[EmbeddingType] =
Seq(
EmbeddingType.FavTfgTopic,
EmbeddingType.LogFavBasedKgoApeTopic
)
// Add new embedding types here to test the performance of new Tweet embeddings.
val TweetEmbeddingTypes: Seq[EmbeddingType] = Seq(EmbeddingType.LogFavBasedTweet)
val ModelVersions: Seq[ModelVersion] =
Seq(DefaultModelVersion)
val DefaultScoreKeys: Seq[ScoreKey] = {
for {
modelVersion <- ModelVersions
topicEmbeddingType <- TopicEmbeddingTypes
tweetEmbeddingType <- TweetEmbeddingTypes
} yield {
ScoreKey(
topicEmbeddingType = topicEmbeddingType,
tweetEmbeddingType = tweetEmbeddingType,
modelVersion = modelVersion
)
}
}
case class ScoreKey(
topicEmbeddingType: EmbeddingType,
tweetEmbeddingType: EmbeddingType,
modelVersion: ModelVersion)
def getRawScoresMap(
topicId: TopicId,
tweetId: TweetId,
scoreKeys: Seq[ScoreKey],
representationScorerStore: ReadableStore[ThriftScoreId, ThriftScore]
): Future[Map[ScoreKey, Double]] = {
val scoresMapFut = scoreKeys.map { key =>
val scoreInternalId = ScoreInternalId.SimClustersEmbeddingPairScoreId(
ThriftSimClustersEmbeddingPairScoreId(
buildTopicEmbedding(topicId, key.topicEmbeddingType, key.modelVersion),
SimClustersEmbeddingId(
key.tweetEmbeddingType,
key.modelVersion,
InternalId.TweetId(tweetId))
))
val scoreFut = representationScorerStore
.get(
ThriftScoreId(
algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity, // Hard code as cosine sim
internalId = scoreInternalId
))
key -> scoreFut
}.toMap
Future
.collect(scoresMapFut).map(_.collect {
case (key, Some(ThriftScore(score))) =>
(key, score)
})
}
}
case class TopicTweetsCosineSimilarityAggregateStore(
representationScorerStore: ReadableStore[ThriftScoreId, ThriftScore]
)(
statsReceiver: StatsReceiver)
extends ReadableStore[(TopicId, TweetId, Seq[ScoreKey]), Map[ScoreKey, Double]] {
import TopicTweetsCosineSimilarityAggregateStore._
override def get(k: (TopicId, TweetId, Seq[ScoreKey])): Future[Option[Map[ScoreKey, Double]]] = {
statsReceiver.counter("topicTweetsCosineSimilariltyAggregateStore").incr()
getRawScoresMap(k._1, k._2, k._3, representationScorerStore).map(Some(_))
}
}
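
`DefaultScoreKeys` is the cross product of model versions, topic embedding types and tweet embedding types, and `getRawScoresMap` keeps only the keys whose score lookup returned a value. The sketch below shows both steps with plain strings in place of the Thrift enums. `keepDefined` is a hypothetical helper, and the lookups are modelled as ready `Option` values rather than Futures.

```scala
object ScoreKeySketch {
  // Strings stand in for the EmbeddingType and ModelVersion Thrift enums.
  final case class ScoreKey(
    topicEmbeddingType: String,
    tweetEmbeddingType: String,
    modelVersion: String)

  val topicEmbeddingTypes: Seq[String] = Seq("FavTfgTopic", "LogFavBasedKgoApeTopic")
  val tweetEmbeddingTypes: Seq[String] = Seq("LogFavBasedTweet")
  val modelVersions: Seq[String] = Seq("DefaultModelVersion")

  // One key per (model version, topic type, tweet type) combination.
  val defaultScoreKeys: Seq[ScoreKey] =
    for {
      modelVersion <- modelVersions
      topicEmbeddingType <- topicEmbeddingTypes
      tweetEmbeddingType <- tweetEmbeddingTypes
    } yield ScoreKey(topicEmbeddingType, tweetEmbeddingType, modelVersion)

  // Drop keys whose lookup produced no score, as the final collect step does.
  def keepDefined(raw: Map[ScoreKey, Option[Double]]): Map[ScoreKey, Double] =
    raw.collect { case (key, Some(score)) => key -> score }
}
```

Adding a second tweet embedding type in this sketch would double the number of keys, and therefore the number of score lookups per topic.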

View File

@ -0,0 +1,230 @@
package com.twitter.tsp.stores
import com.twitter.conversions.DurationOps._
import com.twitter.tsp.thriftscala.TspTweetInfo
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.thriftscala.TweetHealthScores
import com.twitter.frigate.thriftscala.UserAgathaScores
import com.twitter.logging.Logger
import com.twitter.mediaservices.commons.thriftscala.MediaCategory
import com.twitter.mediaservices.commons.tweetmedia.thriftscala.MediaInfo
import com.twitter.mediaservices.commons.tweetmedia.thriftscala.MediaSizeType
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.common.UserId
import com.twitter.spam.rtf.thriftscala.SafetyLevel
import com.twitter.stitch.Stitch
import com.twitter.stitch.storehaus.ReadableStoreOfStitch
import com.twitter.stitch.tweetypie.TweetyPie
import com.twitter.stitch.tweetypie.TweetyPie.TweetyPieException
import com.twitter.storehaus.ReadableStore
import com.twitter.topiclisting.AnnotationRuleProvider
import com.twitter.tsp.utils.HealthSignalsUtils
import com.twitter.tweetypie.thriftscala.TweetInclude
import com.twitter.tweetypie.thriftscala.{Tweet => TTweet}
import com.twitter.tweetypie.thriftscala._
import com.twitter.util.Duration
import com.twitter.util.Future
import com.twitter.util.TimeoutException
import com.twitter.util.Timer
object TweetyPieFieldsStore {
// Tweet fields options. Only fields specified here will be hydrated in the tweet
private val CoreTweetFields: Set[TweetInclude] = Set[TweetInclude](
TweetInclude.TweetFieldId(TTweet.IdField.id),
TweetInclude.TweetFieldId(TTweet.CoreDataField.id), // needed for the authorId
TweetInclude.TweetFieldId(TTweet.LanguageField.id),
TweetInclude.CountsFieldId(StatusCounts.FavoriteCountField.id),
TweetInclude.CountsFieldId(StatusCounts.RetweetCountField.id),
TweetInclude.TweetFieldId(TTweet.QuotedTweetField.id),
TweetInclude.TweetFieldId(TTweet.MediaKeysField.id),
TweetInclude.TweetFieldId(TTweet.EscherbirdEntityAnnotationsField.id),
TweetInclude.TweetFieldId(TTweet.MediaField.id),
TweetInclude.TweetFieldId(TTweet.UrlsField.id)
)
private val gtfo: GetTweetFieldsOptions = GetTweetFieldsOptions(
tweetIncludes = CoreTweetFields,
safetyLevel = Some(SafetyLevel.Recommendations)
)
def getStoreFromTweetyPie(
tweetyPie: TweetyPie,
convertExceptionsToNotFound: Boolean = true
): ReadableStore[Long, GetTweetFieldsResult] = {
val log = Logger("TweetyPieFieldsStore")
ReadableStoreOfStitch { tweetId: Long =>
tweetyPie
.getTweetFields(tweetId, options = gtfo)
.rescue {
case ex: TweetyPieException if convertExceptionsToNotFound =>
log.error(ex, s"Error while hitting tweetypie ${ex.result}")
Stitch.NotFound
}
}
}
}
object TweetInfoStore {
case class IsPassTweetHealthFilters(tweetStrictest: Option[Boolean])
case class IsPassAgathaHealthFilters(agathaStrictest: Option[Boolean])
private val HealthStoreTimeout: Duration = 40.milliseconds
private val isPassTweetHealthFilters: IsPassTweetHealthFilters = IsPassTweetHealthFilters(None)
private val isPassAgathaHealthFilters: IsPassAgathaHealthFilters = IsPassAgathaHealthFilters(None)
}
case class TweetInfoStore(
tweetFieldsStore: ReadableStore[TweetId, GetTweetFieldsResult],
tweetHealthModelStore: ReadableStore[TweetId, TweetHealthScores],
userHealthModelStore: ReadableStore[UserId, UserAgathaScores],
timer: Timer
)(
statsReceiver: StatsReceiver)
extends ReadableStore[TweetId, TspTweetInfo] {
import TweetInfoStore._
private[this] def toTweetInfo(
tweetFieldsResult: GetTweetFieldsResult
): Future[Option[TspTweetInfo]] = {
tweetFieldsResult.tweetResult match {
case result: TweetFieldsResultState.Found if result.found.suppressReason.isEmpty =>
val tweet = result.found.tweet
val authorIdOpt = tweet.coreData.map(_.userId)
val favCountOpt = tweet.counts.flatMap(_.favoriteCount)
val languageOpt = tweet.language.map(_.language)
val hasImageOpt =
tweet.mediaKeys.map(_.map(_.mediaCategory).exists(_ == MediaCategory.TweetImage))
val hasGifOpt =
tweet.mediaKeys.map(_.map(_.mediaCategory).exists(_ == MediaCategory.TweetGif))
val isNsfwAuthorOpt = Some(
tweet.coreData.exists(_.nsfwUser) || tweet.coreData.exists(_.nsfwAdmin))
val isTweetReplyOpt = tweet.coreData.map(_.reply.isDefined)
val hasMultipleMediaOpt =
tweet.mediaKeys.map(_.map(_.mediaCategory).size > 1)
val isKGODenylist = Some(
tweet.escherbirdEntityAnnotations
.exists(_.entityAnnotations.exists(AnnotationRuleProvider.isSuppressedTopicsDenylist)))
val isNullcastOpt = tweet.coreData.map(_.nullcast) // These are Ads. go/nullcast
val videoDurationOpt = tweet.media.flatMap(_.flatMap {
_.mediaInfo match {
case Some(MediaInfo.VideoInfo(info)) =>
Some((info.durationMillis + 999) / 1000) // round video playtime up to whole seconds
case _ => None
}
}.headOption)
// There are many different types of videos. To be robust to new types being added, we just use
// the videoDurationOpt to keep track of whether the item has a video or not.
val hasVideo = videoDurationOpt.isDefined
val mediaDimensionsOpt =
tweet.media.flatMap(_.headOption.flatMap(
_.sizes.find(_.sizeType == MediaSizeType.Orig).map(size => (size.width, size.height))))
val mediaWidth = mediaDimensionsOpt.map(_._1).getOrElse(1)
val mediaHeight = mediaDimensionsOpt.map(_._2).getOrElse(1)
// Media counts as high resolution only if both its width and its height exceed 480px.
val isHighMediaResolution = mediaHeight > 480 && mediaWidth > 480
val isVerticalAspectRatio = mediaHeight >= mediaWidth && mediaWidth > 1
val hasUrlOpt = tweet.urls.map(_.nonEmpty)
(authorIdOpt, favCountOpt) match {
case (Some(authorId), Some(favCount)) =>
hydrateHealthScores(tweet.id, authorId).map {
case (isPassAgathaHealthFilters, isPassTweetHealthFilters) =>
Some(
TspTweetInfo(
authorId = authorId,
favCount = favCount,
language = languageOpt,
hasImage = hasImageOpt,
hasVideo = Some(hasVideo),
hasGif = hasGifOpt,
isNsfwAuthor = isNsfwAuthorOpt,
isKGODenylist = isKGODenylist,
isNullcast = isNullcastOpt,
videoDurationSeconds = videoDurationOpt,
isHighMediaResolution = Some(isHighMediaResolution),
isVerticalAspectRatio = Some(isVerticalAspectRatio),
isPassAgathaHealthFilterStrictest = isPassAgathaHealthFilters.agathaStrictest,
isPassTweetHealthFilterStrictest = isPassTweetHealthFilters.tweetStrictest,
isReply = isTweetReplyOpt,
hasMultipleMedia = hasMultipleMediaOpt,
hasUrl = hasUrlOpt
))
}
case _ =>
statsReceiver.counter("missingFields").incr()
Future.None // These values should always exist.
}
case _: TweetFieldsResultState.NotFound =>
statsReceiver.counter("notFound").incr()
Future.None
case _: TweetFieldsResultState.Failed =>
statsReceiver.counter("failed").incr()
Future.None
case _: TweetFieldsResultState.Filtered =>
statsReceiver.counter("filtered").incr()
Future.None
case _ =>
statsReceiver.counter("unknown").incr()
Future.None
}
}
private[this] def hydrateHealthScores(
tweetId: TweetId,
authorId: Long
): Future[(IsPassAgathaHealthFilters, IsPassTweetHealthFilters)] = {
Future
.join(
tweetHealthModelStore
.multiGet(Set(tweetId))(tweetId),
userHealthModelStore
.multiGet(Set(authorId))(authorId)
).map {
case (tweetHealthScoresOpt, userAgathaScoresOpt) =>
// These stats help us understand the empty rate for AgathaCalibratedNsfw / NsfwTextUserScore
statsReceiver.counter("totalCountAgathaScore").incr()
if (userAgathaScoresOpt.getOrElse(UserAgathaScores()).agathaCalibratedNsfw.isEmpty)
statsReceiver.counter("emptyCountAgathaCalibratedNsfw").incr()
if (userAgathaScoresOpt.getOrElse(UserAgathaScores()).nsfwTextUserScore.isEmpty)
statsReceiver.counter("emptyCountNsfwTextUserScore").incr()
val isPassAgathaHealthFilters = IsPassAgathaHealthFilters(
agathaStrictest =
Some(HealthSignalsUtils.isTweetAgathaModelQualified(userAgathaScoresOpt)),
)
val isPassTweetHealthFilters = IsPassTweetHealthFilters(
tweetStrictest =
Some(HealthSignalsUtils.isTweetHealthModelQualified(tweetHealthScoresOpt))
)
(isPassAgathaHealthFilters, isPassTweetHealthFilters)
}.raiseWithin(HealthStoreTimeout)(timer).rescue {
case _: TimeoutException =>
statsReceiver.counter("hydrateHealthScoreTimeout").incr()
Future.value((isPassAgathaHealthFilters, isPassTweetHealthFilters))
case _ =>
statsReceiver.counter("hydrateHealthScoreFailure").incr()
Future.value((isPassAgathaHealthFilters, isPassTweetHealthFilters))
}
}
override def multiGet[K1 <: TweetId](ks: Set[K1]): Map[K1, Future[Option[TspTweetInfo]]] = {
statsReceiver.counter("tweetFieldsStore").incr(ks.size)
tweetFieldsStore
.multiGet(ks).mapValues(_.flatMap { _.map { v => toTweetInfo(v) }.getOrElse(Future.None) })
}
}
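
The media signals derived in `toTweetInfo` above (rounded-up video duration, high-resolution flag, vertical-aspect flag) can be sketched in isolation. This is a minimal illustration and not code from the commit: `MediaSignalsSketch` and its method names are hypothetical, and `durationMillis` is assumed to be an `Int`.

```scala
// Illustrative sketch only; names here are hypothetical and not part of the commit.
object MediaSignalsSketch {
  // Integer ceiling division: any partial second counts as a full second.
  def durationSeconds(durationMillis: Int): Int = (durationMillis + 999) / 1000

  // Both dimensions must exceed 480px, mirroring isHighMediaResolution.
  def isHighResolution(width: Int, height: Int): Boolean = height > 480 && width > 480

  // Height >= width counts as vertical (so square media qualifies); width > 1
  // excludes the (1, 1) fallback used when no original-size dimensions exist.
  def isVertical(width: Int, height: Int): Boolean = height >= width && width > 1

  def main(args: Array[String]): Unit = {
    println(durationSeconds(1))      // 1
    println(durationSeconds(1000))   // 1
    println(durationSeconds(1001))   // 2
    println(isVertical(1, 1))        // false
    println(isVertical(720, 1280))   // true
  }
}
```

Note that the `(1, 1)` fallback makes a tweet without media dimensions fail both the resolution and the aspect-ratio checks, which is why the `width > 1` guard is needed.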


@ -0,0 +1,248 @@
package com.twitter.tsp.stores
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.FailureFlags.flagsOf
import com.twitter.finagle.mux.ClientDiscardedRequestException
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.store.interests
import com.twitter.simclusters_v2.common.UserId
import com.twitter.storehaus.ReadableStore
import com.twitter.topiclisting.ProductId
import com.twitter.topiclisting.TopicListing
import com.twitter.topiclisting.TopicListingViewerContext
import com.twitter.topiclisting.{SemanticCoreEntityId => ScEntityId}
import com.twitter.tsp.thriftscala.TopicFollowType
import com.twitter.tsp.thriftscala.TopicListingSetting
import com.twitter.tsp.thriftscala.TopicSocialProofFilteringBypassMode
import com.twitter.util.Duration
import com.twitter.util.Future
import com.twitter.util.TimeoutException
import com.twitter.util.Timer
class UttTopicFilterStore(
topicListing: TopicListing,
userOptOutTopicsStore: ReadableStore[interests.UserId, TopicResponses],
explicitFollowingTopicsStore: ReadableStore[interests.UserId, TopicResponses],
notInterestedTopicsStore: ReadableStore[interests.UserId, TopicResponses],
localizedUttRecommendableTopicsStore: ReadableStore[LocalizedUttTopicNameRequest, Set[Long]],
timer: Timer,
stats: StatsReceiver) {
import UttTopicFilterStore._
// Set of blacklisted SemanticCore IDs that are paused.
private[this] def getPausedTopics(topicCtx: TopicListingViewerContext): Set[ScEntityId] = {
topicListing.getPausedTopics(topicCtx)
}
private[this] def getOptOutTopics(userId: Long): Future[Set[ScEntityId]] = {
stats.counter("getOptOutTopicsCount").incr()
userOptOutTopicsStore
.get(userId).map { responseOpt =>
responseOpt
.map { responses => responses.responses.map(_.entityId) }.getOrElse(Seq.empty).toSet
}.raiseWithin(DefaultOptOutTimeout)(timer).rescue {
case err: TimeoutException =>
stats.counter("getOptOutTopicsTimeout").incr()
Future.exception(err)
case err: ClientDiscardedRequestException
if flagsOf(err).contains("interrupted") && flagsOf(err)
.contains("ignorable") =>
stats.counter("getOptOutTopicsDiscardedBackupRequest").incr()
Future.exception(err)
case err =>
stats.counter("getOptOutTopicsFailure").incr()
Future.exception(err)
}
}
private[this] def getNotInterestedIn(userId: Long): Future[Set[ScEntityId]] = {
stats.counter("getNotInterestedInCount").incr()
notInterestedTopicsStore
.get(userId).map { responseOpt =>
responseOpt
.map { responses => responses.responses.map(_.entityId) }.getOrElse(Seq.empty).toSet
}.raiseWithin(DefaultNotInterestedInTimeout)(timer).rescue {
case err: TimeoutException =>
stats.counter("getNotInterestedInTimeout").incr()
Future.exception(err)
case err: ClientDiscardedRequestException
if flagsOf(err).contains("interrupted") && flagsOf(err)
.contains("ignorable") =>
stats.counter("getNotInterestedInDiscardedBackupRequest").incr()
Future.exception(err)
case err =>
stats.counter("getNotInterestedInFailure").incr()
Future.exception(err)
}
}
private[this] def getFollowedTopics(userId: Long): Future[Set[TopicResponse]] = {
stats.counter("getFollowedTopicsCount").incr()
explicitFollowingTopicsStore
.get(userId).map { responseOpt =>
responseOpt.map(_.responses.toSet).getOrElse(Set.empty)
}.raiseWithin(DefaultInterestedInTimeout)(timer).rescue {
case _: TimeoutException =>
stats.counter("getFollowedTopicsTimeout").incr()
Future(Set.empty)
case _ =>
stats.counter("getFollowedTopicsFailure").incr()
Future(Set.empty)
}
}
private[this] def getFollowedTopicIds(userId: Long): Future[Set[ScEntityId]] = {
getFollowedTopics(userId).map(_.map(_.entityId))
}
private[this] def getWhitelistTopicIds(
normalizedContext: TopicListingViewerContext,
enableInternationalTopics: Boolean
): Future[Set[ScEntityId]] = {
stats.counter("getWhitelistTopicIdsCount").incr()
val uttRequest = LocalizedUttTopicNameRequest(
productId = ProductId.Followable,
viewerContext = normalizedContext,
enableInternationalTopics = enableInternationalTopics
)
localizedUttRecommendableTopicsStore
.get(uttRequest).map { response =>
response.getOrElse(Set.empty)
}.rescue {
case _ =>
stats.counter("getWhitelistTopicIdsFailure").incr()
Future(Set.empty)
}
}
private[this] def getDenyListTopicIdsForUser(
userId: UserId,
topicListingSetting: TopicListingSetting,
context: TopicListingViewerContext,
bypassModes: Option[Set[TopicSocialProofFilteringBypassMode]]
): Future[Set[ScEntityId]] = {
val denyListTopicIdsFuture = topicListingSetting match {
case TopicListingSetting.ImplicitFollow =>
getFollowedTopicIds(userId)
case _ =>
Future(Set.empty[ScEntityId])
}
// we don't filter opt-out topics for implicit follow topic listing setting
val optOutTopicIdsFuture = topicListingSetting match {
case TopicListingSetting.ImplicitFollow => Future(Set.empty[ScEntityId])
case _ => getOptOutTopics(userId)
}
val notInterestedTopicIdsFuture =
if (bypassModes.exists(_.contains(TopicSocialProofFilteringBypassMode.NotInterested))) {
Future(Set.empty[ScEntityId])
} else {
getNotInterestedIn(userId)
}
val pausedTopicIdsFuture = Future.value(getPausedTopics(context))
Future
.collect(
List(
denyListTopicIdsFuture,
optOutTopicIdsFuture,
notInterestedTopicIdsFuture,
pausedTopicIdsFuture)).map { list => list.reduce(_ ++ _) }
}
private[this] def getDiff(
aFut: Future[Set[ScEntityId]],
bFut: Future[Set[ScEntityId]]
): Future[Set[ScEntityId]] = {
Future.join(aFut, bFut).map {
case (a, b) => a.diff(b)
}
}
/**
* Calculates the allow list for a user: the whitelisted topic IDs minus the blacklisted IDs,
* or the user's followed topics, depending on the topic listing setting sent by the client.
*/
def getAllowListTopicsForUser(
userId: UserId,
topicListingSetting: TopicListingSetting,
context: TopicListingViewerContext,
bypassModes: Option[Set[TopicSocialProofFilteringBypassMode]]
): Future[Map[ScEntityId, Option[TopicFollowType]]] = {
/**
* Title: an illustrative table to explain how the allow list is composed
* AllowList = WhiteList - DenyList - OptOutTopics - PausedTopics - NotInterestedInTopics
*
* TopicListingSetting: Following             ImplicitFollow        All  Followable
* Whitelist:           FollowedTopics(user)  AllWhitelistedTopics  Nil  AllWhitelistedTopics
* DenyList:            Nil                   FollowedTopics(user)  Nil  Nil
*
* P.S. For TopicListingSetting.All, the returned allow list is Nil because no allow list is
* required when TopicListingSetting == 'All'.
* See TopicSocialProofHandler.filterByAllowedList() for more details.
*/
topicListingSetting match {
// "All" means every UTT entity qualifies, so there is no need to fetch the whitelist.
case TopicListingSetting.All => Future.value(Map.empty)
case TopicListingSetting.Following =>
getFollowingTopicsForUserWithTimestamp(userId, context, bypassModes).map {
_.mapValues(_ => Some(TopicFollowType.Following))
}
case TopicListingSetting.ImplicitFollow =>
getDiff(
getWhitelistTopicIds(context, enableInternationalTopics = true),
getDenyListTopicIdsForUser(userId, topicListingSetting, context, bypassModes)).map {
_.map { scEntityId =>
scEntityId -> Some(TopicFollowType.ImplicitFollow)
}.toMap
}
case _ =>
val followedTopicIdsFut = getFollowedTopicIds(userId)
val allowListTopicIdsFut = getDiff(
getWhitelistTopicIds(context, enableInternationalTopics = true),
getDenyListTopicIdsForUser(userId, topicListingSetting, context, bypassModes))
Future.join(allowListTopicIdsFut, followedTopicIdsFut).map {
case (allowListTopicId, followedTopicIds) =>
allowListTopicId.map { scEntityId =>
if (followedTopicIds.contains(scEntityId))
scEntityId -> Some(TopicFollowType.Following)
else scEntityId -> Some(TopicFollowType.ImplicitFollow)
}.toMap
}
}
}
private[this] def getFollowingTopicsForUserWithTimestamp(
userId: UserId,
context: TopicListingViewerContext,
bypassModes: Option[Set[TopicSocialProofFilteringBypassMode]]
): Future[Map[ScEntityId, Option[Long]]] = {
val followedTopicIdToTimestampFut = getFollowedTopics(userId).map(_.map { followedTopic =>
followedTopic.entityId -> followedTopic.topicFollowTimestamp
}.toMap)
followedTopicIdToTimestampFut.flatMap { followedTopicIdToTimestamp =>
getDiff(
Future(followedTopicIdToTimestamp.keySet),
getDenyListTopicIdsForUser(userId, TopicListingSetting.Following, context, bypassModes)
).map {
_.map { scEntityId =>
scEntityId -> followedTopicIdToTimestamp.get(scEntityId).flatten
}.toMap
}
}
}
}
object UttTopicFilterStore {
val DefaultNotInterestedInTimeout: Duration = 60.milliseconds
val DefaultOptOutTimeout: Duration = 60.milliseconds
val DefaultInterestedInTimeout: Duration = 60.milliseconds
}
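
The allow-list composition implemented by `getAllowListTopicsForUser` and `getDenyListTopicIdsForUser` above reduces to plain set algebra once every input has been fetched. The sketch below is a simplified illustration under stated assumptions: `AllowListSketch`, `Setting`, and `Inputs` are hypothetical names, all inputs are already-resolved sets rather than `Future`s, and the `NotInterested` bypass mode, timeouts, and follow-type labelling are left out.

```scala
// Illustrative sketch only; types and names are hypothetical stand-ins.
object AllowListSketch {
  sealed trait Setting
  case object All extends Setting
  case object Followable extends Setting
  case object Following extends Setting
  case object ImplicitFollow extends Setting

  // Already-fetched inputs; the real store resolves these asynchronously.
  final case class Inputs(
    whitelisted: Set[Long],
    followed: Set[Long],
    optedOut: Set[Long],
    notInterested: Set[Long],
    paused: Set[Long])

  def allowList(setting: Setting, in: Inputs): Set[Long] = setting match {
    // No allow list is needed when every topic qualifies.
    case All => Set.empty
    // Start from the user's followed topics and remove the filtered ones.
    case Following => in.followed -- in.optedOut -- in.notInterested -- in.paused
    // Followed topics act as the deny list; opt-outs are not applied here.
    case ImplicitFollow => in.whitelisted -- in.followed -- in.notInterested -- in.paused
    case Followable => in.whitelisted -- in.optedOut -- in.notInterested -- in.paused
  }

  def main(args: Array[String]): Unit = {
    val in = Inputs(
      whitelisted = Set(1L, 2L, 3L, 4L),
      followed = Set(1L, 5L),
      optedOut = Set(2L),
      notInterested = Set(3L),
      paused = Set(4L))
    println(allowList(Followable, in))     // Set(1)
    println(allowList(ImplicitFollow, in)) // Set(2)
    println(allowList(Following, in))      // contains 1 and 5
    println(allowList(All, in))            // Set()
  }
}
```

In the actual store each input is a `Future` with its own timeout and failure handling, so a failed opt-out or not-interested lookup fails the whole computation, while followed-topics and whitelist failures degrade to empty sets.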


@ -0,0 +1,14 @@
scala_library(
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
],
dependencies = [
"3rdparty/jvm/org/lz4:lz4-java",
"content-recommender/thrift/src/main/thrift:thrift-scala",
"frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store",
"frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/health",
"stitch/stitch-storehaus",
"topic-social-proof/server/src/main/thrift:thrift-scala",
],
)


@ -0,0 +1,19 @@
package com.twitter.tsp.utils
import com.twitter.bijection.Injection
import scala.util.Try
import net.jpountz.lz4.LZ4CompressorWithLength
import net.jpountz.lz4.LZ4DecompressorWithLength
import net.jpountz.lz4.LZ4Factory
object LZ4Injection extends Injection[Array[Byte], Array[Byte]] {
private val lz4Factory = LZ4Factory.fastestInstance()
private val fastCompressor = new LZ4CompressorWithLength(lz4Factory.fastCompressor())
private val decompressor = new LZ4DecompressorWithLength(lz4Factory.fastDecompressor())
override def apply(a: Array[Byte]): Array[Byte] = LZ4Injection.fastCompressor.compress(a)
override def invert(b: Array[Byte]): Try[Array[Byte]] = Try {
LZ4Injection.decompressor.decompress(b)
}
}


@ -0,0 +1,20 @@
package com.twitter.tsp.utils
import com.twitter.storehaus.AbstractReadableStore
import com.twitter.storehaus.ReadableStore
import com.twitter.util.Future
class ReadableStoreWithMapOptionValues[K, V1, V2](rs: ReadableStore[K, V1]) {
def mapOptionValues(
fn: V1 => Option[V2]
): ReadableStore[K, V2] = {
val self = rs
new AbstractReadableStore[K, V2] {
override def get(k: K): Future[Option[V2]] = self.get(k).map(_.flatMap(fn))
override def multiGet[K1 <: K](ks: Set[K1]): Map[K1, Future[Option[V2]]] =
self.multiGet(ks).mapValues(_.map(_.flatMap(fn)))
}
}
}
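
The `mapOptionValues` wrapper above can be illustrated without the storehaus dependency. The sketch below is an assumption-laden stand-in, not the library API: `SimpleStore` is a hypothetical trait reduced to `get`, and `scala.concurrent.Future` replaces `com.twitter.util.Future`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Illustrative sketch only; SimpleStore is a hypothetical stand-in for ReadableStore.
object MapOptionValuesSketch {
  trait SimpleStore[K, V] {
    def get(k: K): Future[Option[V]]
  }

  // Transform stored values with a function that may reject them: a missing key
  // and a value that fn maps to None both surface as None.
  def mapOptionValues[K, V1, V2](store: SimpleStore[K, V1])(fn: V1 => Option[V2]): SimpleStore[K, V2] =
    new SimpleStore[K, V2] {
      override def get(k: K): Future[Option[V2]] = store.get(k).map(_.flatMap(fn))
    }

  val raw: SimpleStore[String, String] = new SimpleStore[String, String] {
    private val data = Map("a" -> "42", "b" -> "not-a-number")
    override def get(k: String): Future[Option[String]] = Future.successful(data.get(k))
  }

  val parsed: SimpleStore[String, Int] =
    mapOptionValues(raw)(s => scala.util.Try(s.toInt).toOption)

  def main(args: Array[String]): Unit = {
    println(Await.result(parsed.get("a"), 1.second)) // Some(42)
    println(Await.result(parsed.get("b"), 1.second)) // None (value rejected by fn)
    println(Await.result(parsed.get("c"), 1.second)) // None (key missing)
  }
}
```

Callers cannot tell a missing key from a rejected value, which fits cases where an unparsable or filtered entry should be treated as absent.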


@ -0,0 +1,32 @@
package com.twitter.tsp.utils
import com.twitter.bijection.Injection
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable
import scala.util.Try
/**
* @tparam T must be a serializable class
*/
case class SeqObjectInjection[T <: Serializable]() extends Injection[Seq[T], Array[Byte]] {
override def apply(seq: Seq[T]): Array[Byte] = {
val byteStream = new ByteArrayOutputStream()
val outputStream = new ObjectOutputStream(byteStream)
outputStream.writeObject(seq)
outputStream.close()
byteStream.toByteArray
}
override def invert(bytes: Array[Byte]): Try[Seq[T]] = {
Try {
val inputStream = new ObjectInputStream(new ByteArrayInputStream(bytes))
val seq = inputStream.readObject().asInstanceOf[Seq[T]]
inputStream.close()
seq
}
}
}
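
A round trip through the Java-serialization approach used by `SeqObjectInjection` above can be sketched with the standard library alone. This is an illustration under assumptions: `SeqSerializationSketch`, `toBytes`, and `fromBytes` are hypothetical names, and the bijection `Injection` trait is replaced by two plain functions.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.util.Try

// Illustrative sketch only; names are hypothetical and the Injection trait is omitted.
object SeqSerializationSketch {
  // Serialize the whole sequence as one Java-serialized object graph.
  def toBytes[T <: java.io.Serializable](seq: Seq[T]): Array[Byte] = {
    val byteStream = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(byteStream)
    try out.writeObject(seq)
    finally out.close()
    byteStream.toByteArray
  }

  // Deserialization can fail on corrupt input, so the result is wrapped in Try.
  // The cast is unchecked because of type erasure: the element type is not verified here.
  def fromBytes[T <: java.io.Serializable](bytes: Array[Byte]): Try[Seq[T]] = Try {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[Seq[T]]
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = toBytes(List("a", "b"))
    println(fromBytes[String](bytes))                          // Success(List(a, b))
    println(fromBytes[String](Array[Byte](1, 2, 3)).isFailure) // true
  }
}
```

The sequence implementation itself must also be serializable (for example `List` or `Vector`); otherwise `writeObject` throws `NotSerializableException`.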


@ -0,0 +1,21 @@
create_thrift_libraries(
base_name = "thrift",
sources = ["*.thrift"],
platform = "java8",
tags = [
"bazel-compatible",
],
dependency_roots = [
"content-recommender/thrift/src/main/thrift",
"content-recommender/thrift/src/main/thrift:content-recommender-common",
"interests-service/thrift/src/main/thrift",
"src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift",
],
generate_languages = [
"java",
"scala",
"strato",
],
provides_java_name = "tsp-thrift-java",
provides_scala_name = "tsp-thrift-scala",
)


@ -0,0 +1,104 @@
namespace java com.twitter.tsp.thriftjava
namespace py gen.twitter.tsp
#@namespace scala com.twitter.tsp.thriftscala
#@namespace strato com.twitter.tsp.strato
include "com/twitter/contentrecommender/common.thrift"
include "com/twitter/simclusters_v2/identifier.thrift"
include "com/twitter/simclusters_v2/online_store.thrift"
include "topic_listing.thrift"
enum TopicListingSetting {
All = 0 // All the existing Semantic Core Entity/Topics, i.e., all topics on Twitter, whether or not they have been launched yet.
Followable = 1 // All the topics which the user is allowed to follow, i.e., topics that have shipped, whether or not the user is following them.
Following = 2 // Only topics the user is explicitly following
ImplicitFollow = 3 // Topics the user has not followed but may implicitly follow, i.e., only topics that the user has not followed.
} (hasPersonalData='false')
// used to tell Topic Social Proof endpoint which specific filtering can be bypassed
enum TopicSocialProofFilteringBypassMode {
NotInterested = 0
} (hasPersonalData='false')
struct TopicSocialProofRequest {
1: required i64 userId(personalDataType = "UserId")
2: required set<i64> tweetIds(personalDataType = 'TweetId')
3: required common.DisplayLocation displayLocation
4: required TopicListingSetting topicListingSetting
5: required topic_listing.TopicListingViewerContext context
6: optional set<TopicSocialProofFilteringBypassMode> bypassModes
7: optional map<i64, set<MetricTag>> tags
}
struct TopicSocialProofOptions {
1: required i64 userId(personalDataType = "UserId")
2: required common.DisplayLocation displayLocation
3: required TopicListingSetting topicListingSetting
4: required topic_listing.TopicListingViewerContext context
5: optional set<TopicSocialProofFilteringBypassMode> bypassModes
6: optional map<i64, set<MetricTag>> tags
}
struct TopicSocialProofResponse {
1: required map<i64, list<TopicWithScore>> socialProofs
}(hasPersonalData='false')
// Distinguishes between how a topic tweet is generated. Useful for metric tracking and debugging
enum TopicTweetType {
// CrOON candidates
UserInterestedIn = 1
Twistly = 2
// crTopic candidates
SkitConsumerEmbeddings = 100
SkitProducerEmbeddings = 101
SkitHighPrecision = 102
SkitInterestBrowser = 103
Certo = 104
}(persisted='true')
struct TopicWithScore {
1: required i64 topicId
2: required double score // score used to rank topics relative to one another
3: optional TopicTweetType algorithmType // how the topic is generated
4: optional TopicFollowType topicFollowType // Whether the topic is being explicitly or implicitly followed
}(persisted='true', hasPersonalData='false')
struct ScoreKey {
1: required identifier.EmbeddingType userEmbeddingType
2: required identifier.EmbeddingType topicEmbeddingType
3: required online_store.ModelVersion modelVersion
}(persisted='true', hasPersonalData='false')
struct UserTopicScore {
1: required map<ScoreKey, double> scores
}(persisted='true', hasPersonalData='false')
enum TopicFollowType {
Following = 1
ImplicitFollow = 2
}(persisted='true')
// Provides tags that describe the source signal of the recommended Tweets and other context.
// Warning: Please don't use this tag in any ML Features or business logic.
enum MetricTag {
// Source Signal Tags
TweetFavorite = 0
Retweet = 1
UserFollow = 101
PushOpenOrNtabClick = 201
HomeTweetClick = 301
HomeVideoView = 302
HomeSongbirdShowMore = 303
InterestsRankerRecentSearches = 401 // For Interests Candidate Expansion
UserInterestedIn = 501
MBCG = 503
// Other Metric Tags
} (persisted='true', hasPersonalData='true')


@ -0,0 +1,26 @@
namespace java com.twitter.tsp.thriftjava
namespace py gen.twitter.tsp
#@namespace scala com.twitter.tsp.thriftscala
#@namespace strato com.twitter.tsp.strato
struct TspTweetInfo {
1: required i64 authorId
2: required i64 favCount
3: optional string language
6: optional bool hasImage
7: optional bool hasVideo
8: optional bool hasGif
9: optional bool isNsfwAuthor
10: optional bool isKGODenylist
11: optional bool isNullcast
// available if the tweet contains video
12: optional i32 videoDurationSeconds
13: optional bool isHighMediaResolution
14: optional bool isVerticalAspectRatio
// health signal scores
15: optional bool isPassAgathaHealthFilterStrictest
16: optional bool isPassTweetHealthFilterStrictest
17: optional bool isReply
18: optional bool hasMultipleMedia
23: optional bool hasUrl
}(persisted='false', hasPersonalData='true')


@ -3,8 +3,8 @@ Trust and Safety Models
We decided to open source the training code of the following models:
- pNSFWMedia: Model to detect tweets with NSFW images. This includes adult and porn content.
- pNSFWText: Model to detect tweets with NSFW text, adult/sexual topics
- pToxicity: Model to detect toxic tweets. Toxicity includes marginal content like insults and certain types of harassment. Toxic content does not violate Twitter terms of service
- pAbuse: Model to detect abusive content. This includes violations of Twitter terms of service, including hate speech, targeted harassment and abusive behavior.
- pNSFWText: Model to detect tweets with NSFW text, adult/sexual topics.
- pToxicity: Model to detect toxic tweets. Toxicity includes marginal content like insults and certain types of harassment. Toxic content does not violate Twitter's terms of service.
- pAbuse: Model to detect abusive content. This includes violations of Twitter's terms of service, including hate speech, targeted harassment and abusive behavior.
We have several more models and rules that we are not going to open source at this time because of the adversarial nature of this area. The team is considering open sourcing more models going forward and will keep the community posted accordingly.


@ -1,7 +1,7 @@
# TWML
---
Note: `twml` is no longer under development. Much of the code here is not out of date and unused.
Note: `twml` is no longer under development. Much of the code here is out of date and unused.
It is included here for completeness, because `twml` is still used to train the light ranker models
(see `src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md`)
---
@ -10,4 +10,4 @@ TWML is one of Twitter's machine learning frameworks, which uses Tensorflow unde
deprecated,
it is still currently used to train the Earlybird light ranking models (
see `src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/train.py`).
The most relevant part of this is the `DataRecordTrainer` class, which is where the core training logic resides.
The most relevant part of this is the `DataRecordTrainer` class, which is where the core training logic resides.

unified_user_actions/.gitignore

@ -0,0 +1,4 @@
.DS_Store
CONFIG.ini
PROJECT
docs


@ -0,0 +1 @@
# This prevents SQ query from grabbing //:all since it traverses up once to find a BUILD


@ -0,0 +1,10 @@
# Unified User Actions (UUA)
**Unified User Actions** (UUA) is a centralized, real-time stream of user actions on Twitter, consumed by various product, ML, and marketing teams. UUA reads client-side and server-side event streams that contain the user's actions and generates a unified real-time user actions Kafka stream. The Kafka stream is replicated to HDFS, GCP Pubsub, GCP GCS, and GCP BigQuery. The user actions include public actions such as favorites, retweets, and replies, and implicit actions such as bookmarks, impressions, and video views.
## Components
- adapter: transform the raw inputs to UUA Thrift output
- client: Kafka client related utils
- kafka: more specific Kafka utils like customized serde
- service: deployment, modules and services
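
The adapter component's role, turning one raw input event into zero or more keyed unified actions, can be sketched as follows. This is a minimal illustration and not the UUA API: `SimpleAdapter` is a simplified stand-in for `AbstractAdapter` with the `StatsReceiver` parameter omitted, and `RawFavEvent`, `Action`, and `FavAdapter` are hypothetical types.

```scala
// Illustrative sketch only; all types here are hypothetical stand-ins.
object AdapterSketch {
  // One input maps to a sequence of (key, value) outputs; the key is mainly
  // useful when publishing to a keyed stream such as Kafka.
  trait SimpleAdapter[INPUT, OUTK, OUTV] extends Serializable {
    def adaptOneToKeyedMany(input: INPUT): Seq[(OUTK, OUTV)]
  }

  final case class RawFavEvent(userId: Long, tweetId: Long, isUndo: Boolean)
  final case class Action(userId: Long, tweetId: Long, actionType: String)

  object FavAdapter extends SimpleAdapter[RawFavEvent, Long, Action] {
    override def adaptOneToKeyedMany(input: RawFavEvent): Seq[(Long, Action)] = {
      val actionType = if (input.isUndo) "Unfav" else "Fav"
      Seq(input.userId -> Action(input.userId, input.tweetId, actionType))
    }
  }

  def main(args: Array[String]): Unit = {
    val keyed = FavAdapter.adaptOneToKeyedMany(RawFavEvent(7L, 100L, isUndo = false))
    println(keyed)           // List((7,Action(7,100,Fav)))
    // Consumers that do not need the key, e.g. batch jobs, can drop it.
    println(keyed.map(_._2)) // List(Action(7,100,Fav))
  }
}
```

Returning a `Seq` rather than a single value lets an adapter drop events it cannot map (empty result) or fan one raw event out into several actions.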


@ -0,0 +1,19 @@
package com.twitter.unified_user_actions.adapter
import com.twitter.finagle.stats.NullStatsReceiver
import com.twitter.finagle.stats.StatsReceiver
trait AbstractAdapter[INPUT, OUTK, OUTV] extends Serializable {
/**
* The basic input -> Seq[output] adapter that concrete adapters should extend.
* @param input a single INPUT
* @return A list of (OUTK, OUTV) tuples. The OUTK is the output key, mainly used for publishing to Kafka (or Pubsub).
* Other processing that doesn't require the output key, e.g. offline batch processing, can drop it
* like source.adaptOneToKeyedMany.map(_._2)
*/
def adaptOneToKeyedMany(
input: INPUT,
statsReceiver: StatsReceiver = NullStatsReceiver
): Seq[(OUTK, OUTV)]
}


@ -0,0 +1,11 @@
scala_library(
name = "base",
sources = [
"AbstractAdapter.scala",
],
compiler_option_sets = ["fatal_warnings"],
tags = ["bazel-compatible"],
dependencies = [
"util/util-stats/src/main/scala/com/twitter/finagle/stats",
],
)


@ -0,0 +1,125 @@
package com.twitter.unified_user_actions.adapter.ads_callback_engagements
import com.twitter.ads.spendserver.thriftscala.SpendServerEvent
import com.twitter.unified_user_actions.thriftscala._
object AdsCallbackEngagement {
object PromotedTweetFav extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetFav)
object PromotedTweetUnfav extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetUnfav)
object PromotedTweetReply extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetReply)
object PromotedTweetRetweet
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetRetweet)
object PromotedTweetBlockAuthor
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetBlockAuthor)
object PromotedTweetUnblockAuthor
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetUnblockAuthor)
object PromotedTweetComposeTweet
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetComposeTweet)
object PromotedTweetClick extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetClick)
object PromotedTweetReport extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetReport)
object PromotedProfileFollow
extends ProfileAdsCallbackEngagement(ActionType.ServerPromotedProfileFollow)
object PromotedProfileUnfollow
extends ProfileAdsCallbackEngagement(ActionType.ServerPromotedProfileUnfollow)
object PromotedTweetMuteAuthor
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetMuteAuthor)
object PromotedTweetClickProfile
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetClickProfile)
object PromotedTweetClickHashtag
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetClickHashtag)
object PromotedTweetOpenLink
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetOpenLink) {
override def getItem(input: SpendServerEvent): Option[Item] = {
input.engagementEvent.flatMap { e =>
e.impressionData.flatMap { i =>
getPromotedTweetInfo(
i.promotedTweetId,
i.advertiserId,
tweetActionInfoOpt = Some(
TweetActionInfo.ServerPromotedTweetOpenLink(
ServerPromotedTweetOpenLink(url = e.url))))
}
}
}
}
object PromotedTweetCarouselSwipeNext
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetCarouselSwipeNext)
object PromotedTweetCarouselSwipePrevious
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetCarouselSwipePrevious)
object PromotedTweetLingerImpressionShort
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetLingerImpressionShort)
object PromotedTweetLingerImpressionMedium
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetLingerImpressionMedium)
object PromotedTweetLingerImpressionLong
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetLingerImpressionLong)
object PromotedTweetClickSpotlight
extends BaseTrendAdsCallbackEngagement(ActionType.ServerPromotedTweetClickSpotlight)
object PromotedTweetViewSpotlight
extends BaseTrendAdsCallbackEngagement(ActionType.ServerPromotedTweetViewSpotlight)
object PromotedTrendView
extends BaseTrendAdsCallbackEngagement(ActionType.ServerPromotedTrendView)
object PromotedTrendClick
extends BaseTrendAdsCallbackEngagement(ActionType.ServerPromotedTrendClick)
object PromotedTweetVideoPlayback25
extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoPlayback25)
object PromotedTweetVideoPlayback50
extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoPlayback50)
object PromotedTweetVideoPlayback75
extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoPlayback75)
object PromotedTweetVideoAdPlayback25
extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoAdPlayback25)
object PromotedTweetVideoAdPlayback50
extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoAdPlayback50)
object PromotedTweetVideoAdPlayback75
extends BaseVideoAdsCallbackEngagement(ActionType.ServerPromotedTweetVideoAdPlayback75)
object TweetVideoAdPlayback25
extends BaseVideoAdsCallbackEngagement(ActionType.ServerTweetVideoAdPlayback25)
object TweetVideoAdPlayback50
extends BaseVideoAdsCallbackEngagement(ActionType.ServerTweetVideoAdPlayback50)
object TweetVideoAdPlayback75
extends BaseVideoAdsCallbackEngagement(ActionType.ServerTweetVideoAdPlayback75)
object PromotedTweetDismissWithoutReason
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetDismissWithoutReason)
object PromotedTweetDismissUninteresting
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetDismissUninteresting)
object PromotedTweetDismissRepetitive
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetDismissRepetitive)
object PromotedTweetDismissSpam
extends BaseAdsCallbackEngagement(ActionType.ServerPromotedTweetDismissSpam)
}


@ -0,0 +1,28 @@
package com.twitter.unified_user_actions.adapter.ads_callback_engagements
import com.twitter.finagle.stats.NullStatsReceiver
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.finatra.kafka.serde.UnKeyed
import com.twitter.unified_user_actions.adapter.AbstractAdapter
import com.twitter.ads.spendserver.thriftscala.SpendServerEvent
import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
class AdsCallbackEngagementsAdapter
extends AbstractAdapter[SpendServerEvent, UnKeyed, UnifiedUserAction] {
import AdsCallbackEngagementsAdapter._
override def adaptOneToKeyedMany(
input: SpendServerEvent,
statsReceiver: StatsReceiver = NullStatsReceiver
): Seq[(UnKeyed, UnifiedUserAction)] =
adaptEvent(input).map { e => (UnKeyed, e) }
}
object AdsCallbackEngagementsAdapter {
def adaptEvent(input: SpendServerEvent): Seq[UnifiedUserAction] = {
val baseEngagements: Seq[BaseAdsCallbackEngagement] =
EngagementTypeMappings.getEngagementMappings(Option(input).flatMap(_.engagementEvent))
baseEngagements.flatMap(_.getUUA(input))
}
}


@ -0,0 +1,18 @@
scala_library(
sources = [
"*.scala",
],
compiler_option_sets = ["fatal_warnings"],
tags = [
"bazel-compatible",
"bazel-only",
],
dependencies = [
"kafka/finagle-kafka/finatra-kafka/src/main/scala",
"src/thrift/com/twitter/ads/billing/spendserver:spendserver_thrift-scala",
"src/thrift/com/twitter/ads/eventstream:eventstream-scala",
"unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base",
"unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common",
"unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala",
],
)


@ -0,0 +1,68 @@
package com.twitter.unified_user_actions.adapter.ads_callback_engagements
import com.twitter.ads.spendserver.thriftscala.SpendServerEvent
import com.twitter.unified_user_actions.adapter.common.AdapterUtils
import com.twitter.unified_user_actions.thriftscala.ActionType
import com.twitter.unified_user_actions.thriftscala.AuthorInfo
import com.twitter.unified_user_actions.thriftscala.EventMetadata
import com.twitter.unified_user_actions.thriftscala.Item
import com.twitter.unified_user_actions.thriftscala.SourceLineage
import com.twitter.unified_user_actions.thriftscala.TweetInfo
import com.twitter.unified_user_actions.thriftscala.TweetActionInfo
import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
import com.twitter.unified_user_actions.thriftscala.UserIdentifier
abstract class BaseAdsCallbackEngagement(actionType: ActionType) {
protected def getItem(input: SpendServerEvent): Option[Item] = {
input.engagementEvent.flatMap { e =>
e.impressionData.flatMap { i =>
getPromotedTweetInfo(i.promotedTweetId, i.advertiserId)
}
}
}
protected def getPromotedTweetInfo(
promotedTweetIdOpt: Option[Long],
advertiserId: Long,
tweetActionInfoOpt: Option[TweetActionInfo] = None
): Option[Item] = {
promotedTweetIdOpt.map { promotedTweetId =>
Item.TweetInfo(
TweetInfo(
actionTweetId = promotedTweetId,
actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(advertiserId))),
tweetActionInfo = tweetActionInfoOpt)
)
}
}
def getUUA(input: SpendServerEvent): Option[UnifiedUserAction] = {
val userIdentifier: UserIdentifier =
UserIdentifier(
userId = input.engagementEvent.flatMap(e => e.clientInfo.flatMap(_.userId64)),
guestIdMarketing = input.engagementEvent.flatMap(e => e.clientInfo.flatMap(_.guestId)),
)
getItem(input).map { item =>
UnifiedUserAction(
userIdentifier = userIdentifier,
item = item,
actionType = actionType,
eventMetadata = getEventMetadata(input),
)
}
}
protected def getEventMetadata(input: SpendServerEvent): EventMetadata =
EventMetadata(
sourceTimestampMs = input.engagementEvent
.map { e => e.engagementEpochTimeMilliSec }.getOrElse(AdapterUtils.currentTimestampMs),
receivedTimestampMs = AdapterUtils.currentTimestampMs,
sourceLineage = SourceLineage.ServerAdsCallbackEngagements,
language = input.engagementEvent.flatMap { e => e.clientInfo.flatMap(_.languageCode) },
countryCode = input.engagementEvent.flatMap { e => e.clientInfo.flatMap(_.countryCode) },
clientAppId =
input.engagementEvent.flatMap { e => e.clientInfo.flatMap(_.clientId) }.map { _.toLong },
)
}


@ -0,0 +1,18 @@
package com.twitter.unified_user_actions.adapter.ads_callback_engagements
import com.twitter.ads.spendserver.thriftscala.SpendServerEvent
import com.twitter.unified_user_actions.thriftscala._
abstract class BaseTrendAdsCallbackEngagement(actionType: ActionType)
extends BaseAdsCallbackEngagement(actionType = actionType) {
override protected def getItem(input: SpendServerEvent): Option[Item] = {
input.engagementEvent.flatMap { e =>
e.impressionData.flatMap { i =>
i.promotedTrendId.map { promotedTrendId =>
Item.TrendInfo(TrendInfo(actionTrendId = promotedTrendId))
}
}
}
}
}

View File

@ -0,0 +1,54 @@
package com.twitter.unified_user_actions.adapter.ads_callback_engagements
import com.twitter.ads.spendserver.thriftscala.SpendServerEvent
import com.twitter.unified_user_actions.thriftscala.ActionType
import com.twitter.unified_user_actions.thriftscala.AuthorInfo
import com.twitter.unified_user_actions.thriftscala.TweetVideoWatch
import com.twitter.unified_user_actions.thriftscala.Item
import com.twitter.unified_user_actions.thriftscala.TweetActionInfo
import com.twitter.unified_user_actions.thriftscala.TweetInfo
abstract class BaseVideoAdsCallbackEngagement(actionType: ActionType)
extends BaseAdsCallbackEngagement(actionType = actionType) {
override def getItem(input: SpendServerEvent): Option[Item] = {
input.engagementEvent.flatMap { e =>
e.impressionData.flatMap { i =>
getTweetInfo(i.promotedTweetId, i.organicTweetId, i.advertiserId, input)
}
}
}
private def getTweetInfo(
promotedTweetId: Option[Long],
organicTweetId: Option[Long],
advertiserId: Long,
input: SpendServerEvent
): Option[Item] = {
val actionedTweetIdOpt: Option[Long] =
if (promotedTweetId.isEmpty) organicTweetId else promotedTweetId
actionedTweetIdOpt.map { actionTweetId =>
Item.TweetInfo(
TweetInfo(
actionTweetId = actionTweetId,
actionTweetAuthorInfo = Some(AuthorInfo(authorId = Some(advertiserId))),
tweetActionInfo = Some(
TweetActionInfo.TweetVideoWatch(
TweetVideoWatch(
isMonetizable = Some(true),
videoOwnerId = input.engagementEvent
.flatMap(e => e.cardEngagement).flatMap(_.amplifyDetails).flatMap(_.videoOwnerId),
videoUuid = input.engagementEvent
.flatMap(_.cardEngagement).flatMap(_.amplifyDetails).flatMap(_.videoUuid),
prerollOwnerId = input.engagementEvent
.flatMap(e => e.cardEngagement).flatMap(_.amplifyDetails).flatMap(
_.prerollOwnerId),
prerollUuid = input.engagementEvent
.flatMap(_.cardEngagement).flatMap(_.amplifyDetails).flatMap(_.prerollUuid)
))
)
),
)
}
}
}
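The promoted-versus-organic choice in `getTweetInfo` above can be read as an `Option` fallback. A minimal sketch, assuming plain `Option[Long]` IDs; the object and method names are hypothetical and not part of the adapter:

```scala
object ActionedTweetIdSketch {
  // Prefer the promoted Tweet ID; fall back to the organic one when it is absent.
  // Behaves the same as: if (promoted.isEmpty) organic else promoted
  def actionedTweetId(promoted: Option[Long], organic: Option[Long]): Option[Long] =
    promoted.orElse(organic)

  def main(args: Array[String]): Unit = {
    println(actionedTweetId(Some(1L), Some(2L))) // promoted ID wins
    println(actionedTweetId(None, Some(2L)))     // organic ID is the fallback
    println(actionedTweetId(None, None))         // no ID, so no item is built
  }
}
```

When both IDs are missing the result is `None`, which is why the adapter emits no item for such events.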

View File

@ -0,0 +1,69 @@
package com.twitter.unified_user_actions.adapter.ads_callback_engagements
import com.twitter.ads.eventstream.thriftscala.EngagementEvent
import com.twitter.adserver.thriftscala.EngagementType
import com.twitter.unified_user_actions.adapter.ads_callback_engagements.AdsCallbackEngagement._
object EngagementTypeMappings {
/**
* Ads could be Tweets or non-Tweets. Since UUA explicitly sets the item type, it is
* possible that one Ads Callback engagement type maps to multiple UUA action types.
*/
def getEngagementMappings(
engagementEvent: Option[EngagementEvent]
): Seq[BaseAdsCallbackEngagement] = {
val promotedTweetId: Option[Long] =
engagementEvent.flatMap(_.impressionData).flatMap(_.promotedTweetId)
engagementEvent
.map(event =>
event.engagementType match {
case EngagementType.Fav => Seq(PromotedTweetFav)
case EngagementType.Unfav => Seq(PromotedTweetUnfav)
case EngagementType.Reply => Seq(PromotedTweetReply)
case EngagementType.Retweet => Seq(PromotedTweetRetweet)
case EngagementType.Block => Seq(PromotedTweetBlockAuthor)
case EngagementType.Unblock => Seq(PromotedTweetUnblockAuthor)
case EngagementType.Send => Seq(PromotedTweetComposeTweet)
case EngagementType.Detail => Seq(PromotedTweetClick)
case EngagementType.Report => Seq(PromotedTweetReport)
case EngagementType.Follow => Seq(PromotedProfileFollow)
case EngagementType.Unfollow => Seq(PromotedProfileUnfollow)
case EngagementType.Mute => Seq(PromotedTweetMuteAuthor)
case EngagementType.ProfilePic => Seq(PromotedTweetClickProfile)
case EngagementType.ScreenName => Seq(PromotedTweetClickProfile)
case EngagementType.UserName => Seq(PromotedTweetClickProfile)
case EngagementType.Hashtag => Seq(PromotedTweetClickHashtag)
case EngagementType.Url => Seq(PromotedTweetOpenLink)
case EngagementType.CarouselSwipeNext => Seq(PromotedTweetCarouselSwipeNext)
case EngagementType.CarouselSwipePrevious => Seq(PromotedTweetCarouselSwipePrevious)
case EngagementType.DwellShort => Seq(PromotedTweetLingerImpressionShort)
case EngagementType.DwellMedium => Seq(PromotedTweetLingerImpressionMedium)
case EngagementType.DwellLong => Seq(PromotedTweetLingerImpressionLong)
case EngagementType.SpotlightClick => Seq(PromotedTweetClickSpotlight)
case EngagementType.SpotlightView => Seq(PromotedTweetViewSpotlight)
case EngagementType.TrendView => Seq(PromotedTrendView)
case EngagementType.TrendClick => Seq(PromotedTrendClick)
case EngagementType.VideoContentPlayback25 => Seq(PromotedTweetVideoPlayback25)
case EngagementType.VideoContentPlayback50 => Seq(PromotedTweetVideoPlayback50)
case EngagementType.VideoContentPlayback75 => Seq(PromotedTweetVideoPlayback75)
case EngagementType.VideoAdPlayback25 if promotedTweetId.isDefined =>
Seq(PromotedTweetVideoAdPlayback25)
case EngagementType.VideoAdPlayback25 if promotedTweetId.isEmpty =>
Seq(TweetVideoAdPlayback25)
case EngagementType.VideoAdPlayback50 if promotedTweetId.isDefined =>
Seq(PromotedTweetVideoAdPlayback50)
case EngagementType.VideoAdPlayback50 if promotedTweetId.isEmpty =>
Seq(TweetVideoAdPlayback50)
case EngagementType.VideoAdPlayback75 if promotedTweetId.isDefined =>
Seq(PromotedTweetVideoAdPlayback75)
case EngagementType.VideoAdPlayback75 if promotedTweetId.isEmpty =>
Seq(TweetVideoAdPlayback75)
case EngagementType.DismissRepetitive => Seq(PromotedTweetDismissRepetitive)
case EngagementType.DismissSpam => Seq(PromotedTweetDismissSpam)
case EngagementType.DismissUninteresting => Seq(PromotedTweetDismissUninteresting)
case EngagementType.DismissWithoutReason => Seq(PromotedTweetDismissWithoutReason)
case _ => Nil
}).toSeq.flatten
}
}
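The guarded cases in `getEngagementMappings` above select a different UUA action for the same engagement type depending on whether a promoted Tweet ID is present. A minimal sketch of that pattern, using hypothetical stand-in types rather than the real Thrift enums and engagement objects:

```scala
object EngagementMappingSketch {
  // Hypothetical stand-ins for EngagementType values.
  sealed trait EngagementType
  case object Fav extends EngagementType
  case object VideoAdPlayback25 extends EngagementType
  case object Unmapped extends EngagementType

  // Hypothetical stand-ins for the UUA engagement objects.
  sealed trait UuaAction
  case object PromotedTweetFav extends UuaAction
  case object PromotedTweetVideoAdPlayback25 extends UuaAction
  case object TweetVideoAdPlayback25 extends UuaAction

  final case class Event(engagementType: EngagementType, promotedTweetId: Option[Long])

  // A missing event or an unmapped engagement type yields an empty Seq.
  def mappings(event: Option[Event]): Seq[UuaAction] =
    event.toSeq.flatMap { e =>
      e.engagementType match {
        case Fav => Seq(PromotedTweetFav)
        case VideoAdPlayback25 if e.promotedTweetId.isDefined =>
          Seq(PromotedTweetVideoAdPlayback25)
        case VideoAdPlayback25 => Seq(TweetVideoAdPlayback25)
        case _ => Nil
      }
    }

  def main(args: Array[String]): Unit = {
    println(mappings(Some(Event(VideoAdPlayback25, Some(1L)))))
    println(mappings(Some(Event(VideoAdPlayback25, None))))
    println(mappings(None))
  }
}
```

Returning a `Seq` rather than a single value leaves room for one engagement type to fan out to several UUA action types, as the doc comment describes.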

View File

@ -0,0 +1,26 @@
package com.twitter.unified_user_actions.adapter.ads_callback_engagements
import com.twitter.ads.spendserver.thriftscala.SpendServerEvent
import com.twitter.unified_user_actions.thriftscala.ActionType
import com.twitter.unified_user_actions.thriftscala.Item
import com.twitter.unified_user_actions.thriftscala.ProfileInfo
abstract class ProfileAdsCallbackEngagement(actionType: ActionType)
extends BaseAdsCallbackEngagement(actionType) {
override protected def getItem(input: SpendServerEvent): Option[Item] = {
input.engagementEvent.flatMap { e =>
e.impressionData.flatMap { i =>
getProfileInfo(i.advertiserId)
}
}
}
protected def getProfileInfo(advertiserId: Long): Option[Item] = {
Some(
Item.ProfileInfo(
ProfileInfo(
actionProfileId = advertiserId
)))
}
}

View File

@ -0,0 +1,13 @@
scala_library(
sources = [
"*.scala",
],
tags = ["bazel-compatible"],
dependencies = [
"client-events/thrift/src/thrift/storage/twitter/behavioral_event:behavioral_event-scala",
"kafka/finagle-kafka/finatra-kafka/src/main/scala",
"unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base",
"unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common",
"unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala",
],
)

View File

@ -0,0 +1,96 @@
package com.twitter.unified_user_actions.adapter.behavioral_client_event
import com.twitter.client_event_entities.serverside_context_key.latest.thriftscala.FlattenedServersideContextKey
import com.twitter.storage.behavioral_event.thriftscala.EventLogContext
import com.twitter.storage.behavioral_event.thriftscala.FlattenedEventLog
import com.twitter.unified_user_actions.adapter.common.AdapterUtils
import com.twitter.unified_user_actions.thriftscala.ActionType
import com.twitter.unified_user_actions.thriftscala.BreadcrumbTweet
import com.twitter.unified_user_actions.thriftscala.ClientEventNamespace
import com.twitter.unified_user_actions.thriftscala.EventMetadata
import com.twitter.unified_user_actions.thriftscala.Item
import com.twitter.unified_user_actions.thriftscala.ProductSurface
import com.twitter.unified_user_actions.thriftscala.ProductSurfaceInfo
import com.twitter.unified_user_actions.thriftscala.SourceLineage
import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
import com.twitter.unified_user_actions.thriftscala.UserIdentifier
case class ProductSurfaceRelated(
productSurface: Option[ProductSurface],
productSurfaceInfo: Option[ProductSurfaceInfo])
trait BaseBCEAdapter {
def toUUA(e: FlattenedEventLog): Seq[UnifiedUserAction]
protected def getUserIdentifier(c: EventLogContext): UserIdentifier =
UserIdentifier(
userId = c.userId,
guestIdMarketing = c.guestIdMarketing
)
protected def getEventMetadata(e: FlattenedEventLog): EventMetadata =
EventMetadata(
sourceLineage = SourceLineage.BehavioralClientEvents,
sourceTimestampMs =
e.context.driftAdjustedEventCreatedAtMs.getOrElse(e.context.eventCreatedAtMs),
receivedTimestampMs = AdapterUtils.currentTimestampMs,
// Client UI language, or the language from Gizmoduck, which is what the user set in the Twitter app.
// Please see more at https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/finatra-internal/international/src/main/scala/com/twitter/finatra/international/LanguageIdentifier.scala
// The format should be ISO 639-1.
language = e.context.languageCode.map(AdapterUtils.normalizeLanguageCode),
// Country code may come from the IP address (geoduck) or from the user's registration country (gizmoduck); the former takes precedence.
// We don't know exactly which one is applied, unfortunately,
// see https://sourcegraph.twitter.biz/git.twitter.biz/source/-/blob/finatra-internal/international/src/main/scala/com/twitter/finatra/international/CountryIdentifier.scala
// The format should be ISO_3166-1_alpha-2.
countryCode = e.context.countryCode.map(AdapterUtils.normalizeCountryCode),
clientAppId = e.context.clientApplicationId,
clientVersion = e.context.clientVersion,
clientPlatform = e.context.clientPlatform,
viewHierarchy = e.v1ViewTypeHierarchy,
clientEventNamespace = Some(
ClientEventNamespace(
page = e.page,
section = e.section,
element = e.element,
action = e.actionName,
subsection = e.subsection
)),
breadcrumbViews = e.v1BreadcrumbViewTypeHierarchy,
breadcrumbTweets = e.v1BreadcrumbTweetIds.map { breadcrumbs =>
breadcrumbs.map { breadcrumb =>
BreadcrumbTweet(
tweetId = breadcrumb.serversideContextId.toLong,
sourceComponent = breadcrumb.sourceComponent)
}
}
)
protected def getBreadcrumbTweetIds(
breadcrumbTweetIds: Option[Seq[FlattenedServersideContextKey]]
): Seq[BreadcrumbTweet] =
breadcrumbTweetIds
.getOrElse(Nil).map(breadcrumb => {
BreadcrumbTweet(
tweetId = breadcrumb.serversideContextId.toLong,
sourceComponent = breadcrumb.sourceComponent)
})
protected def getBreadcrumbViews(breadcrumbView: Option[Seq[String]]): Seq[String] =
breadcrumbView.getOrElse(Nil)
protected def getUnifiedUserAction(
event: FlattenedEventLog,
actionType: ActionType,
item: Item,
productSurface: Option[ProductSurface] = None,
productSurfaceInfo: Option[ProductSurfaceInfo] = None
): UnifiedUserAction =
UnifiedUserAction(
userIdentifier = getUserIdentifier(event.context),
actionType = actionType,
item = item,
eventMetadata = getEventMetadata(event),
productSurface = productSurface,
productSurfaceInfo = productSurfaceInfo
)
}

View File

@ -0,0 +1,39 @@
package com.twitter.unified_user_actions.adapter.behavioral_client_event
import com.twitter.finagle.stats.NullStatsReceiver
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.finatra.kafka.serde.UnKeyed
import com.twitter.storage.behavioral_event.thriftscala.FlattenedEventLog
import com.twitter.unified_user_actions.adapter.AbstractAdapter
import com.twitter.unified_user_actions.thriftscala._
class BehavioralClientEventAdapter
extends AbstractAdapter[FlattenedEventLog, UnKeyed, UnifiedUserAction] {
import BehavioralClientEventAdapter._
override def adaptOneToKeyedMany(
input: FlattenedEventLog,
statsReceiver: StatsReceiver = NullStatsReceiver
): Seq[(UnKeyed, UnifiedUserAction)] =
adaptEvent(input).map { e => (UnKeyed, e) }
}
object BehavioralClientEventAdapter {
def adaptEvent(e: FlattenedEventLog): Seq[UnifiedUserAction] =
// See go/bcecoverage for event namespaces, usage and coverage details
Option(e)
.map { e =>
(e.page, e.actionName) match {
case (Some("tweet_details"), Some("impress")) =>
TweetImpressionBCEAdapter.TweetDetails.toUUA(e)
case (Some("fullscreen_video"), Some("impress")) =>
TweetImpressionBCEAdapter.FullscreenVideo.toUUA(e)
case (Some("fullscreen_image"), Some("impress")) =>
TweetImpressionBCEAdapter.FullscreenImage.toUUA(e)
case (Some("profile"), Some("impress")) =>
ProfileImpressionBCEAdapter.Profile.toUUA(e)
case _ => Nil
}
}.getOrElse(Nil)
}
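`adaptEvent` above routes each event by matching on the `(page, actionName)` pair and drops everything else. A minimal sketch of that dispatch, assuming a hypothetical `EventLike` case class in place of `FlattenedEventLog` and string labels in place of the real adapters:

```scala
object NamespaceDispatchSketch {
  // Hypothetical stand-in for the two FlattenedEventLog fields used for routing.
  final case class EventLike(page: Option[String], actionName: Option[String])

  // Returns a label for the adapter that would handle the event, if any.
  def route(e: EventLike): Option[String] =
    // Option(e) turns a null event into None instead of throwing.
    Option(e).flatMap { ev =>
      (ev.page, ev.actionName) match {
        case (Some("tweet_details"), Some("impress")) => Some("TweetDetails")
        case (Some("fullscreen_video"), Some("impress")) => Some("FullscreenVideo")
        case (Some("fullscreen_image"), Some("impress")) => Some("FullscreenImage")
        case (Some("profile"), Some("impress")) => Some("Profile")
        case _ => None
      }
    }

  def main(args: Array[String]): Unit = {
    println(route(EventLike(Some("profile"), Some("impress"))))
    println(route(EventLike(Some("profile"), Some("click"))))
    println(route(null))
  }
}
```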

View File

@ -0,0 +1,34 @@
package com.twitter.unified_user_actions.adapter.behavioral_client_event
import com.twitter.client.behavioral_event.action.impress.latest.thriftscala.Impress
import com.twitter.client_event_entities.serverside_context_key.latest.thriftscala.FlattenedServersideContextKey
import com.twitter.unified_user_actions.thriftscala.Item
trait ImpressionBCEAdapter extends BaseBCEAdapter {
type ImpressedItem <: Item
def getImpressedItem(
context: FlattenedServersideContextKey,
impression: Impress
): ImpressedItem
/**
* The start time of an impression in milliseconds since epoch. In BCE, the impression
* tracking clock will start immediately after the page is visible with no initial delay.
*/
def getImpressedStartTimestamp(impression: Impress): Long =
impression.visibilityPctDwellStartMs
/**
* The end time of an impression in milliseconds since epoch. In BCE, the impression
* tracking clock will end before the user exits the page.
*/
def getImpressedEndTimestamp(impression: Impress): Long =
impression.visibilityPctDwellEndMs
/**
* The UI component that hosted the impressed item.
*/
def getImpressedUISourceComponent(context: FlattenedServersideContextKey): String =
context.sourceComponent
}
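`ImpressionBCEAdapter` above uses an abstract type member (`type ImpressedItem <: Item`) so that each concrete adapter can narrow the return type of `getImpressedItem`. A minimal sketch of that pattern; every name below is a hypothetical stand-in, not the UUA Thrift API:

```scala
object ImpressedItemSketch {
  // Hypothetical stand-ins for the UUA Item union members.
  sealed trait Item
  final case class TweetItem(id: Long) extends Item
  final case class ProfileItem(id: Long) extends Item

  trait ImpressionAdapter {
    // Each subclass fixes this to the one Item variant it produces.
    type ImpressedItem <: Item
    def impressedItem(contextId: String): ImpressedItem
  }

  object TweetAdapter extends ImpressionAdapter {
    override type ImpressedItem = TweetItem
    // toLong throws NumberFormatException on a non-numeric ID.
    override def impressedItem(contextId: String): TweetItem = TweetItem(contextId.toLong)
  }

  object ProfileAdapter extends ImpressionAdapter {
    override type ImpressedItem = ProfileItem
    override def impressedItem(contextId: String): ProfileItem = ProfileItem(contextId.toLong)
  }

  def main(args: Array[String]): Unit = {
    // The static types are TweetItem and ProfileItem, not just Item.
    val tweet: TweetItem = TweetAdapter.impressedItem("42")
    val profile: ProfileItem = ProfileAdapter.impressedItem("7")
    println(tweet)
    println(profile)
  }
}
```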

View File

@ -0,0 +1,52 @@
package com.twitter.unified_user_actions.adapter.behavioral_client_event
import com.twitter.client.behavioral_event.action.impress.latest.thriftscala.Impress
import com.twitter.client_event_entities.serverside_context_key.latest.thriftscala.FlattenedServersideContextKey
import com.twitter.storage.behavioral_event.thriftscala.FlattenedEventLog
import com.twitter.unified_user_actions.thriftscala.ActionType
import com.twitter.unified_user_actions.thriftscala.ClientProfileV2Impression
import com.twitter.unified_user_actions.thriftscala.Item
import com.twitter.unified_user_actions.thriftscala.ProductSurface
import com.twitter.unified_user_actions.thriftscala.ProfileActionInfo
import com.twitter.unified_user_actions.thriftscala.ProfileInfo
import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
object ProfileImpressionBCEAdapter {
val Profile = new ProfileImpressionBCEAdapter()
}
class ProfileImpressionBCEAdapter extends ImpressionBCEAdapter {
override type ImpressedItem = Item.ProfileInfo
override def toUUA(e: FlattenedEventLog): Seq[UnifiedUserAction] =
(e.v2Impress, e.v1UserIds) match {
case (Some(v2Impress), Some(v1UserIds)) =>
v1UserIds.map { user =>
getUnifiedUserAction(
event = e,
actionType = ActionType.ClientProfileV2Impression,
item = getImpressedItem(user, v2Impress),
productSurface = Some(ProductSurface.ProfilePage)
)
}
case _ => Nil
}
override def getImpressedItem(
context: FlattenedServersideContextKey,
impression: Impress
): ImpressedItem =
Item.ProfileInfo(
ProfileInfo(
actionProfileId = context.serversideContextId.toLong,
profileActionInfo = Some(
ProfileActionInfo.ClientProfileV2Impression(
ClientProfileV2Impression(
impressStartTimestampMs = getImpressedStartTimestamp(impression),
impressEndTimestampMs = getImpressedEndTimestamp(impression),
sourceComponent = getImpressedUISourceComponent(context)
)
)
)
))
}

View File

@ -0,0 +1,84 @@
package com.twitter.unified_user_actions.adapter.behavioral_client_event
import com.twitter.client.behavioral_event.action.impress.latest.thriftscala.Impress
import com.twitter.client_event_entities.serverside_context_key.latest.thriftscala.FlattenedServersideContextKey
import com.twitter.storage.behavioral_event.thriftscala.FlattenedEventLog
import com.twitter.unified_user_actions.thriftscala.ActionType
import com.twitter.unified_user_actions.thriftscala.ClientTweetV2Impression
import com.twitter.unified_user_actions.thriftscala.Item
import com.twitter.unified_user_actions.thriftscala.ProductSurface
import com.twitter.unified_user_actions.thriftscala.TweetActionInfo
import com.twitter.unified_user_actions.thriftscala.TweetInfo
import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
object TweetImpressionBCEAdapter {
val TweetDetails = new TweetImpressionBCEAdapter(ActionType.ClientTweetV2Impression)
val FullscreenVideo = new TweetImpressionBCEAdapter(
ActionType.ClientTweetVideoFullscreenV2Impression)
val FullscreenImage = new TweetImpressionBCEAdapter(
ActionType.ClientTweetImageFullscreenV2Impression)
}
class TweetImpressionBCEAdapter(actionType: ActionType) extends ImpressionBCEAdapter {
override type ImpressedItem = Item.TweetInfo
override def toUUA(e: FlattenedEventLog): Seq[UnifiedUserAction] =
(actionType, e.v2Impress, e.v1TweetIds, e.v1BreadcrumbTweetIds) match {
case (ActionType.ClientTweetV2Impression, Some(v2Impress), Some(v1TweetIds), _) =>
toUUAEvents(e, v2Impress, v1TweetIds)
case (
ActionType.ClientTweetVideoFullscreenV2Impression,
Some(v2Impress),
_,
Some(v1BreadcrumbTweetIds)) =>
toUUAEvents(e, v2Impress, v1BreadcrumbTweetIds)
case (
ActionType.ClientTweetImageFullscreenV2Impression,
Some(v2Impress),
_,
Some(v1BreadcrumbTweetIds)) =>
toUUAEvents(e, v2Impress, v1BreadcrumbTweetIds)
case _ => Nil
}
private def toUUAEvents(
e: FlattenedEventLog,
v2Impress: Impress,
v1TweetIds: Seq[FlattenedServersideContextKey]
): Seq[UnifiedUserAction] =
v1TweetIds.map { tweet =>
getUnifiedUserAction(
event = e,
actionType = actionType,
item = getImpressedItem(tweet, v2Impress),
productSurface = getProductSurfaceRelated.productSurface,
productSurfaceInfo = getProductSurfaceRelated.productSurfaceInfo
)
}
override def getImpressedItem(
context: FlattenedServersideContextKey,
impression: Impress
): ImpressedItem =
Item.TweetInfo(
TweetInfo(
actionTweetId = context.serversideContextId.toLong,
tweetActionInfo = Some(
TweetActionInfo.ClientTweetV2Impression(
ClientTweetV2Impression(
impressStartTimestampMs = getImpressedStartTimestamp(impression),
impressEndTimestampMs = getImpressedEndTimestamp(impression),
sourceComponent = getImpressedUISourceComponent(context)
)
))
))
private def getProductSurfaceRelated: ProductSurfaceRelated =
actionType match {
case ActionType.ClientTweetV2Impression =>
ProductSurfaceRelated(
productSurface = Some(ProductSurface.TweetDetailsPage),
productSurfaceInfo = None)
case _ => ProductSurfaceRelated(productSurface = None, productSurfaceInfo = None)
}
}

View File

@ -0,0 +1,16 @@
scala_library(
sources = [
"*.scala",
],
tags = ["bazel-compatible"],
dependencies = [
"common-internal/analytics/client-analytics-data-layer/src/main/scala",
"kafka/finagle-kafka/finatra-kafka/src/main/scala",
"src/scala/com/twitter/loggedout/analytics/common",
"src/thrift/com/twitter/clientapp/gen:clientapp-scala",
"twadoop_config/configuration/log_categories/group/scribelib:client_event-scala",
"unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter:base",
"unified_user_actions/adapter/src/main/scala/com/twitter/unified_user_actions/adapter/common",
"unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala",
],
)

View File

@ -0,0 +1,46 @@
package com.twitter.unified_user_actions.adapter.client_event
import com.twitter.clientapp.thriftscala.LogEvent
import com.twitter.logbase.thriftscala.LogBase
import com.twitter.unified_user_actions.thriftscala.ActionType
import com.twitter.unified_user_actions.thriftscala.Item
import com.twitter.unified_user_actions.thriftscala.UnifiedUserAction
import com.twitter.unified_user_actions.thriftscala._
import com.twitter.clientapp.thriftscala.{Item => LogEventItem}
abstract class BaseCTAClientEvent(actionType: ActionType)
extends BaseClientEvent(actionType = actionType) {
override def toUnifiedUserAction(logEvent: LogEvent): Seq[UnifiedUserAction] = {
val logBase: Option[LogBase] = logEvent.logBase
val userIdentifier: UserIdentifier = UserIdentifier(
userId = logBase.flatMap(_.userId),
guestIdMarketing = logBase.flatMap(_.guestIdMarketing))
val uuaItem: Item = Item.CtaInfo(CTAInfo())
val eventTimestamp = logBase.flatMap(getSourceTimestamp).getOrElse(0L)
val ceItem = LogEventItem.unsafeEmpty
val productSurface: Option[ProductSurface] = ProductSurfaceUtils
.getProductSurface(logEvent.eventNamespace)
val eventMetaData: EventMetadata = ClientEventCommonUtils
.getEventMetadata(
eventTimestamp = eventTimestamp,
logEvent = logEvent,
ceItem = ceItem,
productSurface = productSurface
)
Seq(
UnifiedUserAction(
userIdentifier = userIdentifier,
item = uuaItem,
actionType = actionType,
eventMetadata = eventMetaData,
productSurface = productSurface,
productSurfaceInfo =
ProductSurfaceUtils.getProductSurfaceInfo(productSurface, ceItem, logEvent)
))
}
}

View File

@ -0,0 +1,26 @@
package com.twitter.unified_user_actions.adapter.client_event
import com.twitter.clientapp.thriftscala.LogEvent
import com.twitter.clientapp.thriftscala.{Item => LogEventItem}
import com.twitter.clientapp.thriftscala.ItemType
import com.twitter.unified_user_actions.thriftscala.ActionType
import com.twitter.unified_user_actions.thriftscala.CardInfo
import com.twitter.unified_user_actions.thriftscala.Item
abstract class BaseCardClientEvent(actionType: ActionType)
extends BaseClientEvent(actionType = actionType) {
override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean =
ItemTypeFilterPredicates.ignoreItemType(itemTypeOpt)
override def getUuaItem(
ceItem: LogEventItem,
logEvent: LogEvent
): Option[Item] = Some(
Item.CardInfo(
CardInfo(
id = ceItem.id,
itemType = ceItem.itemType,
actionTweetAuthorInfo = ClientEventCommonUtils.getAuthorInfo(ceItem),
))
)
}

View File

@ -0,0 +1,68 @@
package com.twitter.unified_user_actions.adapter.client_event
import com.twitter.clientapp.thriftscala.ItemType
import com.twitter.clientapp.thriftscala.LogEvent
import com.twitter.clientapp.thriftscala.{Item => LogEventItem}
import com.twitter.logbase.thriftscala.ClientEventReceiver
import com.twitter.logbase.thriftscala.LogBase
import com.twitter.unified_user_actions.thriftscala._
abstract class BaseClientEvent(actionType: ActionType) {
def toUnifiedUserAction(logEvent: LogEvent): Seq[UnifiedUserAction] = {
val logBase: Option[LogBase] = logEvent.logBase
for {
ed <- logEvent.eventDetails.toSeq
items <- ed.items.toSeq
ceItem <- items
eventTimestamp <- logBase.flatMap(getSourceTimestamp)
uuaItem <- getUuaItem(ceItem, logEvent)
if isItemTypeValid(ceItem.itemType)
} yield {
val userIdentifier: UserIdentifier = UserIdentifier(
userId = logBase.flatMap(_.userId),
guestIdMarketing = logBase.flatMap(_.guestIdMarketing))
val productSurface: Option[ProductSurface] = ProductSurfaceUtils
.getProductSurface(logEvent.eventNamespace)
val eventMetaData: EventMetadata = ClientEventCommonUtils
.getEventMetadata(
eventTimestamp = eventTimestamp,
logEvent = logEvent,
ceItem = ceItem,
productSurface = productSurface
)
UnifiedUserAction(
userIdentifier = userIdentifier,
item = uuaItem,
actionType = actionType,
eventMetadata = eventMetaData,
productSurface = productSurface,
productSurfaceInfo =
ProductSurfaceUtils.getProductSurfaceInfo(productSurface, ceItem, logEvent)
)
}
}
def getUuaItem(
ceItem: LogEventItem,
logEvent: LogEvent
): Option[Item] = for (actionTweetId <- ceItem.id)
yield Item.TweetInfo(
ClientEventCommonUtils
.getBasicTweetInfo(actionTweetId, ceItem, logEvent.eventNamespace))
// default implementation filters items of type tweet
// override in the subclass implementation to filter items of other types
def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean =
ItemTypeFilterPredicates.isItemTypeTweet(itemTypeOpt)
def getSourceTimestamp(logBase: LogBase): Option[Long] =
logBase.clientEventReceiver match {
case Some(ClientEventReceiver.CesHttp) | Some(ClientEventReceiver.CesThrift) =>
logBase.driftAdjustedEventCreatedAtMs
case _ => Some(logBase.driftAdjustedEventCreatedAtMs.getOrElse(logBase.timestamp))
}
}
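`getSourceTimestamp` above treats the two CES receivers differently from all others: for them only the drift-adjusted timestamp is accepted, while other receivers fall back to the raw timestamp. Because the result feeds the for-comprehension in `toUnifiedUserAction`, a `None` appears to drop the item entirely. A minimal sketch of the selection rule, with hypothetical stand-ins for `LogBase` and `ClientEventReceiver`:

```scala
object SourceTimestampSketch {
  // Hypothetical stand-ins for ClientEventReceiver values.
  sealed trait Receiver
  case object CesHttp extends Receiver
  case object CesThrift extends Receiver
  case object OtherReceiver extends Receiver

  // Hypothetical stand-in for the LogBase fields that matter here.
  final case class LogBaseLike(
    receiver: Option[Receiver],
    driftAdjustedMs: Option[Long],
    timestamp: Long)

  def sourceTimestamp(lb: LogBaseLike): Option[Long] =
    lb.receiver match {
      // CES receivers: no fallback, so a missing drift-adjusted value yields None.
      case Some(CesHttp) | Some(CesThrift) => lb.driftAdjustedMs
      // Everything else: fall back to the raw timestamp, so the result is always defined.
      case _ => Some(lb.driftAdjustedMs.getOrElse(lb.timestamp))
    }

  def main(args: Array[String]): Unit = {
    println(sourceTimestamp(LogBaseLike(Some(CesHttp), None, 100L)))
    println(sourceTimestamp(LogBaseLike(Some(OtherReceiver), None, 100L)))
    println(sourceTimestamp(LogBaseLike(None, Some(90L), 100L)))
  }
}
```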

View File

@ -0,0 +1,46 @@
package com.twitter.unified_user_actions.adapter.client_event
import com.twitter.clientapp.thriftscala.ItemType
import com.twitter.clientapp.thriftscala.LogEvent
import com.twitter.clientapp.thriftscala.{Item => LogEventItem}
import com.twitter.unified_user_actions.thriftscala._
abstract class BaseFeedbackSubmitClientEvent(actionType: ActionType)
extends BaseClientEvent(actionType = actionType) {
override def getUuaItem(
ceItem: LogEventItem,
logEvent: LogEvent
): Option[Item] = {
logEvent.eventNamespace.flatMap(_.page).flatMap {
case "search" =>
val searchInfoUtil = new SearchInfoUtils(ceItem)
searchInfoUtil.getQueryOptFromItem(logEvent).flatMap { query =>
val isRelevant: Boolean = logEvent.eventNamespace
.flatMap(_.element)
.contains("is_relevant")
logEvent.eventNamespace.flatMap(_.component).flatMap {
case "relevance_prompt_module" =>
for (actionTweetId <- ceItem.id)
yield Item.FeedbackPromptInfo(
FeedbackPromptInfo(
feedbackPromptActionInfo = FeedbackPromptActionInfo.TweetRelevantToSearch(
TweetRelevantToSearch(
searchQuery = query,
tweetId = actionTweetId,
isRelevant = Some(isRelevant)))))
case "did_you_find_it_module" =>
Some(
Item.FeedbackPromptInfo(FeedbackPromptInfo(feedbackPromptActionInfo =
FeedbackPromptActionInfo.DidYouFindItSearch(
DidYouFindItSearch(searchQuery = query, isRelevant = Some(isRelevant))))))
// Ignore any other component instead of throwing a MatchError.
case _ => None
}
}
case _ => None
}
}
override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean =
ItemTypeFilterPredicates.isItemTypeForSearchResultsPageFeedbackSubmit(itemTypeOpt)
}

View File

@ -0,0 +1,48 @@
package com.twitter.unified_user_actions.adapter.client_event
import com.twitter.clientapp.thriftscala.ItemType
import com.twitter.clientapp.thriftscala.LogEvent
import com.twitter.clientapp.thriftscala.{Item => LogEventItem}
import com.twitter.unified_user_actions.thriftscala._
abstract class BaseNotificationTabClientEvent(actionType: ActionType)
extends BaseClientEvent(actionType = actionType) {
// itemType is `None` for Notification Tab events
override def isItemTypeValid(itemTypeOpt: Option[ItemType]): Boolean =
ItemTypeFilterPredicates.ignoreItemType(itemTypeOpt)
override def getUuaItem(
ceItem: LogEventItem,
logEvent: LogEvent
): Option[Item] = for {
notificationTabDetails <- ceItem.notificationTabDetails
clientEventMetadata <- notificationTabDetails.clientEventMetadata
notificationId <- NotificationClientEventUtils.getNotificationIdForNotificationTab(ceItem)
} yield {
clientEventMetadata.tweetIds match {
// if `tweetIds` contains more than one Tweet id, create `MultiTweetNotification`
case Some(tweetIds) if tweetIds.size > 1 =>
Item.NotificationInfo(
NotificationInfo(
actionNotificationId = notificationId,
content = NotificationContent.MultiTweetNotification(
MultiTweetNotification(tweetIds = tweetIds))
))
// if `tweetIds` contains exactly one Tweet id, create `TweetNotification`
case Some(tweetIds) if tweetIds.size == 1 =>
Item.NotificationInfo(
NotificationInfo(
actionNotificationId = notificationId,
content =
NotificationContent.TweetNotification(TweetNotification(tweetId = tweetIds.head))))
// if `tweetIds` is missing or empty, create `UnknownNotification`
case _ =>
Item.NotificationInfo(
NotificationInfo(
actionNotificationId = notificationId,
content = NotificationContent.UnknownNotification(UnknownNotification())
))
}
}
}
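The three cases in `getUuaItem` above choose the notification content from the number of Tweet ids. A minimal sketch of that classification, using hypothetical stand-in types in place of `NotificationContent` and its members:

```scala
object NotificationContentSketch {
  // Hypothetical stand-ins for the NotificationContent union.
  sealed trait Content
  final case class MultiTweet(tweetIds: Seq[Long]) extends Content
  final case class SingleTweet(tweetId: Long) extends Content
  case object Unknown extends Content

  def content(tweetIds: Option[Seq[Long]]): Content =
    tweetIds match {
      case Some(ids) if ids.size > 1 => MultiTweet(ids)
      // Seq(id) matches a sequence with exactly one element.
      case Some(Seq(id)) => SingleTweet(id)
      // Covers both a missing list and an empty one.
      case _ => Unknown
    }

  def main(args: Array[String]): Unit = {
    println(content(Some(Seq(1L, 2L))))
    println(content(Some(Seq(1L))))
    println(content(Some(Seq.empty)))
    println(content(None))
  }
}
```

Note that the catch-all case also handles `Some` of an empty sequence, not only a missing field.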

Some files were not shown because too many files have changed in this diff