Mirror of https://github.com/twitter/the-algorithm.git, synced 2025-01-20 07:51:15 +01:00
Merge branch 'twitter:main' into change-readme
commit 94ba6c0073
16
README.md
@@ -1,6 +1,6 @@
|
|||||||
# Twitter Recommendation Algorithm
|
# Twitter's Recommendation Algorithm
|
||||||
|
|
||||||
The Twitter Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the
|
Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the
|
||||||
Home Timeline. For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). The
|
Home Timeline. For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). The
|
||||||
diagram below illustrates how major services and jobs interconnect.
|
diagram below illustrates how major services and jobs interconnect.
|
||||||
|
|
||||||
@@ -13,24 +13,24 @@ These are the main components of the Recommendation Algorithm included in this r
|
|||||||
| Feature | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
|
| Feature | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
|
||||||
| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
|
| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
|
||||||
| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
|
| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
|
||||||
| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict likelihood of a Twitter User interacting with another User. |
|
| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict the likelihood of a Twitter User interacting with another User. |
|
||||||
| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
|
| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
|
||||||
| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
|
| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
|
||||||
| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
|
| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
|
||||||
| Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
|
| Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
|
||||||
| | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. |
|
| | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. |
|
||||||
| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos) |
|
| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos). |
|
||||||
| | [follow-recommendation-service](follow-recommendations-service/README.md) (FRS)| Provides Users with recommendations for accounts to follow, and Tweets from those accounts. |
|
| | [follow-recommendation-service](follow-recommendations-service/README.md) (FRS)| Provides Users with recommendations for accounts to follow, and Tweets from those accounts. |
|
||||||
| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light ranker model used by search index (Earlybird) to rank Tweets. |
|
| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light Ranker model used by search index (Earlybird) to rank Tweets. |
|
||||||
| | [heavy-ranker](https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/README.md) | Neural network for ranking candidate tweets. One of the main signals used to select timeline Tweets post candidate sourcing. |
|
| | [heavy-ranker](https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/README.md) | Neural network for ranking candidate tweets. One of the main signals used to select timeline Tweets post candidate sourcing. |
|
||||||
| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md) |
|
| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md). |
|
||||||
| | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. |
|
| | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. |
|
||||||
| | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. |
|
| | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. |
|
||||||
| Software framework | [navi](navi/navi/README.md) | High performance, machine learning model serving written in Rust. |
|
| Software framework | [navi](navi/README.md) | High performance, machine learning model serving written in Rust. |
|
||||||
| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
|
| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
|
||||||
| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |
|
| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |
|
||||||
|
|
||||||
We include Bazel BUILD files for most components, but not a top level BUILD or WORKSPACE file.
|
We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file.
|
||||||
|
|
||||||
## Contributing
|
## Contributing
|
||||||
|
|
||||||
|
@@ -91,7 +91,7 @@ def parse_metric(config):
|
|||||||
elif metric_str == "linf":
|
elif metric_str == "linf":
|
||||||
return faiss.METRIC_Linf
|
return faiss.METRIC_Linf
|
||||||
else:
|
else:
|
||||||
raise Exception(f"Uknown metric: {metric_str}")
|
raise Exception(f"Unknown metric: {metric_str}")
|
||||||
|
|
||||||
|
|
||||||
def run_pipeline(argv=[]):
|
def run_pipeline(argv=[]):
|
||||||
|
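The `parse_metric` hunk above ends an `if`/`elif` chain with a catch-all error. As a minimal sketch of the same idea, a lookup table keeps the accepted names and the error message in one place. Only the `linf` entry is taken from the diff. The `l2` and `ip` keys and the helper name `parse_metric_name` are assumptions for illustration, and the function returns the name of the constant instead of importing faiss.

```python
# Hypothetical table: config string -> name of the faiss metric constant.
# Only "linf" -> METRIC_Linf appears in the diff; the other keys are assumed.
METRIC_NAMES = {
    "l2": "METRIC_L2",
    "ip": "METRIC_INNER_PRODUCT",
    "linf": "METRIC_Linf",
}


def parse_metric_name(metric_str: str) -> str:
    """Return the constant name for a metric string, or raise on unknown input."""
    try:
        return METRIC_NAMES[metric_str]
    except KeyError:
        # Same message text as the corrected line in the diff.
        raise ValueError(f"Unknown metric: {metric_str}") from None
```

With faiss installed, `getattr(faiss, parse_metric_name("linf"))` would resolve the name to the actual constant.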
@@ -2,6 +2,6 @@
|
|||||||
|
|
||||||
CR-Mixer is a candidate generation service proposed as part of the Personalization Strategy vision for Twitter. Its aim is to speed up the iteration and development of candidate generation and light ranking. The service acts as a lightweight coordinating layer that delegates candidate generation tasks to underlying compute services. It focuses on Twitter's candidate generation use cases and offers a centralized platform for fetching, mixing, and managing candidate sources and light rankers. The overarching goal is to increase the speed and ease of testing and developing candidate generation pipelines, ultimately delivering more value to Twitter users.
|
CR-Mixer is a candidate generation service proposed as part of the Personalization Strategy vision for Twitter. Its aim is to speed up the iteration and development of candidate generation and light ranking. The service acts as a lightweight coordinating layer that delegates candidate generation tasks to underlying compute services. It focuses on Twitter's candidate generation use cases and offers a centralized platform for fetching, mixing, and managing candidate sources and light rankers. The overarching goal is to increase the speed and ease of testing and developing candidate generation pipelines, ultimately delivering more value to Twitter users.
|
||||||
|
|
||||||
CR-Mixer act as a configurator and delegator, providing abstractions for the challenging parts of candidate generation and handling performance issues. It will offer a 1-stop-shop for fetching and mixing candidate sources, a managed and shared performant platform, a light ranking layer, a common filtering layer, a version control system, a co-owned feature switch set, and peripheral tooling.
|
CR-Mixer acts as a configurator and delegator, providing abstractions for the challenging parts of candidate generation and handling performance issues. It will offer a 1-stop-shop for fetching and mixing candidate sources, a managed and shared performant platform, a light ranking layer, a common filtering layer, a version control system, a co-owned feature switch set, and peripheral tooling.
|
||||||
|
|
||||||
CR-Mixer's pipeline consists of 4 steps: source signal extraction, candidate generation, filtering, and ranking. It also provides peripheral tooling like scribing, debugging, and monitoring. The service fetches source signals externally from stores like UserProfileService and RealGraph, calls external candidate generation services, and caches results. Filters are applied for deduping and pre-ranking, and a light ranking step follows.
|
CR-Mixer's pipeline consists of 4 steps: source signal extraction, candidate generation, filtering, and ranking. It also provides peripheral tooling like scribing, debugging, and monitoring. The service fetches source signals externally from stores like UserProfileService and RealGraph, calls external candidate generation services, and caches results. Filters are applied for deduping and pre-ranking, and a light ranking step follows.
|
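The four pipeline steps described in the CR-Mixer README above (source signal extraction, candidate generation, filtering, ranking) can be sketched roughly as follows. This is not CR-Mixer's code: `Candidate`, `run_candidate_pipeline`, and the callables passed in are hypothetical stand-ins, and caching, scribing, and monitoring are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record; the field names are assumptions."""
    tweet_id: int
    source: str
    score: float = 0.0


def run_candidate_pipeline(
    user_id: int,
    fetch_signals: Callable[[int], List[int]],
    candidate_sources: Sequence[Callable[[List[int]], List[Candidate]]],
    rank: Callable[[Candidate], float],
    limit: int = 10,
) -> List[Candidate]:
    # Step 1: source signal extraction (stands in for external signal stores).
    signals = fetch_signals(user_id)

    # Step 2: candidate generation, delegated to each underlying source.
    candidates: List[Candidate] = []
    for source in candidate_sources:
        candidates.extend(source(signals))

    # Step 3: filtering, here only deduplication by tweet id (first one wins).
    seen = set()
    unique: List[Candidate] = []
    for candidate in candidates:
        if candidate.tweet_id not in seen:
            seen.add(candidate.tweet_id)
            unique.append(candidate)

    # Step 4: light ranking, highest score first, truncated to `limit`.
    return sorted(unique, key=rank, reverse=True)[:limit]
```

Passing the sources and the ranker in as plain callables mirrors the README's description of a thin coordinating layer that delegates the actual work to other services.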
@@ -6,8 +6,6 @@ import com.twitter.search.earlybird.thriftscala.EarlybirdService
|
|||||||
import com.twitter.search.earlybird.thriftscala.ThriftSearchQuery
|
import com.twitter.search.earlybird.thriftscala.ThriftSearchQuery
|
||||||
import com.twitter.util.Time
|
import com.twitter.util.Time
|
||||||
import com.twitter.search.common.query.thriftjava.thriftscala.CollectorParams
|
import com.twitter.search.common.query.thriftjava.thriftscala.CollectorParams
|
||||||
import com.twitter.search.common.ranking.thriftscala.ThriftAgeDecayRankingParams
|
|
||||||
import com.twitter.search.common.ranking.thriftscala.ThriftLinearFeatureRankingParams
|
|
||||||
import com.twitter.search.common.ranking.thriftscala.ThriftRankingParams
|
import com.twitter.search.common.ranking.thriftscala.ThriftRankingParams
|
||||||
import com.twitter.search.common.ranking.thriftscala.ThriftScoringFunctionType
|
import com.twitter.search.common.ranking.thriftscala.ThriftScoringFunctionType
|
||||||
import com.twitter.search.earlybird.thriftscala.ThriftSearchRelevanceOptions
|
import com.twitter.search.earlybird.thriftscala.ThriftSearchRelevanceOptions
|
||||||
@@ -97,7 +95,7 @@ object EarlybirdTensorflowBasedSimilarityEngine {
|
|||||||
// Whether to collect conversation IDs. Remove it for now.
|
// Whether to collect conversation IDs. Remove it for now.
|
||||||
// collectConversationId = Gate.True(), // true for Home
|
// collectConversationId = Gate.True(), // true for Home
|
||||||
rankingMode = ThriftSearchRankingMode.Relevance,
|
rankingMode = ThriftSearchRankingMode.Relevance,
|
||||||
relevanceOptions = Some(getRelevanceOptions(query.useTensorflowRanking)),
|
relevanceOptions = Some(getRelevanceOptions),
|
||||||
collectorParams = Some(
|
collectorParams = Some(
|
||||||
CollectorParams(
|
CollectorParams(
|
||||||
// numResultsToReturn defines how many results each EB shard will return to search root
|
// numResultsToReturn defines how many results each EB shard will return to search root
|
||||||
@@ -116,13 +114,11 @@ object EarlybirdTensorflowBasedSimilarityEngine {
|
|||||||
// The specific values of recap relevance/reranking options correspond to
|
// The specific values of recap relevance/reranking options correspond to
|
||||||
// experiment: enable_recap_reranking_2988,timeline_internal_disable_recap_filter
|
// experiment: enable_recap_reranking_2988,timeline_internal_disable_recap_filter
|
||||||
// bucket : enable_rerank,disable_filter
|
// bucket : enable_rerank,disable_filter
|
||||||
private def getRelevanceOptions(useTensorflowRanking: Boolean): ThriftSearchRelevanceOptions = {
|
private def getRelevanceOptions: ThriftSearchRelevanceOptions = {
|
||||||
ThriftSearchRelevanceOptions(
|
ThriftSearchRelevanceOptions(
|
||||||
proximityScoring = true,
|
proximityScoring = true,
|
||||||
maxConsecutiveSameUser = Some(2),
|
maxConsecutiveSameUser = Some(2),
|
||||||
rankingParams =
|
rankingParams = Some(getTensorflowBasedRankingParams),
|
||||||
if (useTensorflowRanking) Some(getTensorflowBasedRankingParams)
|
|
||||||
else Some(getLinearRankingParams),
|
|
||||||
maxHitsToProcess = Some(500),
|
maxHitsToProcess = Some(500),
|
||||||
maxUserBlendCount = Some(3),
|
maxUserBlendCount = Some(3),
|
||||||
proximityPhraseWeight = 9.0,
|
proximityPhraseWeight = 9.0,
|
||||||
@@ -131,41 +127,12 @@ object EarlybirdTensorflowBasedSimilarityEngine {
|
|||||||
}
|
}
|
||||||
|
|
||||||
private def getTensorflowBasedRankingParams: ThriftRankingParams = {
|
private def getTensorflowBasedRankingParams: ThriftRankingParams = {
|
||||||
getLinearRankingParams.copy(
|
ThriftRankingParams(
|
||||||
`type` = Some(ThriftScoringFunctionType.TensorflowBased),
|
`type` = Some(ThriftScoringFunctionType.TensorflowBased),
|
||||||
selectedTensorflowModel = Some("timelines_rectweet_replica"),
|
selectedTensorflowModel = Some("timelines_rectweet_replica"),
|
||||||
|
minScore = -1.0e100,
|
||||||
applyBoosts = false,
|
applyBoosts = false,
|
||||||
authorSpecificScoreAdjustments = None
|
authorSpecificScoreAdjustments = None
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
private def getLinearRankingParams: ThriftRankingParams = {
|
|
||||||
ThriftRankingParams(
|
|
||||||
`type` = Some(ThriftScoringFunctionType.Linear),
|
|
||||||
minScore = -1.0e100,
|
|
||||||
retweetCountParams = Some(ThriftLinearFeatureRankingParams(weight = 20.0)),
|
|
||||||
replyCountParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
|
|
||||||
reputationParams = Some(ThriftLinearFeatureRankingParams(weight = 0.2)),
|
|
||||||
luceneScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
|
|
||||||
textScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 0.18)),
|
|
||||||
urlParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
|
|
||||||
isReplyParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
|
|
||||||
favCountParams = Some(ThriftLinearFeatureRankingParams(weight = 30.0)),
|
|
||||||
langEnglishUIBoost = 0.5,
|
|
||||||
langEnglishTweetBoost = 0.2,
|
|
||||||
langDefaultBoost = 0.02,
|
|
||||||
unknownLanguageBoost = 0.05,
|
|
||||||
offensiveBoost = 0.1,
|
|
||||||
inTrustedCircleBoost = 3.0,
|
|
||||||
multipleHashtagsOrTrendsBoost = 0.6,
|
|
||||||
inDirectFollowBoost = 4.0,
|
|
||||||
tweetHasTrendBoost = 1.1,
|
|
||||||
selfTweetBoost = 2.0,
|
|
||||||
tweetHasImageUrlBoost = 2.0,
|
|
||||||
tweetHasVideoUrlBoost = 2.0,
|
|
||||||
useUserLanguageInfo = true,
|
|
||||||
ageDecayParams = Some(ThriftAgeDecayRankingParams(slope = 0.005, base = 1.0))
|
|
||||||
)
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
}
|
||||||
|
@@ -160,7 +160,7 @@ object HomeTweetTypePredicates {
|
|||||||
("has_gte_1k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 1000))),
|
("has_gte_1k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 1000))),
|
||||||
(
|
(
|
||||||
"has_gte_10k_favs",
|
"has_gte_10k_favs",
|
||||||
_.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 1000))),
|
_.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 10000))),
|
||||||
(
|
(
|
||||||
"has_gte_100k_favs",
|
"has_gte_100k_favs",
|
||||||
_.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 100000))),
|
_.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 100000))),
|
||||||
|
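The hunk above fixes a predicate whose label said 10k while its threshold was still 1000. One way to make that kind of drift harder, sketched here in Python and not taken from the Scala source, is to derive both the label and the comparison from a single table. `FAV_THRESHOLDS` and `build_fav_predicates` are hypothetical names, and a missing count is treated as "no match", which is what the `exists` calls in the diff appear to do.

```python
from typing import Callable, Dict, Optional

# Hypothetical single source of truth: label suffix -> minimum favorite count.
FAV_THRESHOLDS: Dict[str, int] = {"1k": 1_000, "10k": 10_000, "100k": 100_000}


def build_fav_predicates(
    thresholds: Dict[str, int],
) -> Dict[str, Callable[[Optional[int]], bool]]:
    """Build one 'has_gte_<label>_favs' predicate per threshold entry."""

    def make(threshold: int) -> Callable[[Optional[int]], bool]:
        # A factory function binds `threshold` per predicate, avoiding the
        # late-binding pitfall of lambdas created inside a loop.
        return lambda fav_count: fav_count is not None and fav_count >= threshold

    return {f"has_gte_{label}_favs": make(value) for label, value in thresholds.items()}
```

Because each name is generated from the same entry as its threshold, a copy-pasted value cannot end up under the wrong label.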
@@ -15,28 +15,6 @@ object RelevanceSearchUtil {
|
|||||||
`type` = Some(scr.ThriftScoringFunctionType.TensorflowBased),
|
`type` = Some(scr.ThriftScoringFunctionType.TensorflowBased),
|
||||||
selectedTensorflowModel = Some("timelines_rectweet_replica"),
|
selectedTensorflowModel = Some("timelines_rectweet_replica"),
|
||||||
minScore = -1.0e100,
|
minScore = -1.0e100,
|
||||||
retweetCountParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 20.0)),
|
|
||||||
replyCountParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 1.0)),
|
|
||||||
reputationParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 0.2)),
|
|
||||||
luceneScoreParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 2.0)),
|
|
||||||
textScoreParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 0.18)),
|
|
||||||
urlParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 2.0)),
|
|
||||||
isReplyParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 1.0)),
|
|
||||||
favCountParams = Some(scr.ThriftLinearFeatureRankingParams(weight = 30.0)),
|
|
||||||
langEnglishUIBoost = 0.5,
|
|
||||||
langEnglishTweetBoost = 0.2,
|
|
||||||
langDefaultBoost = 0.02,
|
|
||||||
unknownLanguageBoost = 0.05,
|
|
||||||
offensiveBoost = 0.1,
|
|
||||||
inTrustedCircleBoost = 3.0,
|
|
||||||
multipleHashtagsOrTrendsBoost = 0.6,
|
|
||||||
inDirectFollowBoost = 4.0,
|
|
||||||
tweetHasTrendBoost = 1.1,
|
|
||||||
selfTweetBoost = 2.0,
|
|
||||||
tweetHasImageUrlBoost = 2.0,
|
|
||||||
tweetHasVideoUrlBoost = 2.0,
|
|
||||||
useUserLanguageInfo = true,
|
|
||||||
ageDecayParams = Some(scr.ThriftAgeDecayRankingParams(slope = 0.005, base = 1.0)),
|
|
||||||
selectedModels = Some(Map("home_mixer_unified_engagement_prod" -> 1.0)),
|
selectedModels = Some(Map("home_mixer_unified_engagement_prod" -> 1.0)),
|
||||||
applyBoosts = false,
|
applyBoosts = false,
|
||||||
)
|
)
|
||||||
|
@@ -1,6 +1,6 @@
|
|||||||
# Navi: High-Performance Machine Learning Serving Server in Rust
|
# Navi: High-Performance Machine Learning Serving Server in Rust
|
||||||
|
|
||||||
Navi is a high-performance, versatile machine learning serving server implemented in Rust, tailored for production usage. It's designed to efficiently serve within the Twitter tech stack, offering top-notch performance while focusing on core features.
|
Navi is a high-performance, versatile machine learning serving server implemented in Rust and tailored for production usage. It's designed to efficiently serve within the Twitter tech stack, offering top-notch performance while focusing on core features.
|
||||||
|
|
||||||
## Key Features
|
## Key Features
|
||||||
|
|
||||||
@@ -23,12 +23,14 @@ While Navi's features may not be as comprehensive as its open-source counterpart
|
|||||||
- `thrift_bpr_adapter`: generated thrift code for BatchPredictionRequest
|
- `thrift_bpr_adapter`: generated thrift code for BatchPredictionRequest
|
||||||
|
|
||||||
## Content
|
## Content
|
||||||
We include all *.rs source code that makes up the main navi binaries for you to examine. The test and benchmark code, as well as configuration files are not included due to data security concerns.
|
We have included all *.rs source code files that make up the main Navi binaries for you to examine. However, we have not included the test and benchmark code, or various configuration files, due to data security concerns.
|
||||||
|
|
||||||
## Run
|
## Run
|
||||||
in navi/navi you can run. Note you need to create a models directory and create some versions, preferably using epoch time, e.g., 1679693908377
|
In navi/navi, you can run the following commands:
|
||||||
- scripts/run_tf2.sh
|
- `scripts/run_tf2.sh` for [TensorFlow](https://www.tensorflow.org/)
|
||||||
- scripts/run_onnx.sh
|
- `scripts/run_onnx.sh` for [Onnx](https://onnx.ai/)
|
||||||
|
|
||||||
|
Do note that you need to create a models directory and create some versions, preferably using epoch time, e.g., `1679693908377`.
|
||||||
|
|
||||||
## Build
|
## Build
|
||||||
you can adapt the above scripts to build using Cargo
|
You can adapt the above scripts to build using Cargo.
|
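The Run section above asks for a models directory containing version subdirectories named by epoch time. A minimal sketch of creating one such version directory is shown below. The helper name `create_model_version` is hypothetical, and the exact layout the run scripts expect (for example, whether a per-model directory sits between `models` and the version) is an assumption that the README does not spell out.

```python
import time
from pathlib import Path
from typing import Optional, Union


def create_model_version(
    model_dir: Union[str, Path], epoch_ms: Optional[int] = None
) -> Path:
    """Create <model_dir>/<epoch_ms> and return its path.

    When no version is given, the current time in milliseconds is used,
    matching the 13-digit example (1679693908377) in the README.
    """
    version = epoch_ms if epoch_ms is not None else int(time.time() * 1000)
    version_dir = Path(model_dir) / str(version)
    # parents=True also creates the models directory itself if it is missing.
    version_dir.mkdir(parents=True, exist_ok=True)
    return version_dir
```

For example, `create_model_version("models")` creates `models/<current epoch in ms>`, and the model files would then be placed inside that directory.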
@@ -44,6 +44,5 @@ pub struct RenamedFeatures {
|
|||||||
}
|
}
|
||||||
|
|
||||||
pub fn parse(json_str: &str) -> Result<AllConfig, Error> {
|
pub fn parse(json_str: &str) -> Result<AllConfig, Error> {
|
||||||
let all_config: AllConfig = serde_json::from_str(json_str)?;
|
serde_json::from_str(json_str)
|
||||||
return std::result::Result::Ok(all_config);
|
|
||||||
}
|
}
|
||||||
|
@@ -16,8 +16,7 @@ use segdense::util;
|
|||||||
use thrift::protocol::{TBinaryInputProtocol, TSerializable};
|
use thrift::protocol::{TBinaryInputProtocol, TSerializable};
|
||||||
use thrift::transport::TBufferChannel;
|
use thrift::transport::TBufferChannel;
|
||||||
|
|
||||||
use crate::{all_config};
|
use crate::{all_config, all_config::AllConfig};
|
||||||
use crate::all_config::AllConfig;
|
|
||||||
|
|
||||||
pub fn log_feature_match(
|
pub fn log_feature_match(
|
||||||
dr: &DataRecord,
|
dr: &DataRecord,
|
||||||
@@ -27,26 +26,22 @@ pub fn log_feature_match(
|
|||||||
// Note the following algorithm matches features from config using linear search.
|
// Note the following algorithm matches features from config using linear search.
|
||||||
// Also the record source is MinDataRecord. This includes only binary and continous features for now.
|
// Also the record source is MinDataRecord. This includes only binary and continous features for now.
|
||||||
|
|
||||||
for (feature_id, feature_value) in dr.continuous_features.as_ref().unwrap().into_iter() {
|
for (feature_id, feature_value) in dr.continuous_features.as_ref().unwrap() {
|
||||||
debug!(
|
debug!(
|
||||||
"{} - Continous Datarecord => Feature ID: {}, Feature value: {}",
|
"{dr_type} - Continuous Datarecord => Feature ID: {feature_id}, Feature value: {feature_value}"
|
||||||
dr_type, feature_id, feature_value
|
|
||||||
);
|
);
|
||||||
for input_feature in &seg_dense_config.cont.input_features {
|
for input_feature in &seg_dense_config.cont.input_features {
|
||||||
if input_feature.feature_id == *feature_id {
|
if input_feature.feature_id == *feature_id {
|
||||||
debug!("Matching input feature: {:?}", input_feature)
|
debug!("Matching input feature: {input_feature:?}")
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
for feature_id in dr.binary_features.as_ref().unwrap().into_iter() {
|
for feature_id in dr.binary_features.as_ref().unwrap() {
|
||||||
debug!(
|
debug!("{dr_type} - Binary Datarecord => Feature ID: {feature_id}");
|
||||||
"{} - Binary Datarecord => Feature ID: {}",
|
|
||||||
dr_type, feature_id
|
|
||||||
);
|
|
||||||
for input_feature in &seg_dense_config.binary.input_features {
|
for input_feature in &seg_dense_config.binary.input_features {
|
||||||
if input_feature.feature_id == *feature_id {
|
if input_feature.feature_id == *feature_id {
|
||||||
debug!("Found input feature: {:?}", input_feature)
|
debug!("Found input feature: {input_feature:?}")
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -96,15 +91,13 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
reporting_feature_ids: Vec<(i64, &str)>,
|
reporting_feature_ids: Vec<(i64, &str)>,
|
||||||
register_metric_fn: Option<impl Fn(&HistogramVec)>,
|
register_metric_fn: Option<impl Fn(&HistogramVec)>,
|
||||||
) -> BatchPredictionRequestToTorchTensorConverter {
|
) -> BatchPredictionRequestToTorchTensorConverter {
|
||||||
let all_config_path = format!("{}/{}/all_config.json", model_dir, model_version);
|
let all_config_path = format!("{model_dir}/{model_version}/all_config.json");
|
||||||
let seg_dense_config_path = format!(
|
let seg_dense_config_path =
|
||||||
"{}/{}/segdense_transform_spec_home_recap_2022.json",
|
format!("{model_dir}/{model_version}/segdense_transform_spec_home_recap_2022.json");
|
||||||
model_dir, model_version
|
|
||||||
);
|
|
||||||
let seg_dense_config = util::load_config(&seg_dense_config_path);
|
let seg_dense_config = util::load_config(&seg_dense_config_path);
|
||||||
let all_config = all_config::parse(
|
let all_config = all_config::parse(
|
||||||
&fs::read_to_string(&all_config_path)
|
&fs::read_to_string(&all_config_path)
|
||||||
.unwrap_or_else(|error| panic!("error loading all_config.json - {}", error)),
|
.unwrap_or_else(|error| panic!("error loading all_config.json - {error}")),
|
||||||
)
|
)
|
||||||
.unwrap();
|
.unwrap();
|
||||||
|
|
||||||
@@ -138,11 +131,11 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
let (discrete_feature_metrics, continuous_feature_metrics) = METRICS.get_or_init(|| {
|
let (discrete_feature_metrics, continuous_feature_metrics) = METRICS.get_or_init(|| {
|
||||||
let discrete = HistogramVec::new(
|
let discrete = HistogramVec::new(
|
||||||
HistogramOpts::new(":navi:feature_id:discrete", "Discrete Feature ID values")
|
HistogramOpts::new(":navi:feature_id:discrete", "Discrete Feature ID values")
|
||||||
.buckets(Vec::from(&[
|
.buckets(Vec::from([
|
||||||
0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0,
|
0.0f64, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0,
|
||||||
120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0,
|
120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0,
|
||||||
300.0, 500.0, 1000.0, 10000.0, 100000.0,
|
300.0, 500.0, 1000.0, 10000.0, 100000.0,
|
||||||
] as &'static [f64])),
|
])),
|
||||||
&["feature_id"],
|
&["feature_id"],
|
||||||
)
|
)
|
||||||
.expect("metric cannot be created");
|
.expect("metric cannot be created");
|
||||||
@@ -151,18 +144,18 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
":navi:feature_id:continuous",
|
":navi:feature_id:continuous",
|
||||||
"continuous Feature ID values",
|
"continuous Feature ID values",
|
||||||
)
|
)
|
||||||
.buckets(Vec::from(&[
|
.buckets(Vec::from([
|
||||||
0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0,
|
0.0f64, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0,
|
||||||
130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0, 300.0, 500.0,
|
120.0, 130.0, 140.0, 150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 250.0, 300.0,
|
||||||
1000.0, 10000.0, 100000.0,
|
500.0, 1000.0, 10000.0, 100000.0,
|
||||||
] as &'static [f64])),
|
])),
|
||||||
&["feature_id"],
|
&["feature_id"],
|
||||||
)
|
)
|
||||||
.expect("metric cannot be created");
|
.expect("metric cannot be created");
|
||||||
register_metric_fn.map(|r| {
|
if let Some(r) = register_metric_fn {
|
||||||
r(&discrete);
|
r(&discrete);
|
||||||
r(&continuous);
|
r(&continuous);
|
||||||
});
|
}
|
||||||
(discrete, continuous)
|
(discrete, continuous)
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -171,16 +164,13 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
|
|
||||||
for (feature_id, feature_type) in reporting_feature_ids.iter() {
|
for (feature_id, feature_type) in reporting_feature_ids.iter() {
|
||||||
match *feature_type {
|
match *feature_type {
|
||||||
"discrete" => discrete_features_to_report.insert(feature_id.clone()),
|
"discrete" => discrete_features_to_report.insert(*feature_id),
|
||||||
"continuous" => continuous_features_to_report.insert(feature_id.clone()),
|
"continuous" => continuous_features_to_report.insert(*feature_id),
|
||||||
_ => panic!(
|
_ => panic!("Invalid feature type {feature_type} for reporting metrics!"),
|
||||||
"Invalid feature type {} for reporting metrics!",
|
|
||||||
feature_type
|
|
||||||
),
|
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
return BatchPredictionRequestToTorchTensorConverter {
|
BatchPredictionRequestToTorchTensorConverter {
|
||||||
all_config,
|
all_config,
|
||||||
seg_dense_config,
|
seg_dense_config,
|
||||||
all_config_path,
|
all_config_path,
|
||||||
@@ -193,7 +183,7 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
continuous_features_to_report,
|
continuous_features_to_report,
|
||||||
discrete_feature_metrics,
|
discrete_feature_metrics,
|
||||||
continuous_feature_metrics,
|
continuous_feature_metrics,
|
||||||
};
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
fn get_feature_id(feature_name: &str, seg_dense_config: &Root) -> i64 {
|
fn get_feature_id(feature_name: &str, seg_dense_config: &Root) -> i64 {
|
||||||
@@ -203,7 +193,7 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
return feature.feature_id;
|
return feature.feature_id;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return -1;
|
-1
|
||||||
}
|
}
|
||||||
|
|
||||||
fn parse_batch_prediction_request(bytes: Vec<u8>) -> BatchPredictionRequest {
|
fn parse_batch_prediction_request(bytes: Vec<u8>) -> BatchPredictionRequest {
|
||||||
@@ -211,7 +201,7 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
let mut bc = TBufferChannel::with_capacity(bytes.len(), 0);
|
let mut bc = TBufferChannel::with_capacity(bytes.len(), 0);
|
||||||
bc.set_readable_bytes(&bytes);
|
bc.set_readable_bytes(&bytes);
|
||||||
let mut protocol = TBinaryInputProtocol::new(bc, true);
|
let mut protocol = TBinaryInputProtocol::new(bc, true);
|
||||||
return BatchPredictionRequest::read_from_in_protocol(&mut protocol).unwrap();
|
BatchPredictionRequest::read_from_in_protocol(&mut protocol).unwrap()
|
||||||
}
|
}
|
||||||
|
|
||||||
fn get_embedding_tensors(
|
fn get_embedding_tensors(
|
||||||
@@ -228,45 +218,43 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
let mut working_set = vec![0 as f32; total_size];
|
let mut working_set = vec![0 as f32; total_size];
|
||||||
let mut bpr_start = 0;
|
let mut bpr_start = 0;
|
||||||
for (bpr, &bpr_end) in bprs.iter().zip(batch_size) {
|
for (bpr, &bpr_end) in bprs.iter().zip(batch_size) {
|
||||||
if bpr.common_features.is_some() {
|
if bpr.common_features.is_some()
|
||||||
if bpr.common_features.as_ref().unwrap().tensors.is_some() {
|
&& bpr.common_features.as_ref().unwrap().tensors.is_some()
|
||||||
if bpr
|
&& bpr
|
||||||
.common_features
|
.common_features
|
||||||
.as_ref()
|
.as_ref()
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.tensors
|
.tensors
|
||||||
.as_ref()
|
.as_ref()
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.contains_key(&feature_id)
|
.contains_key(&feature_id)
|
||||||
|
{
|
||||||
|
let source_tensor = bpr
|
||||||
|
.common_features
|
||||||
|
.as_ref()
|
||||||
|
.unwrap()
|
||||||
|
.tensors
|
||||||
|
.as_ref()
|
||||||
|
.unwrap()
|
||||||
|
.get(&feature_id)
|
||||||
|
.unwrap();
|
||||||
|
let tensor = match source_tensor {
|
||||||
|
GeneralTensor::FloatTensor(float_tensor) =>
|
||||||
|
//Tensor::of_slice(
|
||||||
{
|
{
|
||||||
let source_tensor = bpr
|
float_tensor
|
||||||
.common_features
|
.floats
|
||||||
.as_ref()
|
.iter()
|
||||||
.unwrap()
|
.map(|x| x.into_inner() as f32)
|
||||||
.tensors
|
.collect::<Vec<_>>()
|
||||||
.as_ref()
|
}
|
||||||
.unwrap()
|
_ => vec![0 as f32; cols],
|
||||||
.get(&feature_id)
|
};
|
||||||
.unwrap();
|
|
||||||
let tensor = match source_tensor {
|
|
||||||
GeneralTensor::FloatTensor(float_tensor) =>
|
|
||||||
//Tensor::of_slice(
|
|
||||||
{
|
|
||||||
float_tensor
|
|
||||||
.floats
|
|
||||||
.iter()
|
|
||||||
.map(|x| x.into_inner() as f32)
|
|
||||||
.collect::<Vec<_>>()
|
|
||||||
}
|
|
||||||
_ => vec![0 as f32; cols],
|
|
||||||
};
|
|
||||||
|
|
||||||
// since the tensor is found in common feature, add it in all batches
|
// since the tensor is found in common feature, add it in all batches
|
||||||
for row in bpr_start..bpr_end {
|
for row in bpr_start..bpr_end {
|
||||||
for col in 0..cols {
|
for col in 0..cols {
|
||||||
working_set[row * cols + col] = tensor[col];
|
working_set[row * cols + col] = tensor[col];
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -300,7 +288,7 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
}
|
}
|
||||||
bpr_start = bpr_end;
|
bpr_start = bpr_end;
|
||||||
}
|
}
|
||||||
return Array2::<f32>::from_shape_vec([rows, cols], working_set).unwrap();
|
Array2::<f32>::from_shape_vec([rows, cols], working_set).unwrap()
|
||||||
}
|
}
|
||||||
|
|
||||||
// Todo : Refactor, create a generic version with different type and field accessors
|
// Todo : Refactor, create a generic version with different type and field accessors
|
||||||
@ -310,9 +298,9 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
// (INT64 --> INT64, DataRecord.discrete_feature)
|
// (INT64 --> INT64, DataRecord.discrete_feature)
|
||||||
fn get_continuous(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
|
fn get_continuous(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
|
||||||
// These need to be part of model schema
|
// These need to be part of model schema
|
||||||
let rows: usize = batch_ends[batch_ends.len() - 1];
|
let rows = batch_ends[batch_ends.len() - 1];
|
||||||
let cols: usize = 5293;
|
let cols = 5293;
|
||||||
let full_size: usize = (rows * cols).try_into().unwrap();
|
let full_size = rows * cols;
|
||||||
let default_val = f32::NAN;
|
let default_val = f32::NAN;
|
||||||
|
|
||||||
let mut tensor = vec![default_val; full_size];
|
let mut tensor = vec![default_val; full_size];
|
||||||
@ -337,55 +325,48 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
.unwrap();
|
.unwrap();
|
||||||
|
|
||||||
for feature in common_features {
|
for feature in common_features {
|
||||||
match self.feature_mapper.get(feature.0) {
|
if let Some(f_info) = self.feature_mapper.get(feature.0) {
|
||||||
Some(f_info) => {
|
let idx = f_info.index_within_tensor as usize;
|
||||||
let idx = f_info.index_within_tensor as usize;
|
if idx < cols {
|
||||||
if idx < cols {
|
// Set value in each row
|
||||||
// Set value in each row
|
for r in bpr_start..bpr_end {
|
||||||
for r in bpr_start..bpr_end {
|
let flat_index = r * cols + idx;
|
||||||
let flat_index: usize = (r * cols + idx).try_into().unwrap();
|
tensor[flat_index] = feature.1.into_inner() as f32;
|
||||||
tensor[flat_index] = feature.1.into_inner() as f32;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
None => (),
|
|
||||||
}
|
}
|
||||||
if self.continuous_features_to_report.contains(feature.0) {
|
if self.continuous_features_to_report.contains(feature.0) {
|
||||||
self.continuous_feature_metrics
|
self.continuous_feature_metrics
|
||||||
.with_label_values(&[feature.0.to_string().as_str()])
|
.with_label_values(&[feature.0.to_string().as_str()])
|
||||||
.observe(feature.1.into_inner() as f64)
|
.observe(feature.1.into_inner())
|
||||||
} else if self.discrete_features_to_report.contains(feature.0) {
|
} else if self.discrete_features_to_report.contains(feature.0) {
|
||||||
self.discrete_feature_metrics
|
self.discrete_feature_metrics
|
||||||
.with_label_values(&[feature.0.to_string().as_str()])
|
.with_label_values(&[feature.0.to_string().as_str()])
|
||||||
.observe(feature.1.into_inner() as f64)
|
.observe(feature.1.into_inner())
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Process the batch of datarecords
|
// Process the batch of datarecords
|
||||||
for r in bpr_start..bpr_end {
|
for r in bpr_start..bpr_end {
|
||||||
let dr: &DataRecord =
|
let dr: &DataRecord = &bpr.individual_features_list[r - bpr_start];
|
||||||
&bpr.individual_features_list[usize::try_from(r - bpr_start).unwrap()];
|
|
||||||
if dr.continuous_features.is_some() {
|
if dr.continuous_features.is_some() {
|
||||||
for feature in dr.continuous_features.as_ref().unwrap() {
|
for feature in dr.continuous_features.as_ref().unwrap() {
|
||||||
match self.feature_mapper.get(&feature.0) {
|
if let Some(f_info) = self.feature_mapper.get(feature.0) {
|
||||||
Some(f_info) => {
|
let idx = f_info.index_within_tensor as usize;
|
||||||
let idx = f_info.index_within_tensor as usize;
|
let flat_index = r * cols + idx;
|
||||||
let flat_index: usize = (r * cols + idx).try_into().unwrap();
|
if flat_index < tensor.len() && idx < cols {
|
||||||
if flat_index < tensor.len() && idx < cols {
|
tensor[flat_index] = feature.1.into_inner() as f32;
|
||||||
tensor[flat_index] = feature.1.into_inner() as f32;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
None => (),
|
|
||||||
}
|
}
|
||||||
if self.continuous_features_to_report.contains(feature.0) {
|
if self.continuous_features_to_report.contains(feature.0) {
|
||||||
self.continuous_feature_metrics
|
self.continuous_feature_metrics
|
||||||
.with_label_values(&[feature.0.to_string().as_str()])
|
.with_label_values(&[feature.0.to_string().as_str()])
|
||||||
.observe(feature.1.into_inner() as f64)
|
.observe(feature.1.into_inner())
|
||||||
} else if self.discrete_features_to_report.contains(feature.0) {
|
} else if self.discrete_features_to_report.contains(feature.0) {
|
||||||
self.discrete_feature_metrics
|
self.discrete_feature_metrics
|
||||||
.with_label_values(&[feature.0.to_string().as_str()])
|
.with_label_values(&[feature.0.to_string().as_str()])
|
||||||
.observe(feature.1.into_inner() as f64)
|
.observe(feature.1.into_inner())
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@ -393,22 +374,19 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
bpr_start = bpr_end;
|
bpr_start = bpr_end;
|
||||||
}
|
}
|
||||||
|
|
||||||
return InputTensor::FloatTensor(
|
InputTensor::FloatTensor(
|
||||||
Array2::<f32>::from_shape_vec(
|
Array2::<f32>::from_shape_vec([rows, cols], tensor)
|
||||||
[rows.try_into().unwrap(), cols.try_into().unwrap()],
|
.unwrap()
|
||||||
tensor,
|
.into_dyn(),
|
||||||
)
|
)
|
||||||
.unwrap()
|
|
||||||
.into_dyn(),
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
fn get_binary(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
|
fn get_binary(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
|
||||||
// These need to be part of model schema
|
// These need to be part of model schema
|
||||||
let rows: usize = batch_ends[batch_ends.len() - 1];
|
let rows = batch_ends[batch_ends.len() - 1];
|
||||||
let cols: usize = 149;
|
let cols = 149;
|
||||||
let full_size: usize = (rows * cols).try_into().unwrap();
|
let full_size = rows * cols;
|
||||||
let default_val: i64 = 0;
|
let default_val = 0;
|
||||||
|
|
||||||
let mut v = vec![default_val; full_size];
|
let mut v = vec![default_val; full_size];
|
||||||
|
|
||||||
@ -432,55 +410,48 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
.unwrap();
|
.unwrap();
|
||||||
|
|
||||||
for feature in common_features {
|
for feature in common_features {
|
||||||
match self.feature_mapper.get(feature) {
|
if let Some(f_info) = self.feature_mapper.get(feature) {
|
||||||
Some(f_info) => {
|
let idx = f_info.index_within_tensor as usize;
|
||||||
let idx = f_info.index_within_tensor as usize;
|
if idx < cols {
|
||||||
if idx < cols {
|
// Set value in each row
|
||||||
// Set value in each row
|
for r in bpr_start..bpr_end {
|
||||||
for r in bpr_start..bpr_end {
|
let flat_index = r * cols + idx;
|
||||||
let flat_index: usize = (r * cols + idx).try_into().unwrap();
|
v[flat_index] = 1;
|
||||||
v[flat_index] = 1;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
None => (),
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Process the batch of datarecords
|
// Process the batch of datarecords
|
||||||
for r in bpr_start..bpr_end {
|
for r in bpr_start..bpr_end {
|
||||||
let dr: &DataRecord =
|
let dr: &DataRecord = &bpr.individual_features_list[r - bpr_start];
|
||||||
&bpr.individual_features_list[usize::try_from(r - bpr_start).unwrap()];
|
|
||||||
if dr.binary_features.is_some() {
|
if dr.binary_features.is_some() {
|
||||||
for feature in dr.binary_features.as_ref().unwrap() {
|
for feature in dr.binary_features.as_ref().unwrap() {
|
||||||
match self.feature_mapper.get(&feature) {
|
if let Some(f_info) = self.feature_mapper.get(feature) {
|
||||||
Some(f_info) => {
|
let idx = f_info.index_within_tensor as usize;
|
||||||
let idx = f_info.index_within_tensor as usize;
|
let flat_index = r * cols + idx;
|
||||||
let flat_index: usize = (r * cols + idx).try_into().unwrap();
|
v[flat_index] = 1;
|
||||||
v[flat_index] = 1;
|
|
||||||
}
|
|
||||||
None => (),
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
bpr_start = bpr_end;
|
bpr_start = bpr_end;
|
||||||
}
|
}
|
||||||
return InputTensor::Int64Tensor(
|
InputTensor::Int64Tensor(
|
||||||
Array2::<i64>::from_shape_vec([rows.try_into().unwrap(), cols.try_into().unwrap()], v)
|
Array2::<i64>::from_shape_vec([rows, cols], v)
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.into_dyn(),
|
.into_dyn(),
|
||||||
);
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
#[allow(dead_code)]
|
#[allow(dead_code)]
|
||||||
fn get_discrete(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
|
fn get_discrete(&self, bprs: &[BatchPredictionRequest], batch_ends: &[usize]) -> InputTensor {
|
||||||
// These need to be part of model schema
|
// These need to be part of model schema
|
||||||
let rows: usize = batch_ends[batch_ends.len() - 1];
|
let rows = batch_ends[batch_ends.len() - 1];
|
||||||
let cols: usize = 320;
|
let cols = 320;
|
||||||
let full_size: usize = (rows * cols).try_into().unwrap();
|
let full_size = rows * cols;
|
||||||
let default_val: i64 = 0;
|
let default_val = 0;
|
||||||
|
|
||||||
let mut v = vec![default_val; full_size];
|
let mut v = vec![default_val; full_size];
|
||||||
|
|
||||||
@ -504,18 +475,15 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
.unwrap();
|
.unwrap();
|
||||||
|
|
||||||
for feature in common_features {
|
for feature in common_features {
|
||||||
match self.feature_mapper.get(feature.0) {
|
if let Some(f_info) = self.feature_mapper.get(feature.0) {
|
||||||
Some(f_info) => {
|
let idx = f_info.index_within_tensor as usize;
|
||||||
let idx = f_info.index_within_tensor as usize;
|
if idx < cols {
|
||||||
if idx < cols {
|
// Set value in each row
|
||||||
// Set value in each row
|
for r in bpr_start..bpr_end {
|
||||||
for r in bpr_start..bpr_end {
|
let flat_index = r * cols + idx;
|
||||||
let flat_index: usize = (r * cols + idx).try_into().unwrap();
|
v[flat_index] = *feature.1;
|
||||||
v[flat_index] = *feature.1;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
None => (),
|
|
||||||
}
|
}
|
||||||
if self.discrete_features_to_report.contains(feature.0) {
|
if self.discrete_features_to_report.contains(feature.0) {
|
||||||
self.discrete_feature_metrics
|
self.discrete_feature_metrics
|
||||||
@ -527,18 +495,15 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
|
|
||||||
// Process the batch of datarecords
|
// Process the batch of datarecords
|
||||||
for r in bpr_start..bpr_end {
|
for r in bpr_start..bpr_end {
|
||||||
let dr: &DataRecord = &bpr.individual_features_list[usize::try_from(r).unwrap()];
|
let dr: &DataRecord = &bpr.individual_features_list[r];
|
||||||
if dr.discrete_features.is_some() {
|
if dr.discrete_features.is_some() {
|
||||||
for feature in dr.discrete_features.as_ref().unwrap() {
|
for feature in dr.discrete_features.as_ref().unwrap() {
|
||||||
match self.feature_mapper.get(&feature.0) {
|
if let Some(f_info) = self.feature_mapper.get(feature.0) {
|
||||||
Some(f_info) => {
|
let idx = f_info.index_within_tensor as usize;
|
||||||
let idx = f_info.index_within_tensor as usize;
|
let flat_index = r * cols + idx;
|
||||||
let flat_index: usize = (r * cols + idx).try_into().unwrap();
|
if flat_index < v.len() && idx < cols {
|
||||||
if flat_index < v.len() && idx < cols {
|
v[flat_index] = *feature.1;
|
||||||
v[flat_index] = *feature.1;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
None => (),
|
|
||||||
}
|
}
|
||||||
if self.discrete_features_to_report.contains(feature.0) {
|
if self.discrete_features_to_report.contains(feature.0) {
|
||||||
self.discrete_feature_metrics
|
self.discrete_feature_metrics
|
||||||
@ -550,11 +515,11 @@ impl BatchPredictionRequestToTorchTensorConverter {
|
|||||||
}
|
}
|
||||||
bpr_start = bpr_end;
|
bpr_start = bpr_end;
|
||||||
}
|
}
|
||||||
return InputTensor::Int64Tensor(
|
InputTensor::Int64Tensor(
|
||||||
Array2::<i64>::from_shape_vec([rows.try_into().unwrap(), cols.try_into().unwrap()], v)
|
Array2::<i64>::from_shape_vec([rows, cols], v)
|
||||||
.unwrap()
|
.unwrap()
|
||||||
.into_dyn(),
|
.into_dyn(),
|
||||||
);
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
fn get_user_embedding(
|
fn get_user_embedding(
|
||||||
@ -604,7 +569,7 @@ impl Converter for BatchPredictionRequestToTorchTensorConverter {
|
|||||||
.map(|bpr| bpr.individual_features_list.len())
|
.map(|bpr| bpr.individual_features_list.len())
|
||||||
.scan(0usize, |acc, e| {
|
.scan(0usize, |acc, e| {
|
||||||
//running total
|
//running total
|
||||||
*acc = *acc + e;
|
*acc += e;
|
||||||
Some(*acc)
|
Some(*acc)
|
||||||
})
|
})
|
||||||
.collect::<Vec<_>>();
|
.collect::<Vec<_>>();
|
||||||
|
@ -9,15 +9,17 @@ use std::{
|
|||||||
pub fn load_batch_prediction_request_base64(file_name: &str) -> Vec<Vec<u8>> {
|
pub fn load_batch_prediction_request_base64(file_name: &str) -> Vec<Vec<u8>> {
|
||||||
let file = File::open(file_name).expect("could not read file");
|
let file = File::open(file_name).expect("could not read file");
|
||||||
let mut result = vec![];
|
let mut result = vec![];
|
||||||
for line in io::BufReader::new(file).lines() {
|
for (mut line_count, line) in io::BufReader::new(file).lines().enumerate() {
|
||||||
|
line_count += 1;
|
||||||
match base64::decode(line.unwrap().trim()) {
|
match base64::decode(line.unwrap().trim()) {
|
||||||
Ok(payload) => result.push(payload),
|
Ok(payload) => result.push(payload),
|
||||||
Err(err) => println!("error decoding line {}", err),
|
Err(err) => println!("error decoding line {file_name}:{line_count} - {err}"),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
println!("reslt len: {}", result.len());
|
println!("result len: {}", result.len());
|
||||||
return result;
|
result
|
||||||
}
|
}
|
||||||
|
|
||||||
pub fn save_to_npy<T: npyz::Serialize + AutoSerialize>(data: &[T], save_to: String) {
|
pub fn save_to_npy<T: npyz::Serialize + AutoSerialize>(data: &[T], save_to: String) {
|
||||||
let mut writer = WriteOptions::new()
|
let mut writer = WriteOptions::new()
|
||||||
.default_dtype()
|
.default_dtype()
|
||||||
|
@ -1,13 +1,10 @@
|
|||||||
# recos-injector
|
# Recos-Injector
|
||||||
Recos-Injector is a streaming event processor for building input streams for GraphJet based services.
|
|
||||||
It is general purpose in that it consumes arbitrary incoming event stream (e.x. Fav, RT, Follow, client_events, etc), applies
|
|
||||||
filtering, combines and publishes cleaned up events to corresponding GraphJet services.
|
|
||||||
Each GraphJet based service subscribes to a dedicated Kafka topic. Recos-Injector enables a GraphJet based service to consume any
|
|
||||||
event it wants
|
|
||||||
|
|
||||||
## How to run recos-injector-server tests
|
Recos-Injector is a streaming event processor used to build input streams for GraphJet-based services. It is a general-purpose tool that consumes arbitrary incoming event streams (e.g., Fav, RT, Follow, client_events), applies filtering, and combines and publishes cleaned-up events to the corresponding GraphJet services. Each GraphJet-based service subscribes to a dedicated Kafka topic, and Recos-Injector enables GraphJet-based services to consume any event they want.
|
||||||
|
|
||||||
Tests can be run by using this command from your project's root directory:
|
## How to run Recos-Injector server tests
|
||||||
|
|
||||||
|
You can run tests by using the following command from your project's root directory:
|
||||||
|
|
||||||
$ bazel build recos-injector/...
|
$ bazel build recos-injector/...
|
||||||
$ bazel test recos-injector/...
|
$ bazel test recos-injector/...
|
||||||
@ -28,17 +25,16 @@ terminal:
|
|||||||
$ curl -s localhost:9990/admin/ping
|
$ curl -s localhost:9990/admin/ping
|
||||||
pong
|
pong
|
||||||
|
|
||||||
Run `curl -s localhost:9990/admin` to see a list of all of the available admin
|
Run `curl -s localhost:9990/admin` to see a list of all available admin endpoints.
|
||||||
endpoints.
|
|
||||||
|
|
||||||
## Querying recos-injector-server from a Scala console
|
## Querying Recos-Injector server from a Scala console
|
||||||
|
|
||||||
Recos Injector does not have a thrift endpoint. It reads Event Bus and Kafka queues and writes to recos_injector kafka.
|
Recos-Injector does not have a Thrift endpoint. Instead, it reads Event Bus and Kafka queues and writes to the Recos-Injector Kafka.
|
||||||
|
|
||||||
## Generating a package for deployment
|
## Generating a package for deployment
|
||||||
|
|
||||||
To package your service into a zip for deployment:
|
To package your service into a zip file for deployment, run:
|
||||||
|
|
||||||
$ bazel bundle recos-injector/server:bin --bundle-jvm-archive=zip
|
$ bazel bundle recos-injector/server:bin --bundle-jvm-archive=zip
|
||||||
|
|
||||||
If successful, a file `dist/recos-injector-server.zip` will be created.
|
If the command is successful, a file named `dist/recos-injector-server.zip` will be created.
|
||||||
|
@ -15,7 +15,7 @@ SimClusters from the Linear Algebra Perspective discussed the difference between
|
|||||||
However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per cluster) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Considering that we need to process over 6,000 RPS, this is hard to support with the existing infrastructure.
|
However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per cluster) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Considering that we need to process over 6,000 RPS, this is hard to support with the existing infrastructure.
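
To make the cost argument concrete, here is a minimal sketch of the exact cosine similarity that the paragraph above describes as too expensive. It assumes a SimCluster embedding can be represented as a sparse mapping from cluster id to score; the function name `cosine_similarity` and the constant names are hypothetical and are not taken from the repository.

```python
import math

# Figures quoted in the paragraph above.
SOURCE_TWEETS = 6
CLUSTERS_PER_SOURCE = 25
TWEETS_PER_CLUSTER = 100
REQUESTS_PER_SECOND = 6_000

# Upper bound on candidates scanned per Home Timeline request.
max_candidates = SOURCE_TWEETS * CLUSTERS_PER_SOURCE * TWEETS_PER_CLUSTER

# Worst-case embedding fetches per second if every candidate needs one lookup.
max_fetches_per_second = max_candidates * REQUESTS_PER_SECOND


def cosine_similarity(a, b):
    """Exact cosine similarity of two sparse embeddings ({cluster_id: score})."""
    # Iterate over the smaller embedding so the dot product touches fewer keys.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(score * b[cluster] for cluster, score in a.items() if cluster in b)
    norm_a = math.sqrt(sum(score * score for score in a.values()))
    norm_b = math.sqrt(sum(score * score for score in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty embedding is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)
```

Each call is cheap, but every candidate embedding has to be fetched first, so the bound above implies up to 90 million lookups per second in the worst case. That fetch volume is what motivates an approximate algorithm.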
|
||||||
|
|
||||||
|
|
||||||
## SimClusters Approximate Cosine Similariy Core Algorithm
|
## SimClusters Approximate Cosine Similarity Core Algorithm
|
||||||
|
|
||||||
1. Provide a source SimCluster Embedding *SV*, *SV = [(SC1, Score), (SC2, Score), (SC3, Score) …]*
|
1. Provide a source SimCluster Embedding *SV*, *SV = [(SC1, Score), (SC2, Score), (SC3, Score) …]*
|
||||||
|
|
||||||
|
@ -513,12 +513,12 @@ public class BasicIndexingConverter {
|
|||||||
Optional<Long> inReplyToUserId = Optional.of(inReplyToUserIdVal).filter(x -> x > 0);
|
Optional<Long> inReplyToUserId = Optional.of(inReplyToUserIdVal).filter(x -> x > 0);
|
||||||
Optional<Long> inReplyToStatusId = Optional.of(inReplyToStatusIdVal).filter(x -> x > 0);
|
Optional<Long> inReplyToStatusId = Optional.of(inReplyToStatusIdVal).filter(x -> x > 0);
|
||||||
|
|
||||||
// We have six combinations here. A tweet can be
|
// We have six combinations here. A Tweet can be
|
||||||
// 1) a reply to another tweet (then it has both in-reply-to-user-id and
|
// 1) a reply to another tweet (then it has both in-reply-to-user-id and
|
||||||
// in-reply-to-status-id set),
|
// in-reply-to-status-id set),
|
||||||
// 2) directed-at a user (then it only has in-reply-to-user-id set),
|
// 2) directed-at a user (then it only has in-reply-to-user-id set),
|
||||||
// 3) not a reply at all.
|
// 3) not a reply at all.
|
||||||
// Additionally, it may or may not be a retweet (if it is, then it has retweet-user-id and
|
// Additionally, it may or may not be a Retweet (if it is, then it has retweet-user-id and
|
||||||
// retweet-status-id set).
|
// retweet-status-id set).
|
||||||
//
|
//
|
||||||
// We want to set some fields unconditionally, and some fields (reference-author-id and
|
// We want to set some fields unconditionally, and some fields (reference-author-id and
|
||||||
|
@ -22,13 +22,13 @@ import static com.twitter.search.modeling.tweet_ranking.TweetScoringFeatures.Fea
|
|||||||
/**
|
/**
|
||||||
* Loads the scoring models for tweets and provides access to them.
|
* Loads the scoring models for tweets and provides access to them.
|
||||||
*
|
*
|
||||||
* This class relies on a list ModelLoader objects to retrieve the objects from them. It will
|
* This class relies on a list of ModelLoader objects to retrieve the objects from them. It will
|
||||||
* return the first model found according to the order in the list.
|
* return the first model found according to the order in the list.
|
||||||
*
|
*
|
||||||
* For production, we load models from 2 sources: classpath and HDFS. If a model is available
|
* For production, we load models from 2 sources: classpath and HDFS. If a model is available
|
||||||
* from HDFS, we return it, otherwise we use the model from the classpath.
|
* from HDFS, we return it, otherwise we use the model from the classpath.
|
||||||
*
|
*
|
||||||
* The models used in for default requests (i.e. not experiments) MUST be present in the
|
* The models used for default requests (i.e. not experiments) MUST be present in the
|
||||||
* classpath, this allows us to avoid errors if they can't be loaded from HDFS.
|
* classpath, this allows us to avoid errors if they can't be loaded from HDFS.
|
||||||
* Models for experiments can live only in HDFS, so we don't need to redeploy Earlybird if we
|
* Models for experiments can live only in HDFS, so we don't need to redeploy Earlybird if we
|
||||||
* want to test them.
|
* want to test them.
|
||||||
|
@ -3,76 +3,81 @@ from twml.feature_config import FeatureConfigBuilder
|
|||||||
|
|
||||||
|
|
||||||
def get_feature_config(data_spec_path, label):
|
def get_feature_config(data_spec_path, label):
|
||||||
return FeatureConfigBuilder(data_spec_path=data_spec_path, debug=True) \
|
return (
|
||||||
|
FeatureConfigBuilder(data_spec_path=data_spec_path, debug=True)
|
||||||
.batch_add_features(
|
.batch_add_features(
|
||||||
[
|
[
|
||||||
("ebd.author_specific_score", "A"),
|
("ebd.author_specific_score", "A"),
|
||||||
("ebd.has_diff_lang", "A"),
|
("ebd.has_diff_lang", "A"),
|
||||||
("ebd.has_english_tweet_diff_ui_lang", "A"),
|
("ebd.has_english_tweet_diff_ui_lang", "A"),
|
||||||
("ebd.has_english_ui_diff_tweet_lang", "A"),
|
("ebd.has_english_ui_diff_tweet_lang", "A"),
|
||||||
("ebd.is_self_tweet", "A"),
|
("ebd.is_self_tweet", "A"),
|
||||||
("ebd.tweet_age_in_secs", "A"),
|
("ebd.tweet_age_in_secs", "A"),
|
||||||
("encoded_tweet_features.favorite_count", "A"),
|
("encoded_tweet_features.favorite_count", "A"),
|
||||||
("encoded_tweet_features.from_verified_account_flag", "A"),
|
("encoded_tweet_features.from_verified_account_flag", "A"),
|
||||||
("encoded_tweet_features.has_card_flag", "A"),
|
("encoded_tweet_features.has_card_flag", "A"),
|
||||||
# ("encoded_tweet_features.has_consumer_video_flag", "A"),
|
# ("encoded_tweet_features.has_consumer_video_flag", "A"),
|
||||||
("encoded_tweet_features.has_image_url_flag", "A"),
|
("encoded_tweet_features.has_image_url_flag", "A"),
|
||||||
("encoded_tweet_features.has_link_flag", "A"),
|
("encoded_tweet_features.has_link_flag", "A"),
|
||||||
("encoded_tweet_features.has_multiple_hashtags_or_trends_flag", "A"),
|
("encoded_tweet_features.has_multiple_hashtags_or_trends_flag", "A"),
|
||||||
# ("encoded_tweet_features.has_multiple_media_flag", "A"),
|
# ("encoded_tweet_features.has_multiple_media_flag", "A"),
|
||||||
("encoded_tweet_features.has_native_image_flag", "A"),
|
("encoded_tweet_features.has_native_image_flag", "A"),
|
||||||
("encoded_tweet_features.has_news_url_flag", "A"),
|
("encoded_tweet_features.has_news_url_flag", "A"),
|
||||||
("encoded_tweet_features.has_periscope_flag", "A"),
|
("encoded_tweet_features.has_periscope_flag", "A"),
|
||||||
("encoded_tweet_features.has_pro_video_flag", "A"),
|
("encoded_tweet_features.has_pro_video_flag", "A"),
|
||||||
("encoded_tweet_features.has_quote_flag", "A"),
|
("encoded_tweet_features.has_quote_flag", "A"),
|
||||||
("encoded_tweet_features.has_trend_flag", "A"),
|
("encoded_tweet_features.has_trend_flag", "A"),
|
||||||
("encoded_tweet_features.has_video_url_flag", "A"),
|
("encoded_tweet_features.has_video_url_flag", "A"),
|
||||||
("encoded_tweet_features.has_vine_flag", "A"),
|
("encoded_tweet_features.has_vine_flag", "A"),
|
||||||
("encoded_tweet_features.has_visible_link_flag", "A"),
|
("encoded_tweet_features.has_visible_link_flag", "A"),
|
||||||
("encoded_tweet_features.is_offensive_flag", "A"),
|
("encoded_tweet_features.is_offensive_flag", "A"),
|
||||||
("encoded_tweet_features.is_reply_flag", "A"),
|
("encoded_tweet_features.is_reply_flag", "A"),
|
||||||
("encoded_tweet_features.is_retweet_flag", "A"),
|
("encoded_tweet_features.is_retweet_flag", "A"),
|
||||||
("encoded_tweet_features.is_sensitive_content", "A"),
|
("encoded_tweet_features.is_sensitive_content", "A"),
|
||||||
# ("encoded_tweet_features.is_user_new_flag", "A"),
|
# ("encoded_tweet_features.is_user_new_flag", "A"),
|
||||||
("encoded_tweet_features.language", "A"),
|
("encoded_tweet_features.language", "A"),
|
||||||
("encoded_tweet_features.link_language", "A"),
|
("encoded_tweet_features.link_language", "A"),
|
||||||
("encoded_tweet_features.num_hashtags", "A"),
|
("encoded_tweet_features.num_hashtags", "A"),
|
||||||
("encoded_tweet_features.num_mentions", "A"),
|
("encoded_tweet_features.num_mentions", "A"),
|
||||||
# ("encoded_tweet_features.profile_is_egg_flag", "A"),
|
# ("encoded_tweet_features.profile_is_egg_flag", "A"),
|
||||||
("encoded_tweet_features.reply_count", "A"),
|
("encoded_tweet_features.reply_count", "A"),
|
||||||
("encoded_tweet_features.retweet_count", "A"),
|
("encoded_tweet_features.retweet_count", "A"),
|
||||||
("encoded_tweet_features.text_score", "A"),
|
("encoded_tweet_features.text_score", "A"),
|
||||||
("encoded_tweet_features.user_reputation", "A"),
|
("encoded_tweet_features.user_reputation", "A"),
|
||||||
("extended_encoded_tweet_features.embeds_impression_count", "A"),
|
("extended_encoded_tweet_features.embeds_impression_count", "A"),
|
||||||
("extended_encoded_tweet_features.embeds_impression_count_v2", "A"),
|
("extended_encoded_tweet_features.embeds_impression_count_v2", "A"),
|
||||||
("extended_encoded_tweet_features.embeds_url_count", "A"),
|
("extended_encoded_tweet_features.embeds_url_count", "A"),
|
||||||
("extended_encoded_tweet_features.embeds_url_count_v2", "A"),
|
("extended_encoded_tweet_features.embeds_url_count_v2", "A"),
|
||||||
("extended_encoded_tweet_features.favorite_count_v2", "A"),
|
("extended_encoded_tweet_features.favorite_count_v2", "A"),
|
||||||
("extended_encoded_tweet_features.label_abusive_hi_rcl_flag", "A"),
|
("extended_encoded_tweet_features.label_abusive_hi_rcl_flag", "A"),
|
||||||
("extended_encoded_tweet_features.label_dup_content_flag", "A"),
|
("extended_encoded_tweet_features.label_dup_content_flag", "A"),
|
||||||
("extended_encoded_tweet_features.label_nsfw_hi_prc_flag", "A"),
|
("extended_encoded_tweet_features.label_nsfw_hi_prc_flag", "A"),
|
||||||
("extended_encoded_tweet_features.label_nsfw_hi_rcl_flag", "A"),
|
("extended_encoded_tweet_features.label_nsfw_hi_rcl_flag", "A"),
|
||||||
("extended_encoded_tweet_features.label_spam_flag", "A"),
|
("extended_encoded_tweet_features.label_spam_flag", "A"),
|
||||||
("extended_encoded_tweet_features.label_spam_hi_rcl_flag", "A"),
|
("extended_encoded_tweet_features.label_spam_hi_rcl_flag", "A"),
|
||||||
("extended_encoded_tweet_features.quote_count", "A"),
|
("extended_encoded_tweet_features.quote_count", "A"),
|
||||||
("extended_encoded_tweet_features.reply_count_v2", "A"),
|
("extended_encoded_tweet_features.reply_count_v2", "A"),
|
||||||
("extended_encoded_tweet_features.retweet_count_v2", "A"),
|
("extended_encoded_tweet_features.retweet_count_v2", "A"),
|
||||||
("extended_encoded_tweet_features.weighted_favorite_count", "A"),
|
("extended_encoded_tweet_features.weighted_favorite_count", "A"),
|
||||||
("extended_encoded_tweet_features.weighted_quote_count", "A"),
|
("extended_encoded_tweet_features.weighted_quote_count", "A"),
|
||||||
("extended_encoded_tweet_features.weighted_reply_count", "A"),
|
("extended_encoded_tweet_features.weighted_reply_count", "A"),
|
||||||
("extended_encoded_tweet_features.weighted_retweet_count", "A"),
|
("extended_encoded_tweet_features.weighted_retweet_count", "A"),
|
||||||
]
|
]
|
||||||
).add_labels([
|
)
|
||||||
label, # Tensor index: 0
|
.add_labels(
|
||||||
"recap.engagement.is_clicked", # Tensor index: 1
|
[
|
||||||
"recap.engagement.is_favorited", # Tensor index: 2
|
label, # Tensor index: 0
|
||||||
"recap.engagement.is_open_linked", # Tensor index: 3
|
"recap.engagement.is_clicked", # Tensor index: 1
|
||||||
"recap.engagement.is_photo_expanded", # Tensor index: 4
|
"recap.engagement.is_favorited", # Tensor index: 2
|
||||||
"recap.engagement.is_profile_clicked", # Tensor index: 5
|
"recap.engagement.is_open_linked", # Tensor index: 3
|
||||||
"recap.engagement.is_replied", # Tensor index: 6
|
"recap.engagement.is_photo_expanded", # Tensor index: 4
|
||||||
"recap.engagement.is_retweeted", # Tensor index: 7
|
"recap.engagement.is_profile_clicked", # Tensor index: 5
|
||||||
"recap.engagement.is_video_playback_50", # Tensor index: 8
|
"recap.engagement.is_replied", # Tensor index: 6
|
||||||
"timelines.earlybird_score", # Tensor index: 9
|
"recap.engagement.is_retweeted", # Tensor index: 7
|
||||||
]) \
|
"recap.engagement.is_video_playback_50", # Tensor index: 8
|
||||||
.define_weight("meta.record_weight/type=earlybird") \
|
"timelines.earlybird_score", # Tensor index: 9
|
||||||
|
]
|
||||||
|
)
|
||||||
|
.define_weight("meta.record_weight/type=earlybird")
|
||||||
.build()
|
.build()
|
||||||
|
)
|
||||||
|
@ -1,3 +1,5 @@
|
|||||||
|
Tweepcred
|
||||||
|
|
||||||
Tweepcred is a social network analysis tool that calculates the influence of Twitter users based on their interactions with other users. The tool uses the PageRank algorithm to rank users based on their influence.
|
Tweepcred is a social network analysis tool that calculates the influence of Twitter users based on their interactions with other users. The tool uses the PageRank algorithm to rank users based on their influence.
|
||||||
|
|
||||||
PageRank Algorithm
|
PageRank Algorithm
|
||||||
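For readers unfamiliar with PageRank, the sketch below shows a plain power-iteration version on a small interaction graph. It is an illustration under assumptions, not Tweepcred's implementation: the edge direction (from a user to the users they interact with), the damping factor of 0.85, the fixed iteration count, and the `pagerank` function name are all choices made for this example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Rank nodes of a directed graph by power iteration.

    graph: dict mapping each user to the list of users they point to
           (assumed here to mean "interacts with").
    Returns a dict user -> score; the scores sum to 1.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives a baseline share from the random-jump term.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank evenly among the users it points to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Users "a" and "b" both point to "c", and "c" points back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph `c` ends up with the highest score because two users point to it, and `b` with the lowest because nobody points to it.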
|
@ -1,17 +1,17 @@
|
|||||||
# UserTweetEntityGraph (UTEG)
|
# UserTweetEntityGraph (UTEG)
|
||||||
|
|
||||||
## What is it
|
## What is it
|
||||||
User Tweet Entity Graph (UTEG) is a Finalge thrift service built on the GraphJet framework. In maintains a graph of user-tweet relationships and serves user recommendations based on traversals in this graph.
|
User Tweet Entity Graph (UTEG) is a Finagle Thrift service built on the GraphJet framework. It maintains a graph of user-tweet relationships and serves user recommendations based on traversals in this graph.
|
||||||
|
|
||||||
## How is it used on Twitter
|
## How is it used on Twitter
|
||||||
UTEG generates the "XXX Liked" out-of-network tweets seen on Twitter's Home Timeline.
|
UTEG generates the "XXX Liked" out-of-network tweets seen on Twitter's Home Timeline.
|
||||||
The core idea behind UTEG is collaborative filtering. UTEG takes a user's weighted follow graph (i.e., a list of weighted userIds) as input,
|
The core idea behind UTEG is collaborative filtering. UTEG takes a user's weighted follow graph (i.e., a list of weighted userIds) as input,
|
||||||
performs efficient traversal & aggregation, and returns the top weighted tweets engaged basd on # of users that engaged the tweet, as well as
|
performs efficient traversal & aggregation, and returns the top-weighted tweets engaged based on # of users that engaged the tweet, as well as
|
||||||
the engaged users' weights.
|
the engaged users' weights.
|
||||||
|
|
||||||
UTEG is a stateful service and relies on a Kafka stream to ingest & persist states. It maintains an in-memory user engagements over the past
|
UTEG is a stateful service and relies on a Kafka stream to ingest & persist states. It maintains in-memory user engagements over the past
|
||||||
24-48 hours. Older events are dropped and GC'ed.
|
24-48 hours. Older events are dropped and GC'ed.
|
||||||
|
|
||||||
For full details on storage & processing, please check out our open-sourced project GraphJet, a general-purpose high performance in-memory storage engine.
|
For full details on storage & processing, please check out our open-sourced project GraphJet, a general-purpose high-performance in-memory storage engine.
|
||||||
- https://github.com/twitter/GraphJet
|
- https://github.com/twitter/GraphJet
|
||||||
- http://www.vldb.org/pvldb/vol9/p1281-sharma.pdf
|
- http://www.vldb.org/pvldb/vol9/p1281-sharma.pdf
|
||||||
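The collaborative-filtering step described in the UTEG README above can be sketched as a weighted aggregation over the follow graph. This is a simplified illustration, not the GraphJet traversal: the in-memory dictionaries, the tie-breaking order, and the `top_engaged_tweets` name are assumptions made for the example.

```python
from collections import defaultdict


def top_engaged_tweets(weighted_follows, engagements, k=10):
    """Aggregate engagements of followed users into per-tweet scores.

    weighted_follows: dict user_id -> weight (the weighted follow graph).
    engagements: dict user_id -> iterable of tweet_ids the user engaged with.
    Returns up to k (tweet_id, score, engager_count) tuples, best first.
    """
    score = defaultdict(float)
    engagers = defaultdict(int)
    for user_id, weight in weighted_follows.items():
        # set() ensures a user counts at most once per tweet.
        for tweet_id in set(engagements.get(user_id, ())):
            score[tweet_id] += weight
            engagers[tweet_id] += 1
    # Sort by summed weight, then by number of engagers, then by tweet id.
    ranked = sorted(score, key=lambda t: (-score[t], -engagers[t], t))
    return [(t, score[t], engagers[t]) for t in ranked[:k]]


follows = {1: 1.0, 2: 0.5, 3: 0.25}
recent = {1: [100, 101], 2: [100], 3: [102], 4: [103]}
result = top_engaged_tweets(follows, recent, k=2)
```

User 4 is not in the follow graph, so tweet 103 never enters the aggregation; only engagements from followed users contribute to a tweet's score.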
|
@ -78,7 +78,7 @@ sealed trait SimClustersEmbedding extends Equals {
|
|||||||
CosineSimilarityUtil.applyNormArray(sortedScores, expScaledNorm)
|
CosineSimilarityUtil.applyNormArray(sortedScores, expScaledNorm)
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* The Standard Deviation of a Embedding.
|
* The Standard Deviation of an Embedding.
|
||||||
*/
|
*/
|
||||||
lazy val std: Double = {
|
lazy val std: Double = {
|
||||||
if (scores.isEmpty) {
|
if (scores.isEmpty) {
|
||||||
|
@ -306,7 +306,7 @@ struct ThriftFacetRankingOptions {
|
|||||||
// penalty for keyword stuffing
|
// penalty for keyword stuffing
|
||||||
60: optional i32 multipleHashtagsOrTrendsPenalty
|
60: optional i32 multipleHashtagsOrTrendsPenalty
|
||||||
|
|
||||||
// Langauge related boosts, similar to those in relevance ranking options. By default they are
|
// Language related boosts, similar to those in relevance ranking options. By default they are
|
||||||
// all 1.0 (no-boost).
|
// all 1.0 (no-boost).
|
||||||
// When the user language is english, facet language is not
|
// When the user language is english, facet language is not
|
||||||
11: optional double langEnglishUIBoost = 1.0
|
11: optional double langEnglishUIBoost = 1.0
|
||||||
|
@ -728,7 +728,7 @@ struct ThriftSearchResultMetadata {
|
|||||||
29: optional double parusScore
|
29: optional double parusScore
|
||||||
|
|
||||||
// Extra feature data, all new feature fields you want to return from Earlybird should go into
|
// Extra feature data, all new feature fields you want to return from Earlybird should go into
|
||||||
// this one, the outer one is always reaching its limit of the nubmer of fields JVM can
|
// this one, the outer one is always reaching its limit of the number of fields JVM can
|
||||||
// comfortably support!!
|
// comfortably support!!
|
||||||
86: optional ThriftSearchResultExtraMetadata extraMetadata
|
86: optional ThriftSearchResultExtraMetadata extraMetadata
|
||||||
|
|
||||||
@ -831,7 +831,7 @@ struct ThriftSearchResult {
|
|||||||
12: optional list<hits.ThriftHits> cardTitleHitHighlights
|
12: optional list<hits.ThriftHits> cardTitleHitHighlights
|
||||||
13: optional list<hits.ThriftHits> cardDescriptionHitHighlights
|
13: optional list<hits.ThriftHits> cardDescriptionHitHighlights
|
||||||
|
|
||||||
// Expansion types, if expandResult == False, the expasions set should be ignored.
|
// Expansion types, if expandResult == False, the expansions set should be ignored.
|
||||||
8: optional bool expandResult = 0
|
8: optional bool expandResult = 0
|
||||||
9: optional set<expansions.ThriftTweetExpansionType> expansions
|
9: optional set<expansions.ThriftTweetExpansionType> expansions
|
||||||
|
|
||||||
@ -971,7 +971,7 @@ struct ThriftTermStatisticsResults {
|
|||||||
// The binIds will correspond to the times of the hits matching the driving search query for this
|
// The binIds will correspond to the times of the hits matching the driving search query for this
|
||||||
// term statistics request.
|
// term statistics request.
|
||||||
// If there were no hits matching the search query, numBins binIds will be returned, but the
|
// If there were no hits matching the search query, numBins binIds will be returned, but the
|
||||||
// values of the binIds will not meaninfully correspond to anything related to the query, and
|
// values of the binIds will not meaningfully correspond to anything related to the query, and
|
||||||
// should not be used. Such cases can be identified by ThriftSearchResults.numHitsProcessed being
|
// should not be used. Such cases can be identified by ThriftSearchResults.numHitsProcessed being
|
||||||
// set to 0 in the response, and the response not being early terminated.
|
// set to 0 in the response, and the response not being early terminated.
|
||||||
3: optional list<i32> binIds
|
3: optional list<i32> binIds
|
||||||
@ -1097,8 +1097,8 @@ struct ThriftSearchResults {
|
|||||||
// Superroots' schema merge/choose logic when returning results to clients:
|
// Superroots' schema merge/choose logic when returning results to clients:
|
||||||
// . pick the schema based on the order of: realtime > protected > archive
|
// . pick the schema based on the order of: realtime > protected > archive
|
||||||
// . because of the above ordering, it is possible that archive earlybird schema with a new flush
|
// . because of the above ordering, it is possible that archive earlybird schema with a new flush
|
||||||
// verion (with new bit features) might be lost to older realtime earlybird schema; this is
|
// version (with new bit features) might be lost to older realtime earlybird schema; this is
|
||||||
// considered to to be rare and accetable because one realtime earlybird deploy would fix it
|
// considered to be rare and acceptable because one realtime earlybird deploy would fix it
|
||||||
21: optional features.ThriftSearchFeatureSchema featureSchema
|
21: optional features.ThriftSearchFeatureSchema featureSchema
|
||||||
|
|
||||||
// How long it took to score the results in earlybird (in nanoseconds). The number of results
|
// How long it took to score the results in earlybird (in nanoseconds). The number of results
|
||||||
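The schema-selection order in the comment above (realtime > protected > archive) amounts to a first-match lookup over a fixed priority list. A minimal sketch, assuming the schemas arrive keyed by cluster name; the `choose_schema` helper and the string schema values are hypothetical stand-ins for the superroot's actual merge code:

```python
# Priority order taken from the Thrift comment: realtime > protected > archive.
SCHEMA_PRIORITY = ("realtime", "protected", "archive")


def choose_schema(schemas_by_cluster):
    """Return the schema of the highest-priority cluster that supplied one.

    schemas_by_cluster: dict cluster name -> schema (or None if absent).
    Returns None when no cluster supplied a schema.
    """
    for cluster in SCHEMA_PRIORITY:
        schema = schemas_by_cluster.get(cluster)
        if schema is not None:
            return schema
    return None


# The caveat from the comment: a newer archive schema loses to an older
# realtime schema because priority, not version, decides.
picked = choose_schema({"archive": "flush-v2", "realtime": "flush-v1"})
```

Because the choice ignores the flush version, the newer archive schema is dropped in this example, which is the rare case the comment says a realtime deploy would resolve.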
|
@ -29,8 +29,8 @@ struct AdhocSingleSideClusterScores {
|
|||||||
* we implement will use search abuse reports and impressions. We can build stores for new values
|
* we implement will use search abuse reports and impressions. We can build stores for new values
|
||||||
* in the future.
|
* in the future.
|
||||||
*
|
*
|
||||||
* The consumer creates the interactions which the author recieves. For instance, the consumer
|
* The consumer creates the interactions which the author receives. For instance, the consumer
|
||||||
* creates an abuse report for an author. The consumer scores are related to the interation creation
|
* creates an abuse report for an author. The consumer scores are related to the interaction creation
|
||||||
* behavior of the consumer. The author scores are related to whether the author receives these
|
* behavior of the consumer. The author scores are related to whether the author receives these
|
||||||
* interactions.
|
* interactions.
|
||||||
*
|
*
|
||||||
|
@ -70,7 +70,7 @@ struct TweetTopKTweetsWithScore {
|
|||||||
/**
|
/**
|
||||||
* The generic SimClustersEmbedding for online long-term storage and real-time calculation.
|
* The generic SimClustersEmbedding for online long-term storage and real-time calculation.
|
||||||
* Use SimClustersEmbeddingId as the only identifier.
|
* Use SimClustersEmbeddingId as the only identifier.
|
||||||
* Warning: Doesn't include modelversion and embedding type in the value struct.
|
* Warning: Doesn't include model version and embedding type in the value struct.
|
||||||
**/
|
**/
|
||||||
struct SimClustersEmbedding {
|
struct SimClustersEmbedding {
|
||||||
1: required list<SimClusterWithScore> embedding
|
1: required list<SimClusterWithScore> embedding
|
||||||
|
@ -50,7 +50,7 @@ struct CandidateTweets {
|
|||||||
}(hasPersonalData = 'true')
|
}(hasPersonalData = 'true')
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* An encapuslated collection of reference tweets
|
* An encapsulated collection of reference tweets
|
||||||
**/
|
**/
|
||||||
struct ReferenceTweets {
|
struct ReferenceTweets {
|
||||||
1: required i64 targetUserId(personalDataType = 'UserId')
|
1: required i64 targetUserId(personalDataType = 'UserId')
|
||||||
|
@ -33,12 +33,12 @@ enum EmbeddingType {
|
|||||||
Pop10000RankDecay11Tweet = 31,
|
Pop10000RankDecay11Tweet = 31,
|
||||||
OonPop1000RankDecayTweet = 32,
|
OonPop1000RankDecayTweet = 32,
|
||||||
|
|
||||||
// [Experimental] Offline generated produciton-like LogFavScore-based Tweet Embedding
|
// [Experimental] Offline generated production-like LogFavScore-based Tweet Embedding
|
||||||
OfflineGeneratedLogFavBasedTweet = 40,
|
OfflineGeneratedLogFavBasedTweet = 40,
|
||||||
|
|
||||||
// Reserve 51-59 for Ads Embedding
|
// Reserve 51-59 for Ads Embedding
|
||||||
LogFavBasedAdsTweet = 51, // Experimenal embedding for ads tweet candidate
|
LogFavBasedAdsTweet = 51, // Experimental embedding for ads tweet candidate
|
||||||
LogFavClickBasedAdsTweet = 52, // Experimenal embedding for ads tweet candidate
|
LogFavClickBasedAdsTweet = 52, // Experimental embedding for ads tweet candidate
|
||||||
|
|
||||||
// Reserve 60-69 for Evergreen content
|
// Reserve 60-69 for Evergreen content
|
||||||
LogFavBasedEvergreenTweet = 60,
|
LogFavBasedEvergreenTweet = 60,
|
||||||
@ -104,7 +104,7 @@ enum EmbeddingType {
|
|||||||
//Reserved 401 - 500 for Space embedding
|
//Reserved 401 - 500 for Space embedding
|
||||||
FavBasedApeSpace = 401 // DEPRECATED
|
FavBasedApeSpace = 401 // DEPRECATED
|
||||||
LogFavBasedListenerSpace = 402 // DEPRECATED
|
LogFavBasedListenerSpace = 402 // DEPRECATED
|
||||||
LogFavBasedAPESpeakerSpace = 403 // DEPRCATED
|
LogFavBasedAPESpeakerSpace = 403 // DEPRECATED
|
||||||
LogFavBasedUserInterestedInListenerSpace = 404 // DEPRECATED
|
LogFavBasedUserInterestedInListenerSpace = 404 // DEPRECATED
|
||||||
|
|
||||||
// Experimental, internal-only IDs
|
// Experimental, internal-only IDs
|
||||||
|
@ -1,36 +1,13 @@
|
|||||||
Overview
|
# TimelineRanker
|
||||||
========
|
|
||||||
|
|
||||||
**TimelineRanker** (TLR) is a legacy service which provides relevance-scored tweets from the Earlybird Search Index and User Tweet Entity Graph (UTEG) service. Despite its name, it no longer does any kind of heavy ranking/model based ranking itself - just uses relevance scores from the Search Index for ranked tweet endpoints.
|
|
||||||
|
|
||||||
|
**TimelineRanker** (TLR) is a legacy service that provides relevance-scored tweets from the Earlybird Search Index and User Tweet Entity Graph (UTEG) service. Despite its name, it no longer performs heavy ranking or model-based ranking itself; it only uses relevance scores from the Search Index for ranked tweet endpoints.
|
||||||
|
|
||||||
The following is a list of major services that Timeline Ranker interacts with:
|
The following is a list of major services that Timeline Ranker interacts with:
|
||||||
|
|
||||||
**Earlybird-root-superroot (a.k.a Search)**
|
- **Earlybird-root-superroot (a.k.a Search):** Timeline Ranker calls the Search Index's super root to fetch a list of Tweets.
|
||||||
|
- **User Tweet Entity Graph (UTEG):** Timeline Ranker calls UTEG to fetch a list of tweets liked by the users you follow.
|
||||||
Timeline Ranker calls the Search Index's super root to fetch a list of Tweets.
|
- **Socialgraph:** Timeline Ranker calls Social Graph Service to obtain the follow graph and user states such as blocked, muted, retweets muted, etc.
|
||||||
|
- **TweetyPie:** Timeline Ranker hydrates tweets by calling TweetyPie to post-filter tweets based on certain hydrated fields.
|
||||||
**User Tweet Entity Graph (UTEG)**
|
- **Manhattan:** Timeline Ranker hydrates some tweet features (e.g., user languages) from Manhattan.
|
||||||
|
|
||||||
Timeline Ranker calls UTEG to fetch a list of tweets liked by the users you follow.
|
|
||||||
|
|
||||||
**Socialgraph**
|
|
||||||
|
|
||||||
Timeline Ranker calls Social Graph Service to obtain follow graph and user states such as blocked, muted, retweets muted, etc.
|
|
||||||
|
|
||||||
**TweetyPie**
|
|
||||||
|
|
||||||
Timeline Ranker hydrates tweets by calling TweetyPie so that it can post-filter tweets based on certain hydrated fields.
|
|
||||||
|
|
||||||
**Manhattan**
|
|
||||||
|
|
||||||
Timeline Ranker hydrates some tweet features (eg, user languages) from Manhattan.
|
|
||||||
|
|
||||||
**Home Mixer**
|
|
||||||
|
|
||||||
Home Mixer calls Timeline Ranker to fetch tweets from the Earlybird Search Index and User Tweet Entity Graph (UTEG) service to power both the For You and Following Home Timelines.
|
|
||||||
|
|
||||||
Timeline Ranker does light ranking based on Earlybird tweet candidate scores and truncates to the number of candidates requested by Home Mixer based on these scores
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
**Home Mixer** calls Timeline Ranker to fetch tweets from the Earlybird Search Index and User Tweet Entity Graph (UTEG) service to power both the For You and Following Home Timelines. Timeline Ranker performs light ranking based on Earlybird tweet candidate scores and truncates to the number of candidates requested by Home Mixer based on these scores.
|
||||||
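The light-ranking step described for Timeline Ranker above (order by Earlybird score, then truncate to the requested count) can be sketched as follows. The tuple layout and the `light_rank` name are assumptions for illustration; the real service works on Thrift candidate objects.

```python
def light_rank(candidates, max_count):
    """Order candidates by score and keep the requested number.

    candidates: list of (tweet_id, earlybird_score) tuples.
    max_count: number of candidates requested by the caller.
    """
    # sorted() is stable, so candidates with equal scores keep their input order.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[:max(0, max_count)]


candidates = [(11, 0.2), (12, 0.9), (13, 0.5), (14, 0.7)]
top_two = light_rank(candidates, 2)
```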
|
@ -3,8 +3,8 @@ Trust and Safety Models
|
|||||||
|
|
||||||
We decided to open source the training code of the following models:
|
We decided to open source the training code of the following models:
|
||||||
- pNSFWMedia: Model to detect tweets with NSFW images. This includes adult and porn content.
|
- pNSFWMedia: Model to detect tweets with NSFW images. This includes adult and porn content.
|
||||||
- pNSFWText: Model to detect tweets with NSFW text, adult/sexual topics
|
- pNSFWText: Model to detect tweets with NSFW text, adult/sexual topics.
|
||||||
- pToxicity: Model to detect toxic tweets. Toxicity includes marginal content like insults and certain types of harassment. Toxic content does not violate Twitter terms of service
|
- pToxicity: Model to detect toxic tweets. Toxicity includes marginal content like insults and certain types of harassment. Toxic content does not violate Twitter's terms of service.
|
||||||
- pAbuse: Model to detect abusive content. This includes violations of Twitter terms of service, including hate speech, targeted harassment and abusive behavior.
|
- pAbuse: Model to detect abusive content. This includes violations of Twitter's terms of service, including hate speech, targeted harassment and abusive behavior.
|
||||||
|
|
||||||
We have several more models and rules that we are not going to open source at this time because of the adversarial nature of this area. The team is considering open sourcing more models going forward and will keep the community posted accordingly.
|
We have several more models and rules that we are not going to open source at this time because of the adversarial nature of this area. The team is considering open sourcing more models going forward and will keep the community posted accordingly.
|
||||||
|
@ -1,7 +1,7 @@
|
|||||||
# TWML
|
# TWML
|
||||||
|
|
||||||
---
|
---
|
||||||
Note: `twml` is no longer under development. Much of the code here is not out of date and unused.
|
Note: `twml` is no longer under development. Much of the code here is out of date and unused.
|
||||||
It is included here for completeness, because `twml` is still used to train the light ranker models
|
It is included here for completeness, because `twml` is still used to train the light ranker models
|
||||||
(see `src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md`)
|
(see `src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md`)
|
||||||
---
|
---
|
||||||
|
@ -494,6 +494,9 @@ visibility_library_enable_trends_representative_tweet_safety_level:
|
|||||||
visibility_library_enable_trusted_friends_user_list_safety_level:
|
visibility_library_enable_trusted_friends_user_list_safety_level:
|
||||||
default_availability: 10000
|
default_availability: 10000
|
||||||
|
|
||||||
|
visibility_library_enable_twitter_delegate_user_list_safety_level:
|
||||||
|
default_availability: 10000
|
||||||
|
|
||||||
visibility_library_enable_tweet_detail_safety_level:
|
visibility_library_enable_tweet_detail_safety_level:
|
||||||
default_availability: 10000
|
default_availability: 10000
|
||||||
|
|
||||||
@ -758,7 +761,7 @@ visibility_library_enable_short_circuiting_from_blender_visibility_library:
|
|||||||
visibility_library_enable_short_circuiting_from_search_visibility_library:
|
visibility_library_enable_short_circuiting_from_search_visibility_library:
|
||||||
default_availability: 0
|
default_availability: 0
|
||||||
|
|
||||||
visibility_library_enable_nsfw_text_topics_drop_rule:
|
visibility_library_enable_nsfw_text_high_precision_drop_rule:
|
||||||
default_availability: 10000
|
default_availability: 10000
|
||||||
|
|
||||||
visibility_library_enable_spammy_tweet_rule_verdict_logging:
|
visibility_library_enable_spammy_tweet_rule_verdict_logging:
|
||||||
|
@ -535,6 +535,9 @@ private[visibility] object DeciderKey extends DeciderKeyEnum {
|
|||||||
val EnableTrustedFriendsUserListSafetyLevel: Value = Value(
|
val EnableTrustedFriendsUserListSafetyLevel: Value = Value(
|
||||||
"visibility_library_enable_trusted_friends_user_list_safety_level"
|
"visibility_library_enable_trusted_friends_user_list_safety_level"
|
||||||
)
|
)
|
||||||
|
val EnableTwitterDelegateUserListSafetyLevel: Value = Value(
|
||||||
|
"visibility_library_enable_twitter_delegate_user_list_safety_level"
|
||||||
|
)
|
||||||
val EnableTweetDetailSafetyLevel: Value = Value(
|
val EnableTweetDetailSafetyLevel: Value = Value(
|
||||||
"visibility_library_enable_tweet_detail_safety_level"
|
"visibility_library_enable_tweet_detail_safety_level"
|
||||||
)
|
)
|
||||||
@ -869,8 +872,8 @@ private[visibility] object DeciderKey extends DeciderKeyEnum {
|
|||||||
"visibility_library_enable_short_circuiting_from_search_visibility_library"
|
"visibility_library_enable_short_circuiting_from_search_visibility_library"
|
||||||
)
|
)
|
||||||
|
|
||||||
val EnableNsfwTextTopicsDropRule: Value = Value(
|
val EnableNsfwTextHighPrecisionDropRule: Value = Value(
|
||||||
"visibility_library_enable_nsfw_text_topics_drop_rule"
|
"visibility_library_enable_nsfw_text_high_precision_drop_rule"
|
||||||
)
|
)
|
||||||
|
|
||||||
val EnableSpammyTweetRuleVerdictLogging: Value = Value(
|
val EnableSpammyTweetRuleVerdictLogging: Value = Value(
|
||||||
|
@ -198,6 +198,7 @@ private[visibility] object VisibilityDeciders {
|
|||||||
TopicRecommendations -> DeciderKey.EnableTopicRecommendationsSafetyLevel,
|
TopicRecommendations -> DeciderKey.EnableTopicRecommendationsSafetyLevel,
|
||||||
TrendsRepresentativeTweet -> DeciderKey.EnableTrendsRepresentativeTweetSafetyLevel,
|
TrendsRepresentativeTweet -> DeciderKey.EnableTrendsRepresentativeTweetSafetyLevel,
|
||||||
TrustedFriendsUserList -> DeciderKey.EnableTrustedFriendsUserListSafetyLevel,
|
TrustedFriendsUserList -> DeciderKey.EnableTrustedFriendsUserListSafetyLevel,
|
||||||
|
TwitterDelegateUserList -> DeciderKey.EnableTwitterDelegateUserListSafetyLevel,
|
||||||
TweetDetail -> DeciderKey.EnableTweetDetailSafetyLevel,
|
TweetDetail -> DeciderKey.EnableTweetDetailSafetyLevel,
|
||||||
TweetDetailNonToo -> DeciderKey.EnableTweetDetailNonTooSafetyLevel,
|
TweetDetailNonToo -> DeciderKey.EnableTweetDetailNonTooSafetyLevel,
|
||||||
TweetEngagers -> DeciderKey.EnableTweetEngagersSafetyLevel,
|
TweetEngagers -> DeciderKey.EnableTweetEngagersSafetyLevel,
|
||||||
@ -287,7 +288,7 @@ private[visibility] object VisibilityDeciders {
|
|||||||
RuleParams.EnableDropAllTrustedFriendsTweetsRuleParam -> DeciderKey.EnableDropAllTrustedFriendsTweetsRule,
|
RuleParams.EnableDropAllTrustedFriendsTweetsRuleParam -> DeciderKey.EnableDropAllTrustedFriendsTweetsRule,
|
||||||
RuleParams.EnableDropTrustedFriendsTweetContentRuleParam -> DeciderKey.EnableDropTrustedFriendsTweetContentRule,
|
RuleParams.EnableDropTrustedFriendsTweetContentRuleParam -> DeciderKey.EnableDropTrustedFriendsTweetContentRule,
|
||||||
RuleParams.EnableDropAllCollabInvitationTweetsRuleParam -> DeciderKey.EnableDropCollabInvitationTweetsRule,
|
RuleParams.EnableDropAllCollabInvitationTweetsRuleParam -> DeciderKey.EnableDropCollabInvitationTweetsRule,
|
||||||
RuleParams.EnableNsfwTextTopicsDropRuleParam -> DeciderKey.EnableNsfwTextTopicsDropRule,
|
RuleParams.EnableNsfwTextHighPrecisionDropRuleParam -> DeciderKey.EnableNsfwTextHighPrecisionDropRule,
|
||||||
RuleParams.EnableLikelyIvsUserLabelDropRule -> DeciderKey.EnableLikelyIvsUserLabelDropRule,
|
RuleParams.EnableLikelyIvsUserLabelDropRule -> DeciderKey.EnableLikelyIvsUserLabelDropRule,
|
||||||
RuleParams.EnableCardUriRootDomainCardDenylistRule -> DeciderKey.EnableCardUriRootDomainDenylistRule,
|
RuleParams.EnableCardUriRootDomainCardDenylistRule -> DeciderKey.EnableCardUriRootDomainDenylistRule,
|
||||||
RuleParams.EnableCommunityNonMemberPollCardRule -> DeciderKey.EnableCommunityNonMemberPollCardRule,
|
RuleParams.EnableCommunityNonMemberPollCardRule -> DeciderKey.EnableCommunityNonMemberPollCardRule,
|
||||||
|
@ -85,7 +85,7 @@ private[visibility] object RuleParams {
|
|||||||
|
|
||||||
object EnableDropAllCollabInvitationTweetsRuleParam extends RuleParam(false)
|
object EnableDropAllCollabInvitationTweetsRuleParam extends RuleParam(false)
|
||||||
|
|
||||||
object EnableNsfwTextTopicsDropRuleParam extends RuleParam(false)
|
object EnableNsfwTextHighPrecisionDropRuleParam extends RuleParam(false)
|
||||||
|
|
||||||
object EnableLikelyIvsUserLabelDropRule extends RuleParam(false)
|
object EnableLikelyIvsUserLabelDropRule extends RuleParam(false)
|
||||||
|
|
||||||
|
@ -186,6 +186,7 @@ private[visibility] object SafetyLevelParams {
|
|||||||
object EnableTopicRecommendationsSafetyLevelParam extends SafetyLevelParam(false)
|
object EnableTopicRecommendationsSafetyLevelParam extends SafetyLevelParam(false)
|
||||||
object EnableTrendsRepresentativeTweetSafetyLevelParam extends SafetyLevelParam(false)
|
object EnableTrendsRepresentativeTweetSafetyLevelParam extends SafetyLevelParam(false)
|
||||||
object EnableTrustedFriendsUserListSafetyLevelParam extends SafetyLevelParam(false)
|
object EnableTrustedFriendsUserListSafetyLevelParam extends SafetyLevelParam(false)
|
||||||
|
object EnableTwitterDelegateUserListSafetyLevelParam extends SafetyLevelParam(false)
|
||||||
object EnableTweetDetailSafetyLevelParam extends SafetyLevelParam(false)
|
object EnableTweetDetailSafetyLevelParam extends SafetyLevelParam(false)
|
||||||
object EnableTweetDetailNonTooSafetyLevelParam extends SafetyLevelParam(false)
|
object EnableTweetDetailNonTooSafetyLevelParam extends SafetyLevelParam(false)
|
||||||
object EnableTweetDetailWithInjectionsHydrationSafetyLevelParam extends SafetyLevelParam(false)
|
object EnableTweetDetailWithInjectionsHydrationSafetyLevelParam extends SafetyLevelParam(false)
|
||||||
|
@ -143,7 +143,7 @@ class VisibilityRuleEngine private[VisibilityRuleEngine] (
|
|||||||
builder.withRuleResult(rule, RuleResult(builder.verdict, ShortCircuited))
|
builder.withRuleResult(rule, RuleResult(builder.verdict, ShortCircuited))
|
||||||
} else {
|
} else {
|
||||||
|
|
||||||
if (rule.fallbackActionBuilder.nonEmpty) {
|
if (failedFeatureDependencies.nonEmpty && rule.fallbackActionBuilder.nonEmpty) {
|
||||||
metricsRecorder.recordRuleFallbackAction(rule.name)
|
metricsRecorder.recordRuleFallbackAction(rule.name)
|
||||||
}
|
}
|
||||||
|
|
||||||
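The condition change in the hunk above narrows when the fallback metric is recorded: before, any rule with a fallback action builder was counted; after, the rule must also have failed feature dependencies. A small sketch of the two predicates, written in Python for illustration; the function names are hypothetical and only mirror the Scala expressions shown in the diff:

```python
def records_fallback_before(failed_feature_dependencies, fallback_action_builder):
    """Old condition: rule.fallbackActionBuilder.nonEmpty."""
    return fallback_action_builder is not None


def records_fallback_after(failed_feature_dependencies, fallback_action_builder):
    """New condition: failedFeatureDependencies.nonEmpty && fallbackActionBuilder.nonEmpty."""
    return bool(failed_feature_dependencies) and fallback_action_builder is not None


# A rule that defines a fallback but whose features all resolved:
before = records_fallback_before([], "some-builder")
after = records_fallback_after([], "some-builder")
```

With no failed dependencies, the old predicate still reports a fallback while the new one does not, so the metric no longer counts rules that merely define a fallback.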
|
@ -194,6 +194,7 @@ object SafetyLevel {
|
|||||||
ThriftSafetyLevel.TopicsLandingPageTopicRecommendations -> TopicsLandingPageTopicRecommendations,
|
ThriftSafetyLevel.TopicsLandingPageTopicRecommendations -> TopicsLandingPageTopicRecommendations,
|
||||||
ThriftSafetyLevel.TrendsRepresentativeTweet -> TrendsRepresentativeTweet,
|
ThriftSafetyLevel.TrendsRepresentativeTweet -> TrendsRepresentativeTweet,
|
||||||
ThriftSafetyLevel.TrustedFriendsUserList -> TrustedFriendsUserList,
|
ThriftSafetyLevel.TrustedFriendsUserList -> TrustedFriendsUserList,
|
||||||
|
ThriftSafetyLevel.TwitterDelegateUserList -> TwitterDelegateUserList,
|
||||||
ThriftSafetyLevel.GryphonDecksAndColumns -> GryphonDecksAndColumns,
|
ThriftSafetyLevel.GryphonDecksAndColumns -> GryphonDecksAndColumns,
|
||||||
ThriftSafetyLevel.TweetDetail -> TweetDetail,
|
ThriftSafetyLevel.TweetDetail -> TweetDetail,
|
||||||
ThriftSafetyLevel.TweetDetailNonToo -> TweetDetailNonToo,
|
ThriftSafetyLevel.TweetDetailNonToo -> TweetDetailNonToo,
|
||||||
@ -772,6 +773,9 @@ object SafetyLevel {
|
|||||||
case object TrustedFriendsUserList extends SafetyLevel {
|
case object TrustedFriendsUserList extends SafetyLevel {
|
||||||
override val enabledParam: SafetyLevelParam = EnableTrustedFriendsUserListSafetyLevelParam
|
override val enabledParam: SafetyLevelParam = EnableTrustedFriendsUserListSafetyLevelParam
|
||||||
}
|
}
|
||||||
|
case object TwitterDelegateUserList extends SafetyLevel {
|
||||||
|
override val enabledParam: SafetyLevelParam = EnableTwitterDelegateUserListSafetyLevelParam
|
||||||
|
}
|
||||||
case object TweetDetail extends SafetyLevel {
|
case object TweetDetail extends SafetyLevel {
|
||||||
override val enabledParam: SafetyLevelParam = EnableTweetDetailSafetyLevelParam
|
override val enabledParam: SafetyLevelParam = EnableTweetDetailSafetyLevelParam
|
||||||
}
|
}
|
||||||
|
@ -379,13 +379,6 @@ object SafetyLevelGroup {
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
case object ProfileMixer extends SafetyLevelGroup {
|
|
||||||
override val levels: Set[SafetyLevel] = Set(
|
|
||||||
ProfileMixerMedia,
|
|
||||||
ProfileMixerFavorites,
|
|
||||||
)
|
|
||||||
}
|
|
||||||
|
|
||||||
case object Reactions extends SafetyLevelGroup {
|
case object Reactions extends SafetyLevelGroup {
|
||||||
override val levels: Set[SafetyLevel] = Set(
|
override val levels: Set[SafetyLevel] = Set(
|
||||||
SignalsReactions,
|
SignalsReactions,
|
||||||
@ -516,6 +509,10 @@ object SafetyLevelGroup {
|
|||||||
SafetyLevel.TimelineProfile,
|
SafetyLevel.TimelineProfile,
|
||||||
TimelineProfileAll,
|
TimelineProfileAll,
|
||||||
TimelineProfileSpaces,
|
TimelineProfileSpaces,
|
||||||
|
TimelineMedia,
|
||||||
|
ProfileMixerMedia,
|
||||||
|
TimelineFavorites,
|
||||||
|
ProfileMixerFavorites
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@ -36,8 +36,8 @@ object SpaceSafetyLabelType extends SafetyLabelType {
|
|||||||
s.SpaceSafetyLabelType.HatefulHighRecall -> HatefulHighRecall,
|
s.SpaceSafetyLabelType.HatefulHighRecall -> HatefulHighRecall,
|
||||||
s.SpaceSafetyLabelType.ViolenceHighRecall -> ViolenceHighRecall,
|
s.SpaceSafetyLabelType.ViolenceHighRecall -> ViolenceHighRecall,
|
||||||
s.SpaceSafetyLabelType.HighToxicityModelScore -> HighToxicityModelScore,
|
s.SpaceSafetyLabelType.HighToxicityModelScore -> HighToxicityModelScore,
|
||||||
s.SpaceSafetyLabelType.UkraineCrisisTopic -> UkraineCrisisTopic,
|
s.SpaceSafetyLabelType.DeprecatedSpaceSafetyLabel14 -> Deprecated,
|
||||||
s.SpaceSafetyLabelType.DoNotPublicPublish -> DoNotPublicPublish,
|
s.SpaceSafetyLabelType.DeprecatedSpaceSafetyLabel15 -> Deprecated,
|
||||||
s.SpaceSafetyLabelType.Reserved16 -> Deprecated,
|
s.SpaceSafetyLabelType.Reserved16 -> Deprecated,
|
||||||
s.SpaceSafetyLabelType.Reserved17 -> Deprecated,
|
s.SpaceSafetyLabelType.Reserved17 -> Deprecated,
|
||||||
s.SpaceSafetyLabelType.Reserved18 -> Deprecated,
|
s.SpaceSafetyLabelType.Reserved18 -> Deprecated,
|
||||||
@ -69,10 +69,6 @@ object SpaceSafetyLabelType extends SafetyLabelType {
|
|||||||
case object ViolenceHighRecall extends SpaceSafetyLabelType
|
case object ViolenceHighRecall extends SpaceSafetyLabelType
|
||||||
case object HighToxicityModelScore extends SpaceSafetyLabelType
|
case object HighToxicityModelScore extends SpaceSafetyLabelType
|
||||||
|
|
||||||
case object UkraineCrisisTopic extends SpaceSafetyLabelType
|
|
||||||
|
|
||||||
case object DoNotPublicPublish extends SpaceSafetyLabelType
|
|
||||||
|
|
||||||
case object Deprecated extends SpaceSafetyLabelType
|
case object Deprecated extends SpaceSafetyLabelType
|
||||||
case object Unknown extends SpaceSafetyLabelType
|
case object Unknown extends SpaceSafetyLabelType
|
||||||
|
|
||||||
|
@ -3,6 +3,7 @@ package com.twitter.visibility.rules
|
|||||||
import com.twitter.spam.rtf.thriftscala.SafetyResultReason
|
import com.twitter.spam.rtf.thriftscala.SafetyResultReason
|
||||||
import com.twitter.util.Memoize
|
import com.twitter.util.Memoize
|
||||||
import com.twitter.visibility.common.actions.AppealableReason
|
import com.twitter.visibility.common.actions.AppealableReason
|
||||||
|
import com.twitter.visibility.common.actions.AvoidReason.MightNotBeSuitableForAds
|
||||||
import com.twitter.visibility.common.actions.LimitedEngagementReason
|
import com.twitter.visibility.common.actions.LimitedEngagementReason
|
||||||
import com.twitter.visibility.common.actions.SoftInterventionDisplayType
|
import com.twitter.visibility.common.actions.SoftInterventionDisplayType
|
||||||
import com.twitter.visibility.common.actions.SoftInterventionReason
|
import com.twitter.visibility.common.actions.SoftInterventionReason
|
||||||
@@ -440,36 +441,6 @@ object FreedomOfSpeechNotReachActions {
     }
   }

-  case class ConversationSectionAbusiveQualityAction(
-    violationLevel: ViolationLevel = DefaultViolationLevel)
-      extends FreedomOfSpeechNotReachActionBuilder[ConversationSectionAbusiveQuality.type] {
-
-    override def actionType: Class[_] = ConversationSectionAbusiveQuality.getClass
-
-    override val actionSeverity = 5
-    private def toRuleResult: Reason => RuleResult = Memoize { r =>
-      RuleResult(ConversationSectionAbusiveQuality, Evaluated)
-    }
-
-    def build(evaluationContext: EvaluationContext, featureMap: Map[Feature[_], _]): RuleResult = {
-      val appealableReason =
-        FreedomOfSpeechNotReach.extractTweetSafetyLabel(featureMap).map(_.labelType) match {
-          case Some(label) =>
-            FreedomOfSpeechNotReach.eligibleTweetSafetyLabelTypesToAppealableReason(
-              label,
-              violationLevel)
-          case _ =>
-            AppealableReason.Unspecified(violationLevel.level)
-        }
-
-      toRuleResult(Reason.fromAppealableReason(appealableReason))
-    }
-
-    override def withViolationLevel(violationLevel: ViolationLevel) = {
-      copy(violationLevel = violationLevel)
-    }
-  }
-
   case class SoftInterventionAvoidAction(violationLevel: ViolationLevel = DefaultViolationLevel)
       extends FreedomOfSpeechNotReachActionBuilder[TweetInterstitial] {

@@ -662,6 +633,9 @@ object FreedomOfSpeechNotReachRules {

     override def enabled: Seq[RuleParam[Boolean]] =
       Seq(EnableFosnrRuleParam, FosnrRulesEnabledParam)
+
+    override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+      new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
   }

   case class ViewerIsNonFollowerNonAuthorAndTweetHasViolationOfLevel(
@@ -678,6 +652,9 @@ object FreedomOfSpeechNotReachRules {

     override def enabled: Seq[RuleParam[Boolean]] =
       Seq(EnableFosnrRuleParam, FosnrRulesEnabledParam)
+
+    override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+      new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
   }

   case class ViewerIsNonAuthorAndTweetHasViolationOfLevel(
@@ -692,6 +669,9 @@ object FreedomOfSpeechNotReachRules {

     override def enabled: Seq[RuleParam[Boolean]] =
       Seq(EnableFosnrRuleParam, FosnrRulesEnabledParam)
+
+    override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+      new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
   }

   case object TweetHasViolationOfAnyLevelFallbackDropRule
@@ -188,6 +188,7 @@ object RuleBase {
     TopicRecommendations -> TopicRecommendationsPolicy,
     TrendsRepresentativeTweet -> TrendsRepresentativeTweetPolicy,
     TrustedFriendsUserList -> TrustedFriendsUserListPolicy,
+    TwitterDelegateUserList -> TwitterDelegateUserListPolicy,
     TweetDetail -> TweetDetailPolicy,
     TweetDetailNonToo -> TweetDetailNonTooPolicy,
     TweetDetailWithInjectionsHydration -> TweetDetailWithInjectionsHydrationPolicy,
@@ -144,6 +144,9 @@ object NsfwCardImageAvoidAllUsersTweetLabelRule
       action = Avoid(Some(AvoidReason.ContainsNsfwMedia)),
     ) {
   override def enabled: Seq[RuleParam[Boolean]] = Seq(EnableAvoidNsfwRulesParam)
+
+  override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+    new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
 }

 object NsfwCardImageAvoidAdPlacementAllUsersTweetLabelRule
@@ -247,6 +250,9 @@ object GoreAndViolenceHighPrecisionAvoidAllUsersTweetLabelRule
       TweetSafetyLabelType.GoreAndViolenceHighPrecision
     ) {
   override def enabled: Seq[RuleParam[Boolean]] = Seq(EnableAvoidNsfwRulesParam)
+
+  override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+    new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
 }

 object GoreAndViolenceHighPrecisionAllUsersTweetLabelRule
@@ -266,6 +272,9 @@ object NsfwReportedHeuristicsAvoidAllUsersTweetLabelRule
       TweetSafetyLabelType.NsfwReportedHeuristics
     ) {
   override def enabled: Seq[RuleParam[Boolean]] = Seq(EnableAvoidNsfwRulesParam)
+
+  override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+    new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
 }

 object NsfwReportedHeuristicsAvoidAdPlacementAllUsersTweetLabelRule
@@ -274,6 +283,9 @@ object NsfwReportedHeuristicsAvoidAdPlacementAllUsersTweetLabelRule
       TweetSafetyLabelType.NsfwReportedHeuristics
     ) {
   override def enabled: Seq[RuleParam[Boolean]] = Seq(EnableAvoidNsfwRulesParam)
+
+  override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+    new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
 }

 object NsfwReportedHeuristicsAllUsersTweetLabelRule
@@ -294,6 +306,9 @@ object GoreAndViolenceReportedHeuristicsAvoidAllUsersTweetLabelRule
       TweetSafetyLabelType.GoreAndViolenceReportedHeuristics
     ) {
   override def enabled: Seq[RuleParam[Boolean]] = Seq(EnableAvoidNsfwRulesParam)
+
+  override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+    new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
 }

 object GoreAndViolenceReportedHeuristicsAvoidAdPlacementAllUsersTweetLabelRule
@@ -302,6 +317,9 @@ object GoreAndViolenceReportedHeuristicsAvoidAdPlacementAllUsersTweetLabelRule
       TweetSafetyLabelType.GoreAndViolenceReportedHeuristics
     ) {
   override def enabled: Seq[RuleParam[Boolean]] = Seq(EnableAvoidNsfwRulesParam)
+
+  override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+    new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
 }

 object GoreAndViolenceHighPrecisionAllUsersTweetLabelDropRule
@@ -791,7 +809,7 @@ object SkipTweetDetailLimitedEngagementTweetLabelRule
 object DynamicProductAdDropTweetLabelRule
     extends TweetHasLabelRule(Drop(Unspecified), TweetSafetyLabelType.DynamicProductAd)

-object NsfwTextTweetLabelTopicsDropRule
+object NsfwTextHighPrecisionTweetLabelDropRule
     extends RuleWithConstantAction(
       Drop(Reason.Nsfw),
       And(
@@ -803,7 +821,7 @@ object NsfwTextTweetLabelTopicsDropRule
         )
       )
     with DoesLogVerdict {
-  override def enabled: Seq[RuleParam[Boolean]] = Seq(EnableNsfwTextTopicsDropRuleParam)
+  override def enabled: Seq[RuleParam[Boolean]] = Seq(EnableNsfwTextHighPrecisionDropRuleParam)
   override def actionSourceBuilder: Option[RuleActionSourceBuilder] = Some(
     TweetSafetyLabelSourceBuilder(TweetSafetyLabelType.NsfwTextHighPrecision))
 }
@@ -832,7 +850,10 @@ object DoNotAmplifyTweetLabelAvoidRule
     extends TweetHasLabelRule(
       Avoid(),
       TweetSafetyLabelType.DoNotAmplify
-    )
+    ) {
+  override val fallbackActionBuilder: Option[ActionBuilder[_ <: Action]] = Some(
+    new ConstantActionBuilder(Avoid(Some(MightNotBeSuitableForAds))))
+}

 object NsfaHighPrecisionTweetLabelAvoidRule
     extends TweetHasLabelRule(
@@ -776,7 +776,10 @@ case object MagicRecsPolicy
       tweetRules = MagicRecsPolicyOverrides.union(
         RecommendationsPolicy.tweetRules.filterNot(_ == SafetyCrisisLevel3DropRule),
         NotificationsIbisPolicy.tweetRules,
-        Seq(NsfaHighRecallTweetLabelRule, NsfwHighRecallTweetLabelRule),
+        Seq(
+          NsfaHighRecallTweetLabelRule,
+          NsfwHighRecallTweetLabelRule,
+          NsfwTextHighPrecisionTweetLabelDropRule),
         Seq(
           AuthorBlocksViewerDropRule,
           ViewerBlocksAuthorRule,
@@ -1171,7 +1174,7 @@ case object ReturningUserExperiencePolicy
         NsfwHighRecallTweetLabelRule,
         NsfwVideoTweetLabelDropRule,
         NsfwTextTweetLabelDropRule,
-        NsfwTextTweetLabelTopicsDropRule,
+        NsfwTextHighPrecisionTweetLabelDropRule,
         SpamHighRecallTweetLabelDropRule,
         DuplicateContentTweetLabelDropRule,
         GoreAndViolenceTweetLabelRule,
@@ -1785,6 +1788,14 @@ case object TimelineListsPolicy
        NsfwReportedHeuristicsAllUsersTweetLabelRule,
        GoreAndViolenceReportedHeuristicsAllUsersTweetLabelRule,
        NsfwCardImageAllUsersTweetLabelRule,
+        NsfwHighPrecisionTweetLabelAvoidRule,
+        NsfwHighRecallTweetLabelAvoidRule,
+        GoreAndViolenceHighPrecisionAvoidAllUsersTweetLabelRule,
+        NsfwReportedHeuristicsAvoidAllUsersTweetLabelRule,
+        GoreAndViolenceReportedHeuristicsAvoidAllUsersTweetLabelRule,
+        NsfwCardImageAvoidAllUsersTweetLabelRule,
+        DoNotAmplifyTweetLabelAvoidRule,
+        NsfaHighPrecisionTweetLabelAvoidRule,
       ) ++ LimitedEngagementBaseRules.tweetRules
     )

@@ -2132,7 +2143,13 @@ case object TimelineHomePolicy
       userRules = Seq(
         ViewerMutesAuthorRule,
         ViewerBlocksAuthorRule,
-        DeciderableAuthorBlocksViewerDropRule
+        DeciderableAuthorBlocksViewerDropRule,
+        ProtectedAuthorDropRule,
+        SuspendedAuthorRule,
+        DeactivatedAuthorRule,
+        ErasedAuthorRule,
+        OffboardedAuthorRule,
+        DropTakendownUserRule
       ),
       policyRuleParams = SensitiveMediaSettingsTimelineHomeBaseRules.policyRuleParams
     )
@@ -2171,7 +2188,13 @@ case object BaseTimelineHomePolicy
       userRules = Seq(
         ViewerMutesAuthorRule,
         ViewerBlocksAuthorRule,
-        DeciderableAuthorBlocksViewerDropRule
+        DeciderableAuthorBlocksViewerDropRule,
+        ProtectedAuthorDropRule,
+        SuspendedAuthorRule,
+        DeactivatedAuthorRule,
+        ErasedAuthorRule,
+        OffboardedAuthorRule,
+        DropTakendownUserRule
       )
     )

@@ -2255,7 +2278,13 @@ case object TimelineHomeLatestPolicy
       userRules = Seq(
         ViewerMutesAuthorRule,
         ViewerBlocksAuthorRule,
-        DeciderableAuthorBlocksViewerDropRule
+        DeciderableAuthorBlocksViewerDropRule,
+        ProtectedAuthorDropRule,
+        SuspendedAuthorRule,
+        DeactivatedAuthorRule,
+        ErasedAuthorRule,
+        OffboardedAuthorRule,
+        DropTakendownUserRule
       ),
       policyRuleParams = SensitiveMediaSettingsTimelineHomeBaseRules.policyRuleParams
     )
@@ -3283,7 +3312,7 @@ case object TopicRecommendationsPolicy
       tweetRules =
         Seq(
           NsfwHighRecallTweetLabelRule,
-          NsfwTextTweetLabelTopicsDropRule
+          NsfwTextHighPrecisionTweetLabelDropRule
         )
           ++ RecommendationsPolicy.tweetRules,
       userRules = RecommendationsPolicy.userRules
@@ -3536,6 +3565,17 @@ case object TrustedFriendsUserListPolicy
       )
     )

+case object TwitterDelegateUserListPolicy
+    extends VisibilityPolicy(
+      userRules = Seq(
+        ViewerBlocksAuthorRule,
+        ViewerIsAuthorDropRule,
+        DeactivatedAuthorRule,
+        AuthorBlocksViewerDropRule
+      ),
+      tweetRules = Seq(DropAllRule)
+    )
+
 case object QuickPromoteTweetEligibilityPolicy
     extends VisibilityPolicy(
       tweetRules = TweetDetailPolicy.tweetRules,
@@ -100,30 +100,6 @@ object TweetRuleGenerator {
             FreedomOfSpeechNotReachActions.SoftInterventionAvoidLimitedEngagementsAction(
               limitedActionStrings = Some(level3LimitedActions))
           )
-          .addSafetyLevelRule(
-            SafetyLevel.TimelineMedia,
-            FreedomOfSpeechNotReachActions
-              .SoftInterventionAvoidLimitedEngagementsAction(limitedActionStrings =
-                Some(level3LimitedActions))
-          )
-          .addSafetyLevelRule(
-            SafetyLevel.ProfileMixerMedia,
-            FreedomOfSpeechNotReachActions
-              .SoftInterventionAvoidLimitedEngagementsAction(limitedActionStrings =
-                Some(level3LimitedActions))
-          )
-          .addSafetyLevelRule(
-            SafetyLevel.TimelineFavorites,
-            FreedomOfSpeechNotReachActions
-              .SoftInterventionAvoidLimitedEngagementsAction(limitedActionStrings =
-                Some(level3LimitedActions))
-          )
-          .addSafetyLevelRule(
-            SafetyLevel.ProfileMixerFavorites,
-            FreedomOfSpeechNotReachActions
-              .SoftInterventionAvoidLimitedEngagementsAction(limitedActionStrings =
-                Some(level3LimitedActions))
-          )
           .build,
         UserType.Author -> TweetVisibilityPolicy
           .builder()
@@ -159,30 +135,6 @@ object TweetRuleGenerator {
               .InterstitialLimitedEngagementsAvoidAction(limitedActionStrings =
                 Some(level3LimitedActions))
           )
-          .addSafetyLevelRule(
-            SafetyLevel.TimelineMedia,
-            FreedomOfSpeechNotReachActions
-              .InterstitialLimitedEngagementsAvoidAction(limitedActionStrings =
-                Some(level3LimitedActions))
-          )
-          .addSafetyLevelRule(
-            SafetyLevel.ProfileMixerMedia,
-            FreedomOfSpeechNotReachActions
-              .InterstitialLimitedEngagementsAvoidAction(limitedActionStrings =
-                Some(level3LimitedActions))
-          )
-          .addSafetyLevelRule(
-            SafetyLevel.TimelineFavorites,
-            FreedomOfSpeechNotReachActions
-              .InterstitialLimitedEngagementsAvoidAction(limitedActionStrings =
-                Some(level3LimitedActions))
-          )
-          .addSafetyLevelRule(
-            SafetyLevel.ProfileMixerFavorites,
-            FreedomOfSpeechNotReachActions
-              .InterstitialLimitedEngagementsAvoidAction(limitedActionStrings =
-                Some(level3LimitedActions))
-          )
           .build,
       ),
     )