mirror of
https://github.com/twitter/the-algorithm.git
synced 2025-01-05 09:01:54 +01:00
Compare commits: 31e82d6474...90d7ea370e (5 commits)

| SHA1 |
|---|
| 90d7ea370e |
| 5edbbeedb3 |
| 43cdcf2ed6 |
| 197bf2c563 |
| b5e849b029 |

@ -18,8 +18,11 @@ Product surfaces at Twitter are built on a shared set of data, models, and softw
| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet)-based services. |
| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of the accounts User A follows liked Tweets from User B). |
| | [topic-social-proof](topic-social-proof/README.md) | Identifies topics related to individual Tweets. |
| | [representation-scorer](representation-scorer/README.md) | Computes scores between pairs of entities (Users, Tweets, etc.) using embedding similarity. |
| Software framework | [navi](navi/README.md) | High-performance machine learning model serving written in Rust. |
| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
| | [timelines-aggregation-framework](timelines/data_processing/ml_util/aggregation_framework/README.md) | Framework for generating aggregate features in batch or real time. |
| | [representation-manager](representation-manager/README.md) | Service to retrieve embeddings (e.g. SimClusters and TwHIN). |
| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |

The product surface currently included in this repository is the For You Timeline.

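The representation-scorer row above describes scoring pairs of entities by embedding similarity. Below is a minimal sketch of one common measure: cosine similarity over sparse embeddings stored as cluster-id-to-score maps, which is roughly the shape of a SimClusters embedding. The `SparseEmbedding` class is hypothetical. This README does not say which similarity measure representation-scorer actually applies.

```scala
// Hypothetical sparse embedding: cluster id -> score (illustration only, not an RMS type).
final case class SparseEmbedding(scores: Map[Int, Double]) {
  // Euclidean (L2) norm of the embedding.
  def l2Norm: Double = math.sqrt(scores.valuesIterator.map(s => s * s).sum)

  // Dot product; only clusters present in both embeddings contribute.
  def dot(other: SparseEmbedding): Double =
    scores.iterator.collect {
      case (clusterId, s) if other.scores.contains(clusterId) => s * other.scores(clusterId)
    }.sum

  // Cosine similarity; defined here as 0.0 when either embedding has zero norm.
  def cosine(other: SparseEmbedding): Double = {
    val denominator = l2Norm * other.l2Norm
    if (denominator == 0.0) 0.0 else dot(other) / denominator
  }
}

val user = SparseEmbedding(Map(1 -> 3.0, 2 -> 4.0))
val tweet = SparseEmbedding(Map(2 -> 1.0))
println(user.cosine(tweet)) // prints 0.8
```

Iterating only over shared cluster ids keeps the dot product proportional to the number of non-zero entries. That matters when embeddings are sparse over a large cluster space.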
RETREIVAL_SIGNALS.md (Normal file, 51 lines added)
@ -0,0 +1,51 @@
# Signals for Candidate Sources

## Overview

The candidate sourcing stage of the Twitter recommendation algorithm narrows the pool of candidate items from approximately 1 billion down to a few thousand. It uses Twitter user behavior as its primary input. This document enumerates all the signals used during the candidate sourcing phase.

| Signals | Description |
| :-------------------- | :-------------------------------------------------------------------- |
| Author Follow | The accounts which the user explicitly follows. |
| Author Unfollow | The accounts which the user recently unfollowed. |
| Author Mute | The accounts which the user has muted. |
| Author Block | The accounts which the user has blocked. |
| Tweet Favorite | The tweets on which the user clicked the like button. |
| Tweet Unfavorite | The tweets on which the user clicked the unlike button. |
| Retweet | The tweets which the user retweeted. |
| Quote Tweet | The tweets which the user retweeted with comments. |
| Tweet Reply | The tweets to which the user replied. |
| Tweet Share | The tweets on which the user clicked the share button. |
| Tweet Bookmark | The tweets on which the user clicked the bookmark button. |
| Tweet Click | The tweets which the user clicked to view the tweet detail page. |
| Tweet Video Watch | The video tweets which the user watched for a certain number of seconds or a certain percentage. |
| Tweet Don't like | The tweets on which the user clicked the "Not interested in this tweet" button. |
| Tweet Report | The tweets on which the user clicked the "Report Tweet" button. |
| Notification Open | The push notification tweets which the user opened. |
| Ntab click | The tweets which the user clicked on the Notifications page. |
| User AddressBook | The author account identifiers from the user's address book. |

## Usage Details

Twitter uses these user signals as training labels and/or ML features in each candidate sourcing algorithm. The following table shows how each component uses them.

| Signals | USS | SimClusters | TwHIN | UTEG | FRS | Light Ranking |
| :-------------------- | :----------------- | :----------------- | :----------------- | :----------------- | :----------------- | :----------------- |
| Author Follow | Features | Features / Labels | Features / Labels | Features | Features / Labels | N/A |
| Author Unfollow | Features | N/A | N/A | N/A | N/A | N/A |
| Author Mute | Features | N/A | N/A | N/A | Features | N/A |
| Author Block | Features | N/A | N/A | N/A | Features | N/A |
| Tweet Favorite | Features | Features | Features / Labels | Features | Features / Labels | Features / Labels |
| Tweet Unfavorite | Features | Features | N/A | N/A | N/A | N/A |
| Retweet | Features | N/A | Features / Labels | Features | Features / Labels | Features / Labels |
| Quote Tweet | Features | N/A | Features / Labels | Features | Features / Labels | Features / Labels |
| Tweet Reply | Features | N/A | Features | Features | Features / Labels | Features |
| Tweet Share | Features | N/A | N/A | N/A | Features | N/A |
| Tweet Bookmark | Features | N/A | N/A | N/A | N/A | N/A |
| Tweet Click | Features | N/A | N/A | N/A | Features | Labels |
| Tweet Video Watch | Features | Features | N/A | N/A | N/A | Labels |
| Tweet Don't like | Features | N/A | N/A | N/A | N/A | N/A |
| Tweet Report | Features | N/A | N/A | N/A | N/A | N/A |
| Notification Open | Features | Features | Features | N/A | Features | N/A |
| Ntab click | Features | Features | Features | N/A | Features | N/A |
| User AddressBook | N/A | N/A | N/A | N/A | Features | N/A |
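To query the usage table programmatically, the sketch below transcribes it into a Scala lookup. The names `Usage`, `usageTable`, `usage`, and `labelComponents` are hypothetical helpers for illustration and are not part of the repository. Only the cell values come from the table.

```scala
// How one component uses one signal; the three cases mirror the table's non-N/A cell values.
sealed trait Usage
case object AsFeatures extends Usage
case object AsLabels extends Usage
case object AsFeaturesAndLabels extends Usage

// Column order of the table above.
val components: Vector[String] =
  Vector("USS", "SimClusters", "TwHIN", "UTEG", "FRS", "Light Ranking")

// Rows transcribed from the table: F = Features, L = Labels, FL = Features / Labels, - = N/A.
val usageTable: Map[String, Vector[String]] = Map(
  "Author Follow"     -> Vector("F", "FL", "FL", "F", "FL", "-"),
  "Author Unfollow"   -> Vector("F", "-", "-", "-", "-", "-"),
  "Author Mute"       -> Vector("F", "-", "-", "-", "F", "-"),
  "Author Block"      -> Vector("F", "-", "-", "-", "F", "-"),
  "Tweet Favorite"    -> Vector("F", "F", "FL", "F", "FL", "FL"),
  "Tweet Unfavorite"  -> Vector("F", "F", "-", "-", "-", "-"),
  "Retweet"           -> Vector("F", "-", "FL", "F", "FL", "FL"),
  "Quote Tweet"       -> Vector("F", "-", "FL", "F", "FL", "FL"),
  "Tweet Reply"       -> Vector("F", "-", "F", "F", "FL", "F"),
  "Tweet Share"       -> Vector("F", "-", "-", "-", "F", "-"),
  "Tweet Bookmark"    -> Vector("F", "-", "-", "-", "-", "-"),
  "Tweet Click"       -> Vector("F", "-", "-", "-", "F", "L"),
  "Tweet Video Watch" -> Vector("F", "F", "-", "-", "-", "L"),
  "Tweet Don't like"  -> Vector("F", "-", "-", "-", "-", "-"),
  "Tweet Report"      -> Vector("F", "-", "-", "-", "-", "-"),
  "Notification Open" -> Vector("F", "F", "F", "-", "F", "-"),
  "Ntab click"        -> Vector("F", "F", "F", "-", "F", "-"),
  "User AddressBook"  -> Vector("-", "-", "-", "-", "F", "-")
)

// Decodes a compact cell; "-" (N/A) and anything unexpected become None.
def decode(cell: String): Option[Usage] = cell match {
  case "F"  => Some(AsFeatures)
  case "L"  => Some(AsLabels)
  case "FL" => Some(AsFeaturesAndLabels)
  case _    => None
}

// How `component` uses `signal`; None for N/A or for an unknown signal/component name.
def usage(signal: String, component: String): Option[Usage] =
  for {
    row <- usageTable.get(signal)
    index = components.indexOf(component)
    if index >= 0
    u <- decode(row(index))
  } yield u

// Components that use `signal` as a label (alone or together with features).
def labelComponents(signal: String): Vector[String] =
  components.filter(c => usage(signal, c).exists(u => u == AsLabels || u == AsFeaturesAndLabels))

println(labelComponents("Tweet Favorite")) // prints Vector(TwHIN, FRS, Light Ranking)
```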

representation-manager/BUILD.bazel (Normal file, 1 line added)
@ -0,0 +1 @@
# This prevents SQ query from grabbing //:all since it traverses up once to find a BUILD

representation-manager/README.md (Normal file, 4 lines added)
@ -0,0 +1,4 @@
# Representation Manager #

**Representation Manager** (RMS) serves as a centralized embedding management system, providing SimClusters and other embeddings through a facade over the underlying storage and services.

representation-manager/bin/deploy.sh (Executable file, 4 lines added)
@ -0,0 +1,4 @@
#!/usr/bin/env bash

JOB=representation-manager bazel run --ui_event_filters=-info,-stdout,-stderr --noshow_progress \
  //relevance-platform/src/main/python/deploy -- "$@"

@ -0,0 +1,17 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finatra/inject/inject-thrift-client",
        "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato",
        "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
        "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/readablestore",
        "representation-manager/client/src/main/scala/com/twitter/representation_manager/config",
        "representation-manager/server/src/main/thrift:thrift-scala",
        "src/scala/com/twitter/simclusters_v2/common",
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
        "stitch/stitch-storehaus",
        "strato/src/main/scala/com/twitter/strato/client",
    ],
)

@ -0,0 +1,208 @@
package com.twitter.representation_manager

import com.twitter.finagle.memcached.{Client => MemcachedClient}
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.store.strato.StratoFetchableStore
import com.twitter.hermit.store.common.ObservedCachedReadableStore
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.representation_manager.config.ClientConfig
import com.twitter.representation_manager.config.DisabledInMemoryCacheParams
import com.twitter.representation_manager.config.EnabledInMemoryCacheParams
import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView
import com.twitter.simclusters_v2.common.SimClustersEmbedding
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.LocaleEntityId
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.TopicId
import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding}
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.{Client => StratoClient}
import com.twitter.strato.thrift.ScroogeConvImplicits._

/**
 * This class builds readable stores for a given SimClustersEmbeddingView (i.e. embeddingType
 * and modelVersion). It applies the ClientConfig of a particular service and builds
 * ReadableStores that implement that config.
 */
class StoreBuilder(
  clientConfig: ClientConfig,
  stratoClient: StratoClient,
  memCachedClient: MemcachedClient,
  globalStats: StatsReceiver,
) {
  private val stats =
    globalStats.scope("representation_manager_client").scope(this.getClass.getSimpleName)

  // Column consts
  private val ColPathPrefix = "recommendations/representation_manager/"
  private val SimclustersTweetColPath = ColPathPrefix + "simClustersEmbedding.Tweet"
  private val SimclustersUserColPath = ColPathPrefix + "simClustersEmbedding.User"
  private val SimclustersTopicIdColPath = ColPathPrefix + "simClustersEmbedding.TopicId"
  private val SimclustersLocaleEntityIdColPath =
    ColPathPrefix + "simClustersEmbedding.LocaleEntityId"

  def buildSimclustersTweetEmbeddingStore(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[Long, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[Long, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersTweetColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))

    addCacheLayer(rawStore, embeddingColumnView)
  }

  def buildSimclustersUserEmbeddingStore(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[Long, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[Long, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersUserColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))

    addCacheLayer(rawStore, embeddingColumnView)
  }

  def buildSimclustersTopicIdEmbeddingStore(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[TopicId, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[TopicId, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersTopicIdColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))

    addCacheLayer(rawStore, embeddingColumnView)
  }

  def buildSimclustersLocaleEntityIdEmbeddingStore(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[LocaleEntityId, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[LocaleEntityId, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersLocaleEntityIdColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))

    addCacheLayer(rawStore, embeddingColumnView)
  }

  def buildSimclustersTweetEmbeddingStoreWithEmbeddingIdAsKey(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[Long, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersTweetColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))
    val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] {
      case SimClustersEmbeddingId(_, _, InternalId.TweetId(tweetId)) =>
        tweetId
    }

    addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView)
  }

  def buildSimclustersUserEmbeddingStoreWithEmbeddingIdAsKey(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[Long, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersUserColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))
    val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] {
      case SimClustersEmbeddingId(_, _, InternalId.UserId(userId)) =>
        userId
    }

    addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView)
  }

  def buildSimclustersTopicEmbeddingStoreWithEmbeddingIdAsKey(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[TopicId, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersTopicIdColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))
    val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] {
      case SimClustersEmbeddingId(_, _, InternalId.TopicId(topicId)) =>
        topicId
    }

    addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView)
  }

  def buildSimclustersTopicIdEmbeddingStoreWithEmbeddingIdAsKey(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[TopicId, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersTopicIdColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))
    val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] {
      case SimClustersEmbeddingId(_, _, InternalId.TopicId(topicId)) =>
        topicId
    }

    addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView)
  }

  def buildSimclustersLocaleEntityIdEmbeddingStoreWithEmbeddingIdAsKey(
    embeddingColumnView: SimClustersEmbeddingView
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val rawStore = StratoFetchableStore
      .withView[LocaleEntityId, SimClustersEmbeddingView, ThriftSimClustersEmbedding](
        stratoClient,
        SimclustersLocaleEntityIdColPath,
        embeddingColumnView)
      .mapValues(SimClustersEmbedding(_))
    val embeddingIdAsKeyStore = rawStore.composeKeyMapping[SimClustersEmbeddingId] {
      case SimClustersEmbeddingId(_, _, InternalId.LocaleEntityId(localeEntityId)) =>
        localeEntityId
    }

    addCacheLayer(embeddingIdAsKeyStore, embeddingColumnView)
  }

  private def addCacheLayer[K](
    rawStore: ReadableStore[K, SimClustersEmbedding],
    embeddingColumnView: SimClustersEmbeddingView,
  ): ReadableStore[K, SimClustersEmbedding] = {
    // Add in-memory caching based on ClientConfig
    val inMemCacheParams = clientConfig.inMemoryCacheConfig
      .getCacheSetup(embeddingColumnView.embeddingType, embeddingColumnView.modelVersion)

    val statsPerStore = stats
      .scope(embeddingColumnView.embeddingType.name).scope(embeddingColumnView.modelVersion.name)

    inMemCacheParams match {
      case DisabledInMemoryCacheParams =>
        ObservedReadableStore(
          store = rawStore
        )(statsPerStore)
      case EnabledInMemoryCacheParams(ttl, maxKeys, cacheName) =>
        ObservedCachedReadableStore.from[K, SimClustersEmbedding](
          rawStore,
          ttl = ttl,
          maxKeys = maxKeys,
          cacheName = cacheName,
          windowSize = 10000L
        )(statsPerStore)
    }
  }

}

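Each `...WithEmbeddingIdAsKey` builder in `StoreBuilder` wraps a store keyed by a raw id (`Long`, `TopicId`, `LocaleEntityId`) in a store keyed by `SimClustersEmbeddingId`, using a partial function passed to `composeKeyMapping`. The self-contained sketch below illustrates that key-adapter pattern with hypothetical stand-ins (`SimpleStore`, `InternalId`, `TweetId`, `UserId`, `EmbeddingId`). Unlike Storehaus' `ReadableStore`, its `get` is synchronous rather than returning a `Future`. As in the builders, a key of the wrong id variant is not matched and throws a `MatchError` at lookup time.

```scala
// Hypothetical synchronous key-value store; Storehaus' ReadableStore returns Futures instead.
trait SimpleStore[K, V] { self =>
  def get(key: K): Option[V]

  // Adapter in the spirit of composeKeyMapping: a lookup by K2 becomes a lookup of f(key).
  def composeKeyMapping[K2](f: K2 => K): SimpleStore[K2, V] =
    new SimpleStore[K2, V] {
      def get(key: K2): Option[V] = self.get(f(key))
    }
}

// Hypothetical, simplified analogues of the thrift InternalId union and SimClustersEmbeddingId.
sealed trait InternalId
final case class TweetId(id: Long) extends InternalId
final case class UserId(id: Long) extends InternalId
final case class EmbeddingId(embeddingType: String, modelVersion: String, internalId: InternalId)

// Raw store keyed by tweet id, standing in for the Strato-backed store.
val rawTweetStore: SimpleStore[Long, Vector[Double]] = new SimpleStore[Long, Vector[Double]] {
  private val data = Map(42L -> Vector(0.1, 0.9))
  def get(key: Long): Option[Vector[Double]] = data.get(key)
}

// Same shape as buildSimclustersTweetEmbeddingStoreWithEmbeddingIdAsKey: only tweet ids are
// handled, so any other InternalId variant fails with a MatchError when looked up.
val tweetStoreByEmbeddingId: SimpleStore[EmbeddingId, Vector[Double]] =
  rawTweetStore.composeKeyMapping[EmbeddingId] {
    case EmbeddingId(_, _, TweetId(tweetId)) => tweetId
  }

println(tweetStoreByEmbeddingId.get(EmbeddingId("LogFavBasedTweet", "Model20m145k2020", TweetId(42L))))
// prints Some(Vector(0.1, 0.9))
```

With this design, the embedding type and model version in the id are ignored by the adapter. They are already fixed by the `SimClustersEmbeddingView` used to build the underlying store.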
@ -0,0 +1,12 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finatra/inject/inject-thrift-client",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/common",
        "representation-manager/server/src/main/thrift:thrift-scala",
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
        "strato/src/main/scala/com/twitter/strato/client",
    ],
)

@ -0,0 +1,25 @@
package com.twitter.representation_manager.config

import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.ModelVersion

/*
 * This is the RMS client config class.
 * For now it only supports setting up in-memory cache params, but we expect to enable other
 * customisations in the near future, e.g. request timeout.
 *
 * --------------------------------------------
 * PLEASE NOTE:
 * An in-memory cache is not necessarily a free performance win; anyone considering one should
 * investigate rather than enable it blindly.
 * */
class ClientConfig(inMemCacheParamsOverrides: Map[
    (EmbeddingType, ModelVersion),
    InMemoryCacheParams
  ] = Map.empty) {
  // In-memory cache config per embedding
  val inMemCacheParams = DefaultInMemoryCacheConfig.cacheParamsMap ++ inMemCacheParamsOverrides
  val inMemoryCacheConfig = new InMemoryCacheConfig(inMemCacheParams)
}

object DefaultClientConfig extends ClientConfig

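`ClientConfig` merges `DefaultInMemoryCacheConfig.cacheParamsMap` with the caller's overrides using `++`. `InMemoryCacheConfig.getCacheSetup` then returns `DisabledInMemoryCacheParams` for any (embedding type, model version) pair missing from the merged map. The sketch below reproduces that lookup with simplified stand-in types so it runs without the thrift dependencies. The following are assumptions for illustration only:

- the `EmbeddingType` and `ModelVersion` case classes,
- `scala.concurrent.duration.FiniteDuration` in place of `com.twitter.util.Duration`,
- the TTL, key limit, and cache name values.

```scala
import scala.concurrent.duration._

// Hypothetical stand-ins for the thrift EmbeddingType / ModelVersion enums.
final case class EmbeddingType(name: String)
final case class ModelVersion(name: String)

// Same shape as the RMS params, with FiniteDuration swapped in for com.twitter.util.Duration.
sealed trait InMemoryCacheParams
final case class EnabledInMemoryCacheParams(ttl: FiniteDuration, maxKeys: Int, cacheName: String)
    extends InMemoryCacheParams
case object DisabledInMemoryCacheParams extends InMemoryCacheParams

type CacheKey = (EmbeddingType, ModelVersion)

// Like ClientConfig: defaults ++ overrides, so an override replaces a default with the same key.
def mergeParams(
    defaults: Map[CacheKey, InMemoryCacheParams],
    overrides: Map[CacheKey, InMemoryCacheParams]
): Map[CacheKey, InMemoryCacheParams] = defaults ++ overrides

// Like InMemoryCacheConfig.getCacheSetup: pairs missing from the map get no in-memory cache.
def cacheSetup(params: Map[CacheKey, InMemoryCacheParams], key: CacheKey): InMemoryCacheParams =
  params.getOrElse(key, DisabledInMemoryCacheParams)

val tweetKey: CacheKey = (EmbeddingType("LogFavBasedTweet"), ModelVersion("Model20m145k2020"))
val merged = mergeParams(
  defaults = Map.empty,
  overrides = Map(tweetKey -> EnabledInMemoryCacheParams(5.minutes, 100000, "tweet_embeddings"))
)
println(cacheSetup(merged, tweetKey))
// prints EnabledInMemoryCacheParams(5 minutes,100000,tweet_embeddings)
```

Because the default map is empty, a client that passes no overrides gets `DisabledInMemoryCacheParams` for every embedding. This matches the "set default to no in-memory caching" comment in `DefaultInMemoryCacheConfig`.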
@ -0,0 +1,53 @@
package com.twitter.representation_manager.config

import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.util.Duration

/*
 * --------------------------------------------
 * PLEASE NOTE:
 * An in-memory cache is not necessarily a free performance win; anyone considering one should
 * investigate rather than enable it blindly.
 * --------------------------------------------
 * */

sealed trait InMemoryCacheParams

/*
 * This holds the params required to set up an in-memory cache for a single embedding store
 */
case class EnabledInMemoryCacheParams(
  ttl: Duration,
  maxKeys: Int,
  cacheName: String)
    extends InMemoryCacheParams
object DisabledInMemoryCacheParams extends InMemoryCacheParams

/*
 * This is the class for the in-memory cache config. Clients can pass in their own cacheParamsMap to
 * create a new InMemoryCacheConfig instead of using the DefaultInMemoryCacheConfig object below
 * */
class InMemoryCacheConfig(
  cacheParamsMap: Map[
    (EmbeddingType, ModelVersion),
    InMemoryCacheParams
  ] = Map.empty) {

  def getCacheSetup(
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion
  ): InMemoryCacheParams = {
    // When the requested (embeddingType, modelVersion) pair doesn't exist, we return DisabledInMemoryCacheParams
    cacheParamsMap.getOrElse((embeddingType, modelVersion), DisabledInMemoryCacheParams)
  }
}

/*
 * Default config for the in-memory cache
 * Clients can directly import and use this one if they don't want to set up a customised config
 * */
object DefaultInMemoryCacheConfig extends InMemoryCacheConfig {
  // set default to no in-memory caching
  val cacheParamsMap = Map.empty
}

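When `getCacheSetup` returns `EnabledInMemoryCacheParams(ttl, maxKeys, cacheName)`, `StoreBuilder` wraps the raw store in `ObservedCachedReadableStore`. Its internals are not part of this diff, so the sketch below only illustrates what an in-memory layer bounded by a TTL and a key limit does in general. The following are assumptions made for this example, not the hermit implementation:

- the `TtlCache` class,
- its first-in-first-out eviction,
- its caching of misses (`None`),
- the injectable clock.

The sketch also makes the trade-offs concrete: each entry costs memory, and a hit may return a value up to `ttl` old.

```scala
import scala.collection.mutable

// Hypothetical in-memory cache layer bounded by a TTL and a key limit; NOT the hermit
// ObservedCachedReadableStore implementation. `nowMillis` is injectable so tests can fake time.
final class TtlCache[K, V](
    ttlMillis: Long,
    maxKeys: Int,
    nowMillis: () => Long = () => System.currentTimeMillis()
)(underlying: K => Option[V]) {
  require(ttlMillis > 0 && maxKeys > 0, "ttlMillis and maxKeys must be positive")

  // Insertion-ordered, so the head is the oldest entry (first-in-first-out eviction).
  private val entries = mutable.LinkedHashMap.empty[K, (Option[V], Long)]
  private var backendCalls = 0

  def backendCallCount: Int = backendCalls

  def get(key: K): Option[V] = {
    val now = nowMillis()
    entries.get(key) match {
      case Some((cached, writtenAt)) if now - writtenAt < ttlMillis =>
        cached // fresh entry: served from memory (misses, i.e. None, are cached too)
      case _ =>
        entries.remove(key) // drop a stale entry, if any, before refilling
        backendCalls += 1
        val fetched = underlying(key)
        if (entries.size >= maxKeys) entries.remove(entries.head._1) // evict the oldest key
        entries.put(key, (fetched, now))
        fetched
    }
  }
}

var fakeNow = 0L
val cache = new TtlCache[Long, String](ttlMillis = 1000L, maxKeys = 2, nowMillis = () => fakeNow)(
  id => Some(s"embedding-$id")
)
cache.get(1L) // miss: calls the backend
cache.get(1L) // hit: served from memory
fakeNow = 1500L
cache.get(1L) // the entry is older than the TTL, so the backend is called again
println(cache.backendCallCount) // prints 2
```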

representation-manager/server/BUILD (Normal file, 21 lines added)
@ -0,0 +1,21 @@
jvm_binary(
    name = "bin",
    basename = "representation-manager",
    main = "com.twitter.representation_manager.RepresentationManagerFedServerMain",
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finatra/inject/inject-logback/src/main/scala",
        "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback",
        "representation-manager/server/src/main/resources",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager",
        "twitter-server/logback-classic/src/main/scala",
    ],
)

# Aurora Workflows build phase convention requires a jvm_app named with ${project-name}-app
jvm_app(
    name = "representation-manager-app",
    archive = "zip",
    binary = ":bin",
)

representation-manager/server/src/main/resources/BUILD (Normal file, 7 lines added)
@ -0,0 +1,7 @@
resources(
    sources = [
        "*.xml",
        "config/*.yml",
    ],
    tags = ["bazel-compatible"],
)

@ -0,0 +1,219 @@
# ---------- traffic percentage by embedding type and model version ----------
# Decider strings are built dynamically following the rule below
# i.e. s"enable_${embeddingType.name}_${modelVersion.name}"
# Hence this should be updated accordingly if usage is changed in the embedding stores

# Tweet embeddings
"enable_LogFavBasedTweet_Model20m145k2020":
  comment: "Enable x% read traffic (0<=x<=10000, e.g. 1000=10%) for LogFavBasedTweet - Model20m145k2020. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavBasedTweet_Model20m145kUpdated":
  comment: "Enable x% read traffic (0<=x<=10000, e.g. 1000=10%) for LogFavBasedTweet - Model20m145kUpdated. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavLongestL2EmbeddingTweet_Model20m145k2020":
  comment: "Enable x% read traffic (0<=x<=10000, e.g. 1000=10%) for LogFavLongestL2EmbeddingTweet - Model20m145k2020. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavLongestL2EmbeddingTweet_Model20m145kUpdated":
  comment: "Enable x% read traffic (0<=x<=10000, e.g. 1000=10%) for LogFavLongestL2EmbeddingTweet - Model20m145kUpdated. 0 means return EMPTY for all requests."
  default_availability: 10000

# Topic embeddings
"enable_FavTfgTopic_Model20m145k2020":
  comment: "Enable the read traffic to FavTfgTopic - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavBasedKgoApeTopic_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedKgoApeTopic - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

# User embeddings - KnownFor
"enable_FavBasedProducer_Model20m145kUpdated":
  comment: "Enable the read traffic to FavBasedProducer - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FavBasedProducer_Model20m145k2020":
  comment: "Enable the read traffic to FavBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FollowBasedProducer_Model20m145k2020":
  comment: "Enable the read traffic to FollowBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_AggregatableFavBasedProducer_Model20m145kUpdated":
  comment: "Enable the read traffic to AggregatableFavBasedProducer - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_AggregatableFavBasedProducer_Model20m145k2020":
  comment: "Enable the read traffic to AggregatableFavBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_AggregatableLogFavBasedProducer_Model20m145kUpdated":
  comment: "Enable the read traffic to AggregatableLogFavBasedProducer - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_AggregatableLogFavBasedProducer_Model20m145k2020":
  comment: "Enable the read traffic to AggregatableLogFavBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_RelaxedAggregatableLogFavBasedProducer_Model20m145kUpdated":
  comment: "Enable the read traffic to RelaxedAggregatableLogFavBasedProducer - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_RelaxedAggregatableLogFavBasedProducer_Model20m145k2020":
  comment: "Enable the read traffic to RelaxedAggregatableLogFavBasedProducer - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

# User embeddings - InterestedIn
"enable_LogFavBasedUserInterestedInFromAPE_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedUserInterestedInFromAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FollowBasedUserInterestedInFromAPE_Model20m145k2020":
  comment: "Enable the read traffic to FollowBasedUserInterestedInFromAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FavBasedUserInterestedIn_Model20m145kUpdated":
  comment: "Enable the read traffic to FavBasedUserInterestedIn - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FavBasedUserInterestedIn_Model20m145k2020":
  comment: "Enable the read traffic to FavBasedUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FollowBasedUserInterestedIn_Model20m145k2020":
  comment: "Enable the read traffic to FollowBasedUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavBasedUserInterestedIn_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FavBasedUserInterestedInFromPE_Model20m145kUpdated":
  comment: "Enable the read traffic to FavBasedUserInterestedInFromPE - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FilteredUserInterestedIn_Model20m145kUpdated":
  comment: "Enable the read traffic to FilteredUserInterestedIn - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FilteredUserInterestedIn_Model20m145k2020":
  comment: "Enable the read traffic to FilteredUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_FilteredUserInterestedInFromPE_Model20m145kUpdated":
  comment: "Enable the read traffic to FilteredUserInterestedInFromPE - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_UnfilteredUserInterestedIn_Model20m145kUpdated":
  comment: "Enable the read traffic to UnfilteredUserInterestedIn - Model20m145kUpdated from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_UnfilteredUserInterestedIn_Model20m145k2020":
  comment: "Enable the read traffic to UnfilteredUserInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_UserNextInterestedIn_Model20m145k2020":
  comment: "Enable the read traffic to UserNextInterestedIn - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavBasedUserInterestedAverageAddressBookFromIIAPE_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedUserInterestedAverageAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

|
"enable_LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

"enable_LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE_Model20m145k2020":
  comment: "Enable the read traffic to LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE - Model20m145k2020 from 0% to 100%. 0 means return EMPTY for all requests."
  default_availability: 10000

# ---------- load shedding by caller id ----------
# To create a new decider, add it here using the same format and the caller's details:
# "representation-manager_load_shed_by_caller_id_twtr:{{role}}:{{name}}:{{environment}}:{{cluster}}"
# All the deciders below are generated by this script:
# ./strato/bin/fed deciders representation-manager --service-role=representation-manager --service-name=representation-manager
# If you need to run the script and paste the output, add ONLY the prod deciders here.
"representation-manager_load_shed_by_caller_id_all":
  comment: "Reject all traffic from caller id: all"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:cr-mixer:cr-mixer:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:cr-mixer:cr-mixer:prod:atla"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:cr-mixer:cr-mixer:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:cr-mixer:cr-mixer:prod:pdxa"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-1:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-1:prod:atla"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-1:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-1:prod:pdxa"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-3:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-3:prod:atla"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-3:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-3:prod:pdxa"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-4:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-4:prod:atla"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-4:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-4:prod:pdxa"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-experimental:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-experimental:prod:atla"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann-experimental:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann-experimental:prod:pdxa"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann:prod:atla"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:simclusters-ann:simclusters-ann:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:simclusters-ann:simclusters-ann:prod:pdxa"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:stratostore:stratoapi:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoapi:prod:atla"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:atla"
  default_availability: 0

"representation-manager_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:pdxa"
  default_availability: 0

# ---------- Dark Traffic Proxy ----------
representation-manager_forward_dark_traffic:
  comment: "Defines the percentage of traffic to forward to diffy-proxy. Set to 0 to disable dark traffic forwarding"
  default_availability: 0
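The `default_availability` values in this file appear to use the Decider scale on which 10000 means 100%, so every `enable_*` decider serves all reads by default, while the load-shedding and dark-traffic deciders default to 0. The Scala sketch that follows is a hypothetical model of that gating, not the Decider or Strato API. It assumes a request is admitted when a hash bucket in 0..9999 falls below the availability, and it assumes that a load-shed decider's availability is the share of that caller's traffic to reject. `DeciderGateSketch` and all of its methods are illustrative names.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical model of decider-style gating; not the real Decider/Strato API.
// Assumption: availability is an integer in 0..10000, where 10000 == 100%.
object DeciderGateSketch {
  val MaxAvailability: Int = 10000

  /** Share of requests admitted for a given availability, e.g. 10000 -> 1.0. */
  def fraction(availability: Int): Double = {
    require(
      availability >= 0 && availability <= MaxAvailability,
      s"availability out of range: $availability")
    availability.toDouble / MaxAvailability
  }

  /** Deterministically maps a request key to a bucket in 0..9999 and admits it if bucket < availability. */
  def isAvailable(availability: Int, requestKey: String): Boolean = {
    val bucket = Math.floorMod(MurmurHash3.stringHash(requestKey), MaxAvailability)
    bucket < availability
  }

  /** `enable_*` deciders: admitted reads run the lookup; the rest get an empty result. */
  def gatedRead[A](enableAvailability: Int, requestKey: String)(lookup: => Option[A]): Option[A] =
    if (isAvailable(enableAvailability, requestKey)) lookup else None

  /** Load-shed deciders (assumed semantics): availability is the share of the caller's traffic to reject. */
  def shouldReject(loadShedAvailability: Int, requestKey: String): Boolean =
    isAvailable(loadShedAvailability, requestKey)

  /** Builds a load-shed decider key in the naming format used by the entries above. */
  def loadShedKey(service: String, callerId: String): String =
    s"${service}_load_shed_by_caller_id_$callerId"
}
```

Under these assumptions, raising a load-shed decider from 0 toward 10000 rejects a growing share of that caller's requests, and lowering an `enable_*` decider below 10000 makes a matching share of reads return an empty result.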
165
representation-manager/server/src/main/resources/logback.xml
Normal file
@ -0,0 +1,165 @@
<configuration>
  <shutdownHook class="ch.qos.logback.core.hook.DelayingShutdownHook"/>

  <!-- ===================================================== -->
  <!-- Service Config -->
  <!-- ===================================================== -->
  <property name="DEFAULT_SERVICE_PATTERN"
            value="%-16X{traceId} %-12X{clientId:--} %-16X{method} %-25logger{0} %msg"/>

  <property name="DEFAULT_ACCESS_PATTERN"
            value="%msg"/>

  <!-- ===================================================== -->
  <!-- Common Config -->
  <!-- ===================================================== -->

  <!-- JUL/JDK14 to Logback bridge -->
  <contextListener class="ch.qos.logback.classic.jul.LevelChangePropagator">
    <resetJUL>true</resetJUL>
  </contextListener>

  <!-- ====================================================================================== -->
  <!-- NOTE: The following appenders use a simple TimeBasedRollingPolicy configuration. -->
  <!-- You may want to consider using a more advanced SizeAndTimeBasedRollingPolicy. -->
  <!-- See: https://logback.qos.ch/manual/appenders.html#SizeAndTimeBasedRollingPolicy -->
  <!-- ====================================================================================== -->

  <!-- Service Log (rollover daily, keep maximum of 21 days of gzip compressed logs) -->
  <appender name="SERVICE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.service.output}</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- daily rollover -->
      <fileNamePattern>${log.service.output}.%d.gz</fileNamePattern>
      <!-- the maximum total size of all the log files -->
      <totalSizeCap>3GB</totalSizeCap>
      <!-- keep maximum 21 days' worth of history -->
      <maxHistory>21</maxHistory>
      <cleanHistoryOnStart>true</cleanHistoryOnStart>
    </rollingPolicy>
    <encoder>
      <pattern>%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n</pattern>
    </encoder>
  </appender>

  <!-- Access Log (rollover daily, keep maximum of 7 days of gzip compressed logs) -->
  <appender name="ACCESS" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.access.output}</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- daily rollover -->
      <fileNamePattern>${log.access.output}.%d.gz</fileNamePattern>
      <!-- the maximum total size of all the log files -->
      <totalSizeCap>100MB</totalSizeCap>
      <!-- keep maximum 7 days' worth of history -->
      <maxHistory>7</maxHistory>
      <cleanHistoryOnStart>true</cleanHistoryOnStart>
    </rollingPolicy>
    <encoder>
      <pattern>${DEFAULT_ACCESS_PATTERN}%n</pattern>
    </encoder>
  </appender>

  <!-- LogLens -->
  <appender name="LOGLENS" class="com.twitter.loglens.logback.LoglensAppender">
    <mdcAdditionalContext>true</mdcAdditionalContext>
    <category>${log.lens.category}</category>
    <index>${log.lens.index}</index>
    <tag>${log.lens.tag}/service</tag>
    <encoder>
      <pattern>%msg</pattern>
    </encoder>
  </appender>

  <!-- LogLens Access -->
  <appender name="LOGLENS-ACCESS" class="com.twitter.loglens.logback.LoglensAppender">
    <mdcAdditionalContext>true</mdcAdditionalContext>
    <category>${log.lens.category}</category>
    <index>${log.lens.index}</index>
    <tag>${log.lens.tag}/access</tag>
    <encoder>
      <pattern>%msg</pattern>
    </encoder>
  </appender>

  <!-- Pipeline Execution Logs -->
  <appender name="ALLOW-LISTED-PIPELINE-EXECUTIONS" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>allow_listed_pipeline_executions.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- daily rollover -->
      <fileNamePattern>allow_listed_pipeline_executions.log.%d.gz</fileNamePattern>
      <!-- the maximum total size of all the log files -->
      <totalSizeCap>100MB</totalSizeCap>
      <!-- keep maximum 7 days' worth of history -->
      <maxHistory>7</maxHistory>
      <cleanHistoryOnStart>true</cleanHistoryOnStart>
    </rollingPolicy>
    <encoder>
      <pattern>%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n</pattern>
    </encoder>
  </appender>

  <!-- ===================================================== -->
  <!-- Primary Async Appenders -->
  <!-- ===================================================== -->

  <property name="async_queue_size" value="${queue.size:-50000}"/>
  <property name="async_max_flush_time" value="${max.flush.time:-0}"/>

  <appender name="ASYNC-SERVICE" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="SERVICE"/>
  </appender>

  <appender name="ASYNC-ACCESS" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="ACCESS"/>
  </appender>

  <appender name="ASYNC-ALLOW-LISTED-PIPELINE-EXECUTIONS" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="ALLOW-LISTED-PIPELINE-EXECUTIONS"/>
  </appender>

  <appender name="ASYNC-LOGLENS" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="LOGLENS"/>
  </appender>

  <appender name="ASYNC-LOGLENS-ACCESS" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="LOGLENS-ACCESS"/>
  </appender>

  <!-- ===================================================== -->
  <!-- Package Config -->
  <!-- ===================================================== -->

  <!-- Per-Package Config -->
  <logger name="com.twitter" level="INHERITED"/>
  <logger name="com.twitter.wilyns" level="INHERITED"/>
  <logger name="com.twitter.configbus.client.file" level="INHERITED"/>
  <logger name="com.twitter.finagle.mux" level="INHERITED"/>
  <logger name="com.twitter.finagle.serverset2" level="INHERITED"/>
  <logger name="com.twitter.logging.ScribeHandler" level="INHERITED"/>
  <logger name="com.twitter.zookeeper.client.internal" level="INHERITED"/>

  <!-- Root Config -->
  <!-- For all logs except access logs, disable logging below the log_level level (INFO by default). This can be overridden in the per-package loggers, and dynamically in the admin panel of individual instances. -->
  <root level="${log_level:-INFO}">
    <appender-ref ref="ASYNC-SERVICE"/>
    <appender-ref ref="ASYNC-LOGLENS"/>
  </root>

  <!-- Access Logging -->
  <!-- Access logs are turned off by default -->
  <logger name="com.twitter.finatra.thrift.filters.AccessLoggingFilter" level="OFF" additivity="false">
    <appender-ref ref="ASYNC-ACCESS"/>
    <appender-ref ref="ASYNC-LOGLENS-ACCESS"/>
  </logger>

</configuration>
@ -0,0 +1,13 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finatra/inject/inject-thrift-client",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/topic",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/tweet",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns/user",
        "strato/src/main/scala/com/twitter/strato/fed",
        "strato/src/main/scala/com/twitter/strato/fed/server",
    ],
)
@ -0,0 +1,40 @@
package com.twitter.representation_manager

import com.google.inject.Module
import com.twitter.inject.thrift.modules.ThriftClientIdModule
import com.twitter.representation_manager.columns.topic.LocaleEntityIdSimClustersEmbeddingCol
import com.twitter.representation_manager.columns.topic.TopicIdSimClustersEmbeddingCol
import com.twitter.representation_manager.columns.tweet.TweetSimClustersEmbeddingCol
import com.twitter.representation_manager.columns.user.UserSimClustersEmbeddingCol
import com.twitter.representation_manager.modules.CacheModule
import com.twitter.representation_manager.modules.InterestsThriftClientModule
import com.twitter.representation_manager.modules.LegacyRMSConfigModule
import com.twitter.representation_manager.modules.StoreModule
import com.twitter.representation_manager.modules.TimerModule
import com.twitter.representation_manager.modules.UttClientModule
import com.twitter.strato.fed._
import com.twitter.strato.fed.server._

object RepresentationManagerFedServerMain extends RepresentationManagerFedServer

trait RepresentationManagerFedServer extends StratoFedServer {
  override def dest: String = "/s/representation-manager/representation-manager"
  override val modules: Seq[Module] =
    Seq(
      CacheModule,
      InterestsThriftClientModule,
      LegacyRMSConfigModule,
      StoreModule,
      ThriftClientIdModule,
      TimerModule,
      UttClientModule
    )

  override def columns: Seq[Class[_ <: StratoFed.Column]] =
    Seq(
      classOf[TweetSimClustersEmbeddingCol],
      classOf[UserSimClustersEmbeddingCol],
      classOf[TopicIdSimClustersEmbeddingCol],
      classOf[LocaleEntityIdSimClustersEmbeddingCol]
    )
}
@ -0,0 +1,9 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "strato/src/main/scala/com/twitter/strato/fed",
        "strato/src/main/scala/com/twitter/strato/fed/server",
    ],
)
@ -0,0 +1,26 @@
package com.twitter.representation_manager.columns

import com.twitter.strato.access.Access.LdapGroup
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.FromColumns
import com.twitter.strato.config.Has
import com.twitter.strato.config.Prefix
import com.twitter.strato.config.ServiceIdentifierPattern

object ColumnConfigBase {

  /****************** Internal permissions *******************/
  val recosPermissions: Seq[com.twitter.strato.config.Policy] = Seq()

  /****************** External permissions *******************/
  // This is used to grant limited access to members outside of the RP team.
  val externalPermissions: Seq[com.twitter.strato.config.Policy] = Seq()

  val contactInfo: ContactInfo = ContactInfo(
    description = "Please contact Relevance Platform for more details",
    contactEmail = "no-reply@twitter.com",
    ldapGroup = "ldap",
    jiraProject = "JIRA",
    links = Seq("http://go/rms-runbook")
  )
}
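Because both permission lists in `ColumnConfigBase` are empty, each column's `colPermissions` (`recosPermissions ++ externalPermissions :+ FromColumns(...)`) reduces to a single `FromColumns(Set(Prefix("ml/featureStore/simClusters")))` entry, which the column then wraps in `AnyOf`. The toy Scala model that follows illustrates that composition. `Policy`, `FromColumns` and `AnyOf` here are hypothetical stand-ins, not the Strato config classes, and the semantics (`AnyOf` admits a caller when any child policy does; `Prefix` matches the calling column's path) are assumptions inferred from the names, not taken from Strato documentation.

```scala
// Toy model of the column permission composition; not the Strato config classes.
object PolicySketch {
  sealed trait Policy {
    def allows(callerColumn: String): Boolean
  }

  /** Assumed semantics: allows calls made from a column whose path starts with one of the prefixes. */
  final case class FromColumns(prefixes: Set[String]) extends Policy {
    def allows(callerColumn: String): Boolean = prefixes.exists(p => callerColumn.startsWith(p))
  }

  /** Assumed semantics: allows a call when at least one child policy does; an empty AnyOf allows nothing. */
  final case class AnyOf(policies: Seq[Policy]) extends Policy {
    def allows(callerColumn: String): Boolean = policies.exists(_.allows(callerColumn))
  }

  // Mirrors ColumnConfigBase: both lists are currently empty.
  val recosPermissions: Seq[Policy] = Seq()
  val externalPermissions: Seq[Policy] = Seq()

  // Same shape as each column's colPermissions: (recos ++ external) :+ FromColumns(...).
  val colPermissions: Seq[Policy] =
    recosPermissions ++ externalPermissions :+ FromColumns(Set("ml/featureStore/simClusters"))

  val policy: Policy = AnyOf(colPermissions)
}
```

In Scala, `++` binds more tightly than `:+`, so the expression appends the `FromColumns` entry after concatenating the two (empty) lists; adding a policy to either shared list would widen access for every column at once.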
@ -0,0 +1,14 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finatra/inject/inject-core/src/main/scala",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/modules",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/store",
        "representation-manager/server/src/main/thrift:thrift-scala",
        "strato/src/main/scala/com/twitter/strato/fed",
        "strato/src/main/scala/com/twitter/strato/fed/server",
    ],
)
@ -0,0 +1,77 @@
package com.twitter.representation_manager.columns.topic

import com.twitter.representation_manager.columns.ColumnConfigBase
import com.twitter.representation_manager.store.TopicSimClustersEmbeddingStore
import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbedding
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.LocaleEntityId
import com.twitter.stitch
import com.twitter.stitch.Stitch
import com.twitter.stitch.storehaus.StitchOfReadableStore
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.config.AnyOf
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.FromColumns
import com.twitter.strato.config.Policy
import com.twitter.strato.config.Prefix
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.data.Lifecycle
import com.twitter.strato.fed._
import com.twitter.strato.thrift.ScroogeConv
import javax.inject.Inject

class LocaleEntityIdSimClustersEmbeddingCol @Inject() (
  embeddingStore: TopicSimClustersEmbeddingStore)
    extends StratoFed.Column(
      "recommendations/representation_manager/simClustersEmbedding.LocaleEntityId")
    with StratoFed.Fetch.Stitch {

  private val storeStitch: SimClustersEmbeddingId => Stitch[SimClustersEmbedding] =
    StitchOfReadableStore(embeddingStore.topicSimClustersEmbeddingStore.mapValues(_.toThrift))

  val colPermissions: Seq[com.twitter.strato.config.Policy] =
    ColumnConfigBase.recosPermissions ++ ColumnConfigBase.externalPermissions :+ FromColumns(
      Set(
        Prefix("ml/featureStore/simClusters"),
      ))

  override val policy: Policy = AnyOf({
    colPermissions
  })

  override type Key = LocaleEntityId
  override type View = SimClustersEmbeddingView
  override type Value = SimClustersEmbedding

  override val keyConv: Conv[Key] = ScroogeConv.fromStruct[LocaleEntityId]
  override val viewConv: Conv[View] = ScroogeConv.fromStruct[SimClustersEmbeddingView]
  override val valueConv: Conv[Value] = ScroogeConv.fromStruct[SimClustersEmbedding]

  override val contactInfo: ContactInfo = ColumnConfigBase.contactInfo

  override val metadata: OpMetadata = OpMetadata(
    lifecycle = Some(Lifecycle.Production),
    description = Some(
      PlainText(
        "The Topic SimClusters Embedding Endpoint in Representation Management Service with LocaleEntityId." +
          " TDD: http://go/rms-tdd"))
  )

  override def fetch(key: Key, view: View): Stitch[Result[Value]] = {
    val embeddingId = SimClustersEmbeddingId(
      view.embeddingType,
      view.modelVersion,
      InternalId.LocaleEntityId(key)
    )

    storeStitch(embeddingId)
      .map(embedding => found(embedding))
      .handle {
        case stitch.NotFound => missing
      }
  }

}
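Each SimClusters column's `fetch` follows the same flow: build a `SimClustersEmbeddingId` from the view's `embeddingType` and `modelVersion` plus the key wrapped in the matching `InternalId` case, look it up in the store, and map a hit to `found` and a `stitch.NotFound` to `missing`. The sketch below models that flow synchronously, with an `Option`-returning store in place of `Stitch`. All case classes here are simplified, hypothetical stand-ins for the thrift types; for instance, embedding type and model version are plain strings rather than thrift enums.

```scala
// Simplified, synchronous model of the column fetch flow; all types are hypothetical stand-ins.
object FetchSketch {
  final case class SimClustersEmbeddingView(embeddingType: String, modelVersion: String)

  sealed trait InternalId
  final case class TweetId(id: Long) extends InternalId
  final case class TopicId(id: Long) extends InternalId

  final case class SimClustersEmbeddingId(
    embeddingType: String,
    modelVersion: String,
    internalId: InternalId)

  /** Cluster id -> score, standing in for the thrift SimClustersEmbedding. */
  final case class SimClustersEmbedding(scores: Map[Int, Double])

  sealed trait Result[+A]
  final case class Found[A](value: A) extends Result[A]
  case object Missing extends Result[Nothing]

  /** Builds the embedding id from view + key, then maps a store hit to Found and a miss to Missing. */
  def fetch(
    store: SimClustersEmbeddingId => Option[SimClustersEmbedding]
  )(
    internalId: InternalId,
    view: SimClustersEmbeddingView
  ): Result[SimClustersEmbedding] = {
    val embeddingId = SimClustersEmbeddingId(view.embeddingType, view.modelVersion, internalId)
    store(embeddingId) match {
      case Some(embedding) => Found(embedding)
      case None => Missing
    }
  }
}
```

Because the `InternalId` case is part of the lookup key, a tweet and a topic that share the same numeric id resolve to different store entries.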
@ -0,0 +1,74 @@
package com.twitter.representation_manager.columns.topic

import com.twitter.representation_manager.columns.ColumnConfigBase
import com.twitter.representation_manager.store.TopicSimClustersEmbeddingStore
import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbedding
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.TopicId
import com.twitter.stitch
import com.twitter.stitch.Stitch
import com.twitter.stitch.storehaus.StitchOfReadableStore
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.config.AnyOf
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.FromColumns
import com.twitter.strato.config.Policy
import com.twitter.strato.config.Prefix
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.data.Lifecycle
import com.twitter.strato.fed._
import com.twitter.strato.thrift.ScroogeConv
import javax.inject.Inject

class TopicIdSimClustersEmbeddingCol @Inject() (embeddingStore: TopicSimClustersEmbeddingStore)
    extends StratoFed.Column("recommendations/representation_manager/simClustersEmbedding.TopicId")
    with StratoFed.Fetch.Stitch {

  private val storeStitch: SimClustersEmbeddingId => Stitch[SimClustersEmbedding] =
    StitchOfReadableStore(embeddingStore.topicSimClustersEmbeddingStore.mapValues(_.toThrift))

  val colPermissions: Seq[com.twitter.strato.config.Policy] =
    ColumnConfigBase.recosPermissions ++ ColumnConfigBase.externalPermissions :+ FromColumns(
      Set(
        Prefix("ml/featureStore/simClusters"),
      ))

  override val policy: Policy = AnyOf({
    colPermissions
  })

  override type Key = TopicId
  override type View = SimClustersEmbeddingView
  override type Value = SimClustersEmbedding

  override val keyConv: Conv[Key] = ScroogeConv.fromStruct[TopicId]
  override val viewConv: Conv[View] = ScroogeConv.fromStruct[SimClustersEmbeddingView]
  override val valueConv: Conv[Value] = ScroogeConv.fromStruct[SimClustersEmbedding]

  override val contactInfo: ContactInfo = ColumnConfigBase.contactInfo

  override val metadata: OpMetadata = OpMetadata(
    lifecycle = Some(Lifecycle.Production),
    description = Some(PlainText(
      "The Topic SimClusters Embedding Endpoint in Representation Management Service with TopicId." +
        " TDD: http://go/rms-tdd"))
  )

  override def fetch(key: Key, view: View): Stitch[Result[Value]] = {
    val embeddingId = SimClustersEmbeddingId(
      view.embeddingType,
      view.modelVersion,
      InternalId.TopicId(key)
    )

    storeStitch(embeddingId)
      .map(embedding => found(embedding))
      .handle {
        case stitch.NotFound => missing
      }
  }

}
@ -0,0 +1,14 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finatra/inject/inject-core/src/main/scala",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/modules",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/store",
        "representation-manager/server/src/main/thrift:thrift-scala",
        "strato/src/main/scala/com/twitter/strato/fed",
        "strato/src/main/scala/com/twitter/strato/fed/server",
    ],
)
@ -0,0 +1,73 @@
|
|||||||
|
package com.twitter.representation_manager.columns.tweet
|
||||||
|
|
||||||
|
import com.twitter.representation_manager.columns.ColumnConfigBase
|
||||||
|
import com.twitter.representation_manager.store.TweetSimClustersEmbeddingStore
|
||||||
|
import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.InternalId
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbedding
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
|
||||||
|
import com.twitter.stitch
|
||||||
|
import com.twitter.stitch.Stitch
|
||||||
|
import com.twitter.stitch.storehaus.StitchOfReadableStore
|
||||||
|
import com.twitter.strato.catalog.OpMetadata
|
||||||
|
import com.twitter.strato.config.AnyOf
|
||||||
|
import com.twitter.strato.config.ContactInfo
|
||||||
|
import com.twitter.strato.config.FromColumns
|
||||||
|
import com.twitter.strato.config.Policy
|
||||||
|
import com.twitter.strato.config.Prefix
|
||||||
|
import com.twitter.strato.data.Conv
|
||||||
|
import com.twitter.strato.data.Description.PlainText
|
||||||
|
import com.twitter.strato.data.Lifecycle
|
||||||
|
import com.twitter.strato.fed._
|
||||||
|
import com.twitter.strato.thrift.ScroogeConv
|
||||||
|
import javax.inject.Inject
|
||||||
|
|
||||||
|
class TweetSimClustersEmbeddingCol @Inject() (embeddingStore: TweetSimClustersEmbeddingStore)
    extends StratoFed.Column("recommendations/representation_manager/simClustersEmbedding.Tweet")
    with StratoFed.Fetch.Stitch {

  private val storeStitch: SimClustersEmbeddingId => Stitch[SimClustersEmbedding] =
    StitchOfReadableStore(embeddingStore.tweetSimClustersEmbeddingStore.mapValues(_.toThrift))

  val colPermissions: Seq[com.twitter.strato.config.Policy] =
    ColumnConfigBase.recosPermissions ++ ColumnConfigBase.externalPermissions :+ FromColumns(
      Set(
        Prefix("ml/featureStore/simClusters"),
      ))

  override val policy: Policy = AnyOf({
    colPermissions
  })

  override type Key = Long // TweetId
  override type View = SimClustersEmbeddingView
  override type Value = SimClustersEmbedding

  override val keyConv: Conv[Key] = Conv.long
  override val viewConv: Conv[View] = ScroogeConv.fromStruct[SimClustersEmbeddingView]
  override val valueConv: Conv[Value] = ScroogeConv.fromStruct[SimClustersEmbedding]

  override val contactInfo: ContactInfo = ColumnConfigBase.contactInfo

  override val metadata: OpMetadata = OpMetadata(
    lifecycle = Some(Lifecycle.Production),
    description = Some(
      PlainText("The Tweet SimClusters Embedding Endpoint in Representation Management Service." +
        " TDD: http://go/rms-tdd"))
  )

  override def fetch(key: Key, view: View): Stitch[Result[Value]] = {
    val embeddingId = SimClustersEmbeddingId(
      view.embeddingType,
      view.modelVersion,
      InternalId.TweetId(key)
    )

    storeStitch(embeddingId)
      .map(embedding => found(embedding))
      .handle {
        case stitch.NotFound => missing
      }
  }

}
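The Tweet column's `fetch` above, like the User column that follows, combines the request key with the view into a `SimClustersEmbeddingId`, looks it up in the backing store, and turns a hit into `found(...)` and a `stitch.NotFound` into `missing`. The sketch below reproduces only that control flow in plain Scala so it runs without Strato or Stitch. `EmbeddingId`, `View`, `FetchResult`, `Found`, `Missing`, and `ColumnFetchSketch` are hypothetical stand-ins, and `Option` plays the role that the `NotFound` handler plays in the real column.

```scala
// Hypothetical stand-ins for the Strato/Stitch/Thrift types used by the real columns.
sealed trait InternalId
final case class TweetId(id: Long) extends InternalId
final case class UserId(id: Long) extends InternalId

final case class View(embeddingType: String, modelVersion: String)
final case class EmbeddingId(embeddingType: String, modelVersion: String, internalId: InternalId)

sealed trait FetchResult[+V]
final case class Found[V](value: V) extends FetchResult[V]
case object Missing extends FetchResult[Nothing]

object ColumnFetchSketch {
  // Mirrors fetch(): combine the column key with the view into a store key, then look it up.
  def fetch[V](store: EmbeddingId => Option[V], toInternalId: Long => InternalId)(
    key: Long,
    view: View
  ): FetchResult[V] = {
    val embeddingId = EmbeddingId(view.embeddingType, view.modelVersion, toInternalId(key))
    store(embeddingId) match {
      case Some(embedding) => Found(embedding) // a hit is returned as found(...)
      case None => Missing // a miss is reported as missing, not as a failure
    }
  }
}
```

Passing `TweetId(_)` models the Tweet column and `UserId(_)` models the User column; in this sketch the two differ only in which `InternalId` constructor wraps the key.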
@ -0,0 +1,14 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finatra/inject/inject-core/src/main/scala",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/columns",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/modules",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/store",
        "representation-manager/server/src/main/thrift:thrift-scala",
        "strato/src/main/scala/com/twitter/strato/fed",
        "strato/src/main/scala/com/twitter/strato/fed/server",
    ],
)
@ -0,0 +1,73 @@
package com.twitter.representation_manager.columns.user

import com.twitter.representation_manager.columns.ColumnConfigBase
import com.twitter.representation_manager.store.UserSimClustersEmbeddingStore
import com.twitter.representation_manager.thriftscala.SimClustersEmbeddingView
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbedding
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.stitch
import com.twitter.stitch.Stitch
import com.twitter.stitch.storehaus.StitchOfReadableStore
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.config.AnyOf
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.FromColumns
import com.twitter.strato.config.Policy
import com.twitter.strato.config.Prefix
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.data.Lifecycle
import com.twitter.strato.fed._
import com.twitter.strato.thrift.ScroogeConv
import javax.inject.Inject

class UserSimClustersEmbeddingCol @Inject() (embeddingStore: UserSimClustersEmbeddingStore)
    extends StratoFed.Column("recommendations/representation_manager/simClustersEmbedding.User")
    with StratoFed.Fetch.Stitch {

  private val storeStitch: SimClustersEmbeddingId => Stitch[SimClustersEmbedding] =
    StitchOfReadableStore(embeddingStore.userSimClustersEmbeddingStore.mapValues(_.toThrift))

  val colPermissions: Seq[com.twitter.strato.config.Policy] =
    ColumnConfigBase.recosPermissions ++ ColumnConfigBase.externalPermissions :+ FromColumns(
      Set(
        Prefix("ml/featureStore/simClusters"),
      ))

  override val policy: Policy = AnyOf({
    colPermissions
  })

  override type Key = Long // UserId
  override type View = SimClustersEmbeddingView
  override type Value = SimClustersEmbedding

  override val keyConv: Conv[Key] = Conv.long
  override val viewConv: Conv[View] = ScroogeConv.fromStruct[SimClustersEmbeddingView]
  override val valueConv: Conv[Value] = ScroogeConv.fromStruct[SimClustersEmbedding]

  override val contactInfo: ContactInfo = ColumnConfigBase.contactInfo

  override val metadata: OpMetadata = OpMetadata(
    lifecycle = Some(Lifecycle.Production),
    description = Some(
      PlainText("The User SimClusters Embedding Endpoint in Representation Management Service." +
        " TDD: http://go/rms-tdd"))
  )

  override def fetch(key: Key, view: View): Stitch[Result[Value]] = {
    val embeddingId = SimClustersEmbeddingId(
      view.embeddingType,
      view.modelVersion,
      InternalId.UserId(key)
    )

    storeStitch(embeddingId)
      .map(embedding => found(embedding))
      .handle {
        case stitch.NotFound => missing
      }
  }

}
@ -0,0 +1,13 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "decider/src/main/scala",
        "finagle/finagle-memcached",
        "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
        "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/injection",
        "src/scala/com/twitter/simclusters_v2/common",
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
    ],
)
@ -0,0 +1,153 @@
package com.twitter.representation_manager.common

import com.twitter.bijection.scrooge.BinaryScalaCodec
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.memcached.Client
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.hashing.KeyHasher
import com.twitter.hermit.store.common.ObservedMemcachedReadableStore
import com.twitter.relevance_platform.common.injection.LZ4Injection
import com.twitter.simclusters_v2.common.SimClustersEmbedding
import com.twitter.simclusters_v2.common.SimClustersEmbeddingIdCacheKeyBuilder
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.EmbeddingType._
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.simclusters_v2.thriftscala.ModelVersion._
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding}
import com.twitter.storehaus.ReadableStore
import com.twitter.util.Duration

/*
 * NOTE - ALL the cache configs here are just placeholders; NONE of them are used anywhere in RMS yet
 * */
sealed trait MemCacheParams
sealed trait MemCacheConfig

/*
 * This holds the params required to set up memcache for a single embedding store
 * */
case class EnabledMemCacheParams(ttl: Duration) extends MemCacheParams
object DisabledMemCacheParams extends MemCacheParams

/*
 * We use this MemCacheConfig as the single source to set up the memcache for all RMS use cases
 * NO OVERRIDE FROM CLIENT
 * */
object MemCacheConfig {
  val keyHasher: KeyHasher = KeyHasher.FNV1A_64
  val hashKeyPrefix: String = "RMS"
  val simclustersEmbeddingCacheKeyBuilder =
    SimClustersEmbeddingIdCacheKeyBuilder(keyHasher.hashKey, hashKeyPrefix)

  val cacheParamsMap: Map[
    (EmbeddingType, ModelVersion),
    MemCacheParams
  ] = Map(
    // Tweet Embeddings
    (LogFavBasedTweet, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 10.minutes),
    (LogFavBasedTweet, Model20m145k2020) -> EnabledMemCacheParams(ttl = 10.minutes),
    (LogFavLongestL2EmbeddingTweet, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 10.minutes),
    (LogFavLongestL2EmbeddingTweet, Model20m145k2020) -> EnabledMemCacheParams(ttl = 10.minutes),
    // User - KnownFor Embeddings
    (FavBasedProducer, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
    (FavBasedProducer, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (FollowBasedProducer, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (AggregatableLogFavBasedProducer, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (RelaxedAggregatableLogFavBasedProducer, Model20m145kUpdated) -> EnabledMemCacheParams(ttl =
      12.hours),
    (RelaxedAggregatableLogFavBasedProducer, Model20m145k2020) -> EnabledMemCacheParams(ttl =
      12.hours),
    // User - InterestedIn Embeddings
    (LogFavBasedUserInterestedInFromAPE, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (FollowBasedUserInterestedInFromAPE, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (FavBasedUserInterestedIn, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
    (FavBasedUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (FollowBasedUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (LogFavBasedUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (FavBasedUserInterestedInFromPE, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
    (FilteredUserInterestedIn, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
    (FilteredUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (FilteredUserInterestedInFromPE, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
    (UnfilteredUserInterestedIn, Model20m145kUpdated) -> EnabledMemCacheParams(ttl = 12.hours),
    (UnfilteredUserInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (UserNextInterestedIn, Model20m145k2020) -> EnabledMemCacheParams(ttl =
      30.minutes), // embedding is updated every 2 hours; keep the TTL lower to avoid staleness
    (
      LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (
      LogFavBasedUserInterestedAverageAddressBookFromIIAPE,
      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (
      LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (
      LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (
      LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (
      LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    // Topic Embeddings
    (FavTfgTopic, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
    (LogFavBasedKgoApeTopic, Model20m145k2020) -> EnabledMemCacheParams(ttl = 12.hours),
  )
|
  def getCacheSetup(
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion
  ): MemCacheParams = {
    // When requested (embeddingType, modelVersion) doesn't exist, we return DisabledMemCacheParams
    cacheParamsMap.getOrElse((embeddingType, modelVersion), DisabledMemCacheParams)
  }

  def getCacheKeyPrefix(embeddingType: EmbeddingType, modelVersion: ModelVersion) =
    s"${embeddingType.value}_${modelVersion.value}_"

  def getStatsName(embeddingType: EmbeddingType, modelVersion: ModelVersion) =
    s"${embeddingType.name}_${modelVersion.name}_mem_cache"

  /**
   * Build a ReadableStore based on MemCacheConfig.
   *
   * If memcache is disabled, it will return a normal readable store wrapper of the rawStore,
   * with SimClustersEmbedding as value;
   * If memcache is enabled, it will return an ObservedMemcachedReadableStore wrapper of the rawStore,
   * with memcache set up according to the EnabledMemCacheParams
   * */
  def buildMemCacheStoreForSimClustersEmbedding(
    rawStore: ReadableStore[SimClustersEmbeddingId, ThriftSimClustersEmbedding],
    cacheClient: Client,
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion,
    stats: StatsReceiver
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val cacheParams = getCacheSetup(embeddingType, modelVersion)
    val store = cacheParams match {
      case DisabledMemCacheParams => rawStore
      case EnabledMemCacheParams(ttl) =>
        val memCacheKeyPrefix = MemCacheConfig.getCacheKeyPrefix(
          embeddingType,
          modelVersion
        )
        val statsName = MemCacheConfig.getStatsName(
          embeddingType,
          modelVersion
        )
        ObservedMemcachedReadableStore.fromCacheClient(
          backingStore = rawStore,
          cacheClient = cacheClient,
          ttl = ttl
        )(
          valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
          statsReceiver = stats.scope(statsName),
          keyToString = { k => memCacheKeyPrefix + k.toString }
        )
    }
    store.mapValues(SimClustersEmbedding(_))
  }

}
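MemCacheConfig resolves caching per (EmbeddingType, ModelVersion) pair: pairs listed in `cacheParamsMap` get an `EnabledMemCacheParams` with a TTL, any other pair falls back to `DisabledMemCacheParams`, and enabled stores prefix every memcache key with `<embeddingType.value>_<modelVersion.value>_`. The sketch below mirrors that decision in plain Scala and replaces memcache with an in-process map that honors the TTL. The string-keyed table, the `Enabled`/`Disabled` names, the hit-only caching policy, and the injected `now` clock are hypothetical simplifications; this is not the hermit `ObservedMemcachedReadableStore`.

```scala
import scala.collection.mutable
import scala.concurrent.duration._

// Hypothetical stand-ins for EnabledMemCacheParams / DisabledMemCacheParams.
sealed trait CacheParams
final case class Enabled(ttl: FiniteDuration) extends CacheParams
case object Disabled extends CacheParams

object MemCacheConfigSketch {
  // Hypothetical string-keyed table; the real map is keyed by Thrift enums.
  val cacheParamsMap: Map[(String, String), CacheParams] = Map(
    ("LogFavBasedTweet", "Model20m145k2020") -> Enabled(10.minutes),
    ("FavTfgTopic", "Model20m145k2020") -> Enabled(12.hours)
  )

  // Unlisted pairs fall back to Disabled, as getCacheSetup does.
  def getCacheSetup(embeddingType: String, modelVersion: String): CacheParams =
    cacheParamsMap.getOrElse((embeddingType, modelVersion), Disabled)

  // "<typeValue>_<versionValue>_<key>", the shape of getCacheKeyPrefix + k.toString.
  def cacheKey(embeddingTypeValue: Int, modelVersionValue: Int, key: Any): String =
    s"${embeddingTypeValue}_${modelVersionValue}_$key"

  // Disabled: return the raw lookup untouched. Enabled: put a read-through TTL cache in front.
  def buildStore[K, V](
    raw: K => Option[V],
    params: CacheParams,
    keyToString: K => String,
    now: () => Long // clock in epoch millis, injectable so the TTL can be tested
  ): K => Option[V] = params match {
    case Disabled => raw
    case Enabled(ttl) =>
      val cache = mutable.Map.empty[String, (V, Long)] // cache key -> (value, expiry time)
      (k: K) => {
        val memKey = keyToString(k)
        cache.get(memKey) match {
          case Some((v, expiresAt)) if now() < expiresAt => Some(v)
          case _ =>
            val fetched = raw(k)
            // This sketch caches hits only; misses always fall through to the raw store.
            fetched.foreach(v => cache.update(memKey, (v, now() + ttl.toMillis)))
            fetched
        }
      }
  }
}
```

Keeping the enable/disable decision and the key format in one table is what lets the real object claim to be the single source of cache setup with no client override.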
@ -0,0 +1,25 @@
package com.twitter.representation_manager.common

import com.twitter.decider.Decider
import com.twitter.decider.RandomRecipient
import com.twitter.decider.Recipient
import com.twitter.simclusters_v2.common.DeciderGateBuilderWithIdHashing
import javax.inject.Inject

case class RepresentationManagerDecider @Inject() (decider: Decider) {

  val deciderGateBuilder = new DeciderGateBuilderWithIdHashing(decider)

  def isAvailable(feature: String, recipient: Option[Recipient]): Boolean = {
    decider.isAvailable(feature, recipient)
  }

  /**
   * When useRandomRecipient is set to false, the decider is either completely on or off.
   * When useRandomRecipient is set to true, the decider is on for the specified % of traffic.
   */
  def isAvailable(feature: String, useRandomRecipient: Boolean = true): Boolean = {
    if (useRandomRecipient) isAvailable(feature, Some(RandomRecipient))
    else isAvailable(feature, None)
  }
}
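`RepresentationManagerDecider.isAvailable` either passes a `RandomRecipient`, so a feature is on for roughly the configured share of calls, or passes no recipient, which the doc comment describes as all-or-nothing. The self-contained model below illustrates that gating. It assumes availability is expressed on a 0 to 10000 scale and, as a hypothetical rule, treats the no-recipient case as on only at full availability; `GateSketch` and its injected `rng` are illustrative and are not the `com.twitter.decider` API.

```scala
import scala.util.Random

// Hypothetical model of percentage-based feature gating; not the com.twitter.decider API.
final class GateSketch(availability: Map[String, Int], rng: Random) {
  private val MaxAvailability = 10000 // assumed scale: 10000 == 100% of traffic

  // Unknown features default to 0 (off); out-of-range values are clamped.
  private def level(feature: String): Int =
    availability.getOrElse(feature, 0).max(0).min(MaxAvailability)

  // useRandomRecipient = true: each call independently passes with probability level / 10000.
  // useRandomRecipient = false: all-or-nothing (in this sketch, on only at full availability).
  def isAvailable(feature: String, useRandomRecipient: Boolean = true): Boolean =
    if (useRandomRecipient) rng.nextInt(MaxAvailability) < level(feature)
    else level(feature) == MaxAvailability
}
```

Injecting the random source keeps the per-call draw testable; the real class delegates the draw to the decider via `RandomRecipient`.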
@ -0,0 +1,25 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "content-recommender/server/src/main/scala/com/twitter/contentrecommender:representation-manager-deps",
        "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/store/strato",
        "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/util",
        "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
        "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/injection",
        "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/readablestore",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/common",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/store",
        "src/scala/com/twitter/ml/api/embedding",
        "src/scala/com/twitter/simclusters_v2/common",
        "src/scala/com/twitter/simclusters_v2/score",
        "src/scala/com/twitter/simclusters_v2/summingbird/stores",
        "src/scala/com/twitter/storehaus_internal/manhattan",
        "src/scala/com/twitter/storehaus_internal/util",
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
        "src/thrift/com/twitter/socialgraph:thrift-scala",
        "storage/clients/manhattan/client/src/main/scala",
        "tweetypie/src/scala/com/twitter/tweetypie/util",
    ],
)
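Each Tweet embedding store in the LegacyRMS class that follows is layered: a Manhattan-backed raw store, a memcache layer (`ObservedMemcachedReadableStore`), a `composeKeyMapping` step that accepts one (EmbeddingType, ModelVersion) pair and narrows the `SimClustersEmbeddingId` to a bare tweet id, and finally an in-process `ObservedCachedReadableStore` bounded by `ttl` and `maxKeys`. The sketch below models only that outermost in-process layer as an LRU cache with a per-entry TTL, built on `java.util.LinkedHashMap` in access order. It is an illustrative stand-in, not the hermit implementation; `BoundedTtlCache`, its parameters, and the injected clock are hypothetical.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical in-process cache in the spirit of ObservedCachedReadableStore:
// LRU eviction once more than maxKeys entries are held, plus a per-entry TTL.
final class BoundedTtlCache[K, V](
  backing: K => Option[V],
  ttlMillis: Long,
  maxKeys: Int,
  now: () => Long // clock in epoch millis, injectable for testing
) {
  private final case class Slot(value: V, expiresAt: Long)

  // accessOrder = true: iteration runs from least- to most-recently accessed,
  // so removeEldestEntry drops the LRU entry after an insert that exceeds maxKeys.
  private val entries = new JLinkedHashMap[K, Slot](16, 0.75f, true) {
    override protected def removeEldestEntry(eldest: JMap.Entry[K, Slot]): Boolean =
      this.size() > maxKeys
  }

  def get(key: K): Option[V] = entries.synchronized {
    Option(entries.get(key)) match {
      case Some(slot) if now() < slot.expiresAt => Some(slot.value)
      case _ =>
        // Expired or absent: go to the backing store and refresh (or drop) the entry.
        val fetched = backing(key)
        fetched match {
          case Some(v) => entries.put(key, Slot(v, now() + ttlMillis))
          case None => entries.remove(key)
        }
        fetched
    }
  }

  def cachedKeys: Int = entries.synchronized(entries.size())
}
```

Bounding the cache by key count rather than bytes is why the original code annotates some `maxKeys` values with an approximate memory budget in a comment.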
@ -0,0 +1,846 @@
package com.twitter.representation_manager.migration

import com.twitter.bijection.Injection
import com.twitter.bijection.scrooge.BinaryScalaCodec
import com.twitter.contentrecommender.store.ApeEntityEmbeddingStore
import com.twitter.contentrecommender.store.InterestsOptOutStore
import com.twitter.contentrecommender.store.SemanticCoreTopicSeedStore
import com.twitter.contentrecommender.twistly
import com.twitter.conversions.DurationOps._
import com.twitter.decider.Decider
import com.twitter.escherbird.util.uttclient.CacheConfigV2
import com.twitter.escherbird.util.uttclient.CachedUttClientV2
import com.twitter.escherbird.util.uttclient.UttClientCacheConfigsV2
import com.twitter.escherbird.utt.strato.thriftscala.Environment
import com.twitter.finagle.ThriftMux
import com.twitter.finagle.memcached.Client
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.mtls.client.MtlsStackClient.MtlsThriftMuxClientSyntax
import com.twitter.finagle.mux.ClientDiscardedRequestException
import com.twitter.finagle.service.ReqRep
import com.twitter.finagle.service.ResponseClass
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.finagle.thrift.ClientId
import com.twitter.frigate.common.store.strato.StratoFetchableStore
import com.twitter.frigate.common.util.SeqLongInjection
import com.twitter.hashing.KeyHasher
import com.twitter.hermit.store.common.DeciderableReadableStore
import com.twitter.hermit.store.common.ObservedCachedReadableStore
import com.twitter.hermit.store.common.ObservedMemcachedReadableStore
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.interests.thriftscala.InterestsThriftService
import com.twitter.relevance_platform.common.injection.LZ4Injection
import com.twitter.relevance_platform.common.readablestore.ReadableStoreWithTimeout
import com.twitter.representation_manager.common.RepresentationManagerDecider
import com.twitter.representation_manager.store.DeciderConstants
import com.twitter.representation_manager.store.DeciderKey
import com.twitter.simclusters_v2.common.ModelVersions
import com.twitter.simclusters_v2.common.SimClustersEmbedding
import com.twitter.simclusters_v2.common.SimClustersEmbeddingIdCacheKeyBuilder
import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore
import com.twitter.simclusters_v2.summingbird.stores.PersistentTweetEmbeddingStore
import com.twitter.simclusters_v2.summingbird.stores.ProducerClusterEmbeddingReadableStores
import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore
import com.twitter.simclusters_v2.thriftscala.ClustersUserIsInterestedIn
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.EmbeddingType._
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.simclusters_v2.thriftscala.ModelVersion.Model20m145k2020
import com.twitter.simclusters_v2.thriftscala.ModelVersion.Model20m145kUpdated
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.SimClustersMultiEmbedding
import com.twitter.simclusters_v2.thriftscala.SimClustersMultiEmbeddingId
import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding}
import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams
import com.twitter.storehaus.ReadableStore
import com.twitter.storehaus_internal.manhattan.Athena
import com.twitter.storehaus_internal.manhattan.ManhattanRO
import com.twitter.storehaus_internal.manhattan.ManhattanROConfig
import com.twitter.storehaus_internal.util.ApplicationID
import com.twitter.storehaus_internal.util.DatasetName
import com.twitter.storehaus_internal.util.HDFSPath
import com.twitter.strato.client.Strato
import com.twitter.strato.client.{Client => StratoClient}
import com.twitter.strato.thrift.ScroogeConvImplicits._
import com.twitter.tweetypie.util.UserId
import com.twitter.util.Duration
import com.twitter.util.Future
import com.twitter.util.Throw
import com.twitter.util.Timer
import javax.inject.Inject
import javax.inject.Named
import scala.reflect.ClassTag
|
class LegacyRMS @Inject() (
  serviceIdentifier: ServiceIdentifier,
  cacheClient: Client,
  stats: StatsReceiver,
  decider: Decider,
  clientId: ClientId,
  timer: Timer,
  @Named("cacheHashKeyPrefix") val cacheHashKeyPrefix: String = "RMS",
  @Named("useContentRecommenderConfiguration") val useContentRecommenderConfiguration: Boolean =
    false) {

  private val mhMtlsParams: ManhattanKVClientMtlsParams = ManhattanKVClientMtlsParams(
    serviceIdentifier)
  private val rmsDecider = RepresentationManagerDecider(decider)
  val keyHasher: KeyHasher = KeyHasher.FNV1A_64

  private val embeddingCacheKeyBuilder =
    SimClustersEmbeddingIdCacheKeyBuilder(keyHasher.hashKey, cacheHashKeyPrefix)
  private val statsReceiver = stats.scope("representation_management")

  // Strato client, default timeout = 280ms
  val stratoClient: StratoClient =
    Strato.client
      .withMutualTls(serviceIdentifier)
      .build()

  // Builds ThriftMux client builder for Content-Recommender service
  private def makeThriftClientBuilder(
    requestTimeout: Duration
  ): ThriftMux.Client = {
    ThriftMux.client
      .withClientId(clientId)
      .withMutualTls(serviceIdentifier)
      .withRequestTimeout(requestTimeout)
      .withStatsReceiver(statsReceiver.scope("clnt"))
      .withResponseClassifier {
        case ReqRep(_, Throw(_: ClientDiscardedRequestException)) => ResponseClass.Ignorable
      }
  }

  private def makeThriftClient[ThriftServiceType: ClassTag](
    dest: String,
    label: String,
    requestTimeout: Duration = 450.milliseconds
  ): ThriftServiceType = {
    makeThriftClientBuilder(requestTimeout)
      .build[ThriftServiceType](dest, label)
  }

  /** *** SimCluster Embedding Stores ******/
  implicit val simClustersEmbeddingIdInjection: Injection[SimClustersEmbeddingId, Array[Byte]] =
    BinaryScalaCodec(SimClustersEmbeddingId)
  implicit val simClustersEmbeddingInjection: Injection[ThriftSimClustersEmbedding, Array[Byte]] =
    BinaryScalaCodec(ThriftSimClustersEmbedding)
  implicit val simClustersMultiEmbeddingInjection: Injection[SimClustersMultiEmbedding, Array[
    Byte
  ]] =
    BinaryScalaCodec(SimClustersMultiEmbedding)
  implicit val simClustersMultiEmbeddingIdInjection: Injection[SimClustersMultiEmbeddingId, Array[
    Byte
  ]] =
    BinaryScalaCodec(SimClustersMultiEmbeddingId)

  def getEmbeddingsDataset(
    mhMtlsParams: ManhattanKVClientMtlsParams,
    datasetName: String
  ): ReadableStore[SimClustersEmbeddingId, ThriftSimClustersEmbedding] = {
    ManhattanRO.getReadableStoreWithMtls[SimClustersEmbeddingId, ThriftSimClustersEmbedding](
      ManhattanROConfig(
        HDFSPath(""), // not needed
        ApplicationID("content_recommender_athena"),
        DatasetName(datasetName), // this should be correct
        Athena
      ),
      mhMtlsParams
    )
  }
|
  lazy val logFavBasedLongestL2Tweet20M145K2020EmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      PersistentTweetEmbeddingStore
        .longestL2NormTweetEmbeddingStoreManhattan(
          mhMtlsParams,
          PersistentTweetEmbeddingStore.LogFavBased20m145k2020Dataset,
          statsReceiver,
          maxLength = 10,
        ).mapValues(_.toThrift)

    val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient(
      backingStore = rawStore,
      cacheClient = cacheClient,
      ttl = 15.minutes
    )(
      valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
      statsReceiver =
        statsReceiver.scope("log_fav_based_longest_l2_tweet_embedding_20m145k2020_mem_cache"),
      keyToString = { k =>
        s"scez_l2:${LogFavBasedTweet}_${ModelVersions.Model20M145K2020}_$k"
      }
    )

    val inMemoryCacheStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] =
      memcachedStore
        .composeKeyMapping[SimClustersEmbeddingId] {
          case SimClustersEmbeddingId(
                LogFavLongestL2EmbeddingTweet,
                Model20m145k2020,
                InternalId.TweetId(tweetId)) =>
            tweetId
        }
        .mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      inMemoryCacheStore,
      ttl = 12.minute,
      maxKeys = 1048575,
      cacheName = "log_fav_based_longest_l2_tweet_embedding_20m145k2020_cache",
      windowSize = 10000L
    )(statsReceiver.scope("log_fav_based_longest_l2_tweet_embedding_20m145k2020_store"))
  }

  lazy val logFavBased20M145KUpdatedTweetEmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      PersistentTweetEmbeddingStore
        .mostRecentTweetEmbeddingStoreManhattan(
          mhMtlsParams,
          PersistentTweetEmbeddingStore.LogFavBased20m145kUpdatedDataset,
          statsReceiver
        ).mapValues(_.toThrift)

    val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient(
      backingStore = rawStore,
      cacheClient = cacheClient,
      ttl = 10.minutes
    )(
      valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
      statsReceiver = statsReceiver.scope("log_fav_based_tweet_embedding_mem_cache"),
      keyToString = { k =>
        // SimClusters_embedding_LZ4/embeddingType_modelVersion_tweetId
        s"scez:${LogFavBasedTweet}_${ModelVersions.Model20M145KUpdated}_$k"
      }
    )

    val inMemoryCacheStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
      memcachedStore
        .composeKeyMapping[SimClustersEmbeddingId] {
          case SimClustersEmbeddingId(
                LogFavBasedTweet,
                Model20m145kUpdated,
                InternalId.TweetId(tweetId)) =>
            tweetId
        }
        .mapValues(SimClustersEmbedding(_))
    }

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      inMemoryCacheStore,
      ttl = 5.minute,
      maxKeys = 1048575, // 200MB
      cacheName = "log_fav_based_tweet_embedding_cache",
      windowSize = 10000L
    )(statsReceiver.scope("log_fav_based_tweet_embedding_store"))
  }

  lazy val logFavBased20M145K2020TweetEmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      PersistentTweetEmbeddingStore
        .mostRecentTweetEmbeddingStoreManhattan(
          mhMtlsParams,
          PersistentTweetEmbeddingStore.LogFavBased20m145k2020Dataset,
          statsReceiver,
          maxLength = 10,
        ).mapValues(_.toThrift)

    val memcachedStore = ObservedMemcachedReadableStore.fromCacheClient(
      backingStore = rawStore,
      cacheClient = cacheClient,
      ttl = 15.minutes
    )(
      valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
      statsReceiver = statsReceiver.scope("log_fav_based_tweet_embedding_20m145k2020_mem_cache"),
      keyToString = { k =>
        // SimClusters_embedding_LZ4/embeddingType_modelVersion_tweetId
        s"scez:${LogFavBasedTweet}_${ModelVersions.Model20M145K2020}_$k"
      }
    )

    val inMemoryCacheStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] =
      memcachedStore
        .composeKeyMapping[SimClustersEmbeddingId] {
          case SimClustersEmbeddingId(
                LogFavBasedTweet,
                Model20m145k2020,
                InternalId.TweetId(tweetId)) =>
            tweetId
        }
        .mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      inMemoryCacheStore,
      ttl = 12.minute,
      maxKeys = 16777215,
      cacheName = "log_fav_based_tweet_embedding_20m145k2020_cache",
      windowSize = 10000L
    )(statsReceiver.scope("log_fav_based_tweet_embedding_20m145k2020_store"))
  }
|
||||||
|
lazy val favBasedTfgTopicEmbedding2020Store: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
val stratoStore =
|
||||||
|
StratoFetchableStore
|
||||||
|
.withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding](
|
||||||
|
stratoClient,
|
||||||
|
"recommendations/simclusters_v2/embeddings/favBasedTFGTopic20M145K2020")
|
||||||
|
|
||||||
|
val truncatedStore = stratoStore.mapValues { embedding =>
|
||||||
|
SimClustersEmbedding(embedding, truncate = 50)
|
||||||
|
}
|
||||||
|
|
||||||
|
ObservedCachedReadableStore.from(
|
||||||
|
ObservedReadableStore(truncatedStore)(
|
||||||
|
statsReceiver.scope("fav_tfg_topic_embedding_2020_cache_backing_store")),
|
||||||
|
ttl = 12.hours,
|
||||||
|
maxKeys = 262143, // 200MB
|
||||||
|
cacheName = "fav_tfg_topic_embedding_2020_cache",
|
||||||
|
windowSize = 10000L
|
||||||
|
)(statsReceiver.scope("fav_tfg_topic_embedding_2020_cache"))
|
||||||
|
}
|
||||||
|
|
||||||
|
  lazy val logFavBasedApe20M145K2020EmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    ObservedReadableStore(
      StratoFetchableStore
        .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding](
          stratoClient,
          "recommendations/simclusters_v2/embeddings/logFavBasedAPE20M145K2020")
        .composeKeyMapping[SimClustersEmbeddingId] {
          case SimClustersEmbeddingId(
                AggregatableLogFavBasedProducer,
                Model20m145k2020,
                internalId) =>
            SimClustersEmbeddingId(AggregatableLogFavBasedProducer, Model20m145k2020, internalId)
        }
        .mapValues(embedding => SimClustersEmbedding(embedding, 50))
    )(statsReceiver.scope("aggregatable_producer_embeddings_by_logfav_score_2020"))
  }

  val interestService: InterestsThriftService.MethodPerEndpoint =
    makeThriftClient[InterestsThriftService.MethodPerEndpoint](
      "/s/interests-thrift-service/interests-thrift-service",
      "interests_thrift_service"
    )

  val interestsOptOutStore: InterestsOptOutStore = InterestsOptOutStore(interestService)

  // Cache up to 2^18 - 1 (262143) UTTs; expected to give a ~100% cache hit rate
  lazy val defaultCacheConfigV2: CacheConfigV2 = CacheConfigV2(262143)
  lazy val uttClientCacheConfigsV2: UttClientCacheConfigsV2 = UttClientCacheConfigsV2(
    getTaxonomyConfig = defaultCacheConfigV2,
    getUttTaxonomyConfig = defaultCacheConfigV2,
    getLeafIds = defaultCacheConfigV2,
    getLeafUttEntities = defaultCacheConfigV2
  )

  // CachedUttClient backed by the StratoClient
  lazy val cachedUttClientV2: CachedUttClientV2 = new CachedUttClientV2(
    stratoClient = stratoClient,
    env = Environment.Prod,
    cacheConfigs = uttClientCacheConfigsV2,
    statsReceiver = statsReceiver.scope("cached_utt_client")
  )

  lazy val semanticCoreTopicSeedStore: ReadableStore[
    SemanticCoreTopicSeedStore.Key,
    Seq[UserId]
  ] = {
    /*
      Up to 1000 Long seeds per topic/language = 8,000 bytes (~8KB) per topic/language (worst case)
      Assume ~10k active topic/languages ~= 80MB (worst case), before JVM object overhead
     */
    val underlying = new SemanticCoreTopicSeedStore(cachedUttClientV2, interestsOptOutStore)(
      statsReceiver.scope("semantic_core_topic_seed_store"))

    val memcacheStore = ObservedMemcachedReadableStore.fromCacheClient(
      backingStore = underlying,
      cacheClient = cacheClient,
      ttl = 12.hours
    )(
      valueInjection = SeqLongInjection,
      statsReceiver = statsReceiver.scope("topic_producer_seed_store_mem_cache"),
      keyToString = { k => s"tpss:${k.entityId}_${k.languageCode}" }
    )

    ObservedCachedReadableStore.from[SemanticCoreTopicSeedStore.Key, Seq[UserId]](
      store = memcacheStore,
      ttl = 6.hours,
      maxKeys = 20e3.toInt,
      cacheName = "topic_producer_seed_store_cache",
      windowSize = 5000
    )(statsReceiver.scope("topic_producer_seed_store_cache"))
  }

  lazy val logFavBasedApeEntity20M145K2020EmbeddingStore: ApeEntityEmbeddingStore = {
    val apeStore = logFavBasedApe20M145K2020EmbeddingStore.composeKeyMapping[UserId]({ id =>
      SimClustersEmbeddingId(
        AggregatableLogFavBasedProducer,
        Model20m145k2020,
        InternalId.UserId(id))
    })

    new ApeEntityEmbeddingStore(
      semanticCoreSeedStore = semanticCoreTopicSeedStore,
      aggregatableProducerEmbeddingStore = apeStore,
      statsReceiver = statsReceiver.scope("log_fav_based_ape_entity_2020_embedding_store"))
  }

  lazy val logFavBasedApeEntity20M145K2020EmbeddingCachedStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val truncatedStore =
      logFavBasedApeEntity20M145K2020EmbeddingStore.mapValues(_.truncate(50).toThrift)

    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = truncatedStore,
        cacheClient = cacheClient,
        ttl = 12.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver = statsReceiver.scope("log_fav_based_ape_entity_2020_embedding_mem_cache"),
        keyToString = { k => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))

    val inMemoryCachedStore =
      ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
        memcachedStore,
        ttl = 6.hours,
        maxKeys = 262143,
        cacheName = "log_fav_based_ape_entity_2020_embedding_cache",
        windowSize = 10000L
      )(statsReceiver.scope("log_fav_based_ape_entity_2020_embedding_cached_store"))

    DeciderableReadableStore(
      inMemoryCachedStore,
      rmsDecider.deciderGateBuilder.idGateWithHashing[SimClustersEmbeddingId](
        DeciderKey.enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore),
      statsReceiver.scope("log_fav_based_ape_entity_2020_embedding_deciderable_store")
    )
  }

  lazy val relaxedLogFavBasedApe20M145K2020EmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    ObservedReadableStore(
      StratoFetchableStore
        .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding](
          stratoClient,
          "recommendations/simclusters_v2/embeddings/logFavBasedAPERelaxedFavEngagementThreshold20M145K2020")
        .composeKeyMapping[SimClustersEmbeddingId] {
          case SimClustersEmbeddingId(
                RelaxedAggregatableLogFavBasedProducer,
                Model20m145k2020,
                internalId) =>
            SimClustersEmbeddingId(
              RelaxedAggregatableLogFavBasedProducer,
              Model20m145k2020,
              internalId)
        }
        .mapValues(embedding => SimClustersEmbedding(embedding).truncate(50))
    )(statsReceiver.scope(
      "aggregatable_producer_embeddings_by_logfav_score_relaxed_fav_engagement_threshold_2020"))
  }

  lazy val relaxedLogFavBasedApe20M145K2020EmbeddingCachedStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val truncatedStore =
      relaxedLogFavBasedApe20M145K2020EmbeddingStore.mapValues(_.truncate(50).toThrift)

    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = truncatedStore,
        cacheClient = cacheClient,
        ttl = 12.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver =
          statsReceiver.scope("relaxed_log_fav_based_ape_entity_2020_embedding_mem_cache"),
        keyToString = { k: SimClustersEmbeddingId => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      memcachedStore,
      ttl = 6.hours,
      maxKeys = 262143,
      cacheName = "relaxed_log_fav_based_ape_entity_2020_embedding_cache",
      windowSize = 10000L
    )(statsReceiver.scope("relaxed_log_fav_based_ape_entity_2020_embedding_cache_store"))
  }

  lazy val favBasedProducer20M145K2020EmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val underlyingStore = ProducerClusterEmbeddingReadableStores
      .getProducerTopKSimClusters2020EmbeddingsStore(
        mhMtlsParams
      ).composeKeyMapping[SimClustersEmbeddingId] {
        case SimClustersEmbeddingId(
              FavBasedProducer,
              Model20m145k2020,
              InternalId.UserId(userId)) =>
          userId
      }.mapValues { topSimClustersWithScore =>
        ThriftSimClustersEmbedding(topSimClustersWithScore.topClusters.take(10))
      }

    // Same memcache setup as favBasedUserInterestedIn20M145K2020Store, except for a 24-hour TTL (that store uses 12 hours)
    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = underlyingStore,
        cacheClient = cacheClient,
        ttl = 24.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver = statsReceiver.scope("fav_based_producer_embedding_20M_145K_2020_mem_cache"),
        keyToString = { k => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      memcachedStore,
      ttl = 12.hours,
      maxKeys = 16777215,
      cacheName = "fav_based_producer_embedding_20M_145K_2020_embedding_cache",
      windowSize = 10000L
    )(statsReceiver.scope("fav_based_producer_embedding_20M_145K_2020_embedding_store"))
  }

  // Production
  lazy val interestedIn20M145KUpdatedStore: ReadableStore[UserId, ClustersUserIsInterestedIn] = {
    UserInterestedInReadableStore.defaultStoreWithMtls(
      mhMtlsParams,
      modelVersion = ModelVersions.Model20M145KUpdated
    )
  }

  // Production
  lazy val interestedIn20M145K2020Store: ReadableStore[UserId, ClustersUserIsInterestedIn] = {
    UserInterestedInReadableStore.defaultStoreWithMtls(
      mhMtlsParams,
      modelVersion = ModelVersions.Model20M145K2020
    )
  }

  // Production
  lazy val InterestedInFromPE20M145KUpdatedStore: ReadableStore[
    UserId,
    ClustersUserIsInterestedIn
  ] = {
    UserInterestedInReadableStore.defaultIIPEStoreWithMtls(
      mhMtlsParams,
      modelVersion = ModelVersions.Model20M145KUpdated)
  }

  lazy val simClustersInterestedInStore: ReadableStore[
    (UserId, ModelVersion),
    ClustersUserIsInterestedIn
  ] = {
    new ReadableStore[(UserId, ModelVersion), ClustersUserIsInterestedIn] {
      override def get(k: (UserId, ModelVersion)): Future[Option[ClustersUserIsInterestedIn]] = {
        k match {
          case (userId, Model20m145kUpdated) =>
            interestedIn20M145KUpdatedStore.get(userId)
          case (userId, Model20m145k2020) =>
            interestedIn20M145K2020Store.get(userId)
          case _ =>
            Future.None
        }
      }
    }
  }

  lazy val simClustersInterestedInFromProducerEmbeddingsStore: ReadableStore[
    (UserId, ModelVersion),
    ClustersUserIsInterestedIn
  ] = {
    new ReadableStore[(UserId, ModelVersion), ClustersUserIsInterestedIn] {
      override def get(k: (UserId, ModelVersion)): Future[Option[ClustersUserIsInterestedIn]] = {
        k match {
          case (userId, ModelVersion.Model20m145kUpdated) =>
            InterestedInFromPE20M145KUpdatedStore.get(userId)
          case _ =>
            Future.None
        }
      }
    }
  }

  lazy val userInterestedInStore =
    new twistly.interestedin.EmbeddingStore(
      interestedInStore = simClustersInterestedInStore,
      interestedInFromProducerEmbeddingStore = simClustersInterestedInFromProducerEmbeddingsStore,
      statsReceiver = statsReceiver
    )

  // Production
  lazy val favBasedUserInterestedIn20M145KUpdatedStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val underlyingStore =
      UserInterestedInReadableStore
        .defaultSimClustersEmbeddingStoreWithMtls(
          mhMtlsParams,
          EmbeddingType.FavBasedUserInterestedIn,
          ModelVersion.Model20m145kUpdated)
        .mapValues(_.toThrift)

    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = underlyingStore,
        cacheClient = cacheClient,
        ttl = 12.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver = statsReceiver.scope("fav_based_user_interested_in_mem_cache"),
        keyToString = { k => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      memcachedStore,
      ttl = 6.hours,
      maxKeys = 262143,
      cacheName = "fav_based_user_interested_in_cache",
      windowSize = 10000L
    )(statsReceiver.scope("fav_based_user_interested_in_store"))
  }

  // Production
  lazy val LogFavBasedInterestedInFromAPE20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val underlyingStore =
      UserInterestedInReadableStore
        .defaultIIAPESimClustersEmbeddingStoreWithMtls(
          mhMtlsParams,
          EmbeddingType.LogFavBasedUserInterestedInFromAPE,
          ModelVersion.Model20m145k2020)
        .mapValues(_.toThrift)

    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = underlyingStore,
        cacheClient = cacheClient,
        ttl = 12.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver = statsReceiver.scope("log_fav_based_user_interested_in_from_ape_mem_cache"),
        keyToString = { k => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      memcachedStore,
      ttl = 6.hours,
      maxKeys = 262143,
      cacheName = "log_fav_based_user_interested_in_from_ape_cache",
      windowSize = 10000L
    )(statsReceiver.scope("log_fav_based_user_interested_in_from_ape_store"))
  }

  // Production
  lazy val FollowBasedInterestedInFromAPE20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val underlyingStore =
      UserInterestedInReadableStore
        .defaultIIAPESimClustersEmbeddingStoreWithMtls(
          mhMtlsParams,
          EmbeddingType.FollowBasedUserInterestedInFromAPE,
          ModelVersion.Model20m145k2020)
        .mapValues(_.toThrift)

    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = underlyingStore,
        cacheClient = cacheClient,
        ttl = 12.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver = statsReceiver.scope("follow_based_user_interested_in_from_ape_mem_cache"),
        keyToString = { k => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      memcachedStore,
      ttl = 6.hours,
      maxKeys = 262143,
      cacheName = "follow_based_user_interested_in_from_ape_cache",
      windowSize = 10000L
    )(statsReceiver.scope("follow_based_user_interested_in_from_ape_store"))
  }

  // Production
  lazy val favBasedUserInterestedIn20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val underlyingStore: ReadableStore[SimClustersEmbeddingId, ThriftSimClustersEmbedding] =
      UserInterestedInReadableStore
        .defaultSimClustersEmbeddingStoreWithMtls(
          mhMtlsParams,
          EmbeddingType.FavBasedUserInterestedIn,
          ModelVersion.Model20m145k2020).mapValues(_.toThrift)

    ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = underlyingStore,
        cacheClient = cacheClient,
        ttl = 12.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver = statsReceiver.scope("fav_based_user_interested_in_2020_mem_cache"),
        keyToString = { k => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))
  }

  // Production
  lazy val logFavBasedUserInterestedIn20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val underlyingStore =
      UserInterestedInReadableStore
        .defaultSimClustersEmbeddingStoreWithMtls(
          mhMtlsParams,
          EmbeddingType.LogFavBasedUserInterestedIn,
          ModelVersion.Model20m145k2020)

    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = underlyingStore.mapValues(_.toThrift),
        cacheClient = cacheClient,
        ttl = 12.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver = statsReceiver.scope("log_fav_based_user_interested_in_2020_store"),
        keyToString = { k => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      memcachedStore,
      ttl = 6.hours,
      maxKeys = 262143,
      cacheName = "log_fav_based_user_interested_in_2020_cache",
      windowSize = 10000L
    )(statsReceiver.scope("log_fav_based_user_interested_in_2020_store"))
  }

  // Production
  lazy val favBasedUserInterestedInFromPE20M145KUpdatedStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val underlyingStore =
      UserInterestedInReadableStore
        .defaultIIPESimClustersEmbeddingStoreWithMtls(
          mhMtlsParams,
          EmbeddingType.FavBasedUserInterestedInFromPE,
          ModelVersion.Model20m145kUpdated)
        .mapValues(_.toThrift)

    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = underlyingStore,
        cacheClient = cacheClient,
        ttl = 12.hours
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(ThriftSimClustersEmbedding)),
        statsReceiver = statsReceiver.scope("fav_based_user_interested_in_from_pe_mem_cache"),
        keyToString = { k => embeddingCacheKeyBuilder.apply(k) }
      ).mapValues(SimClustersEmbedding(_))

    ObservedCachedReadableStore.from[SimClustersEmbeddingId, SimClustersEmbedding](
      memcachedStore,
      ttl = 6.hours,
      maxKeys = 262143,
      cacheName = "fav_based_user_interested_in_from_pe_cache",
      windowSize = 10000L
    )(statsReceiver.scope("fav_based_user_interested_in_from_pe_cache"))
  }

  private val underlyingStores: Map[
    (EmbeddingType, ModelVersion),
    ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding]
  ] = Map(
    // Tweet Embeddings
    (LogFavBasedTweet, Model20m145kUpdated) -> logFavBased20M145KUpdatedTweetEmbeddingStore,
    (LogFavBasedTweet, Model20m145k2020) -> logFavBased20M145K2020TweetEmbeddingStore,
    (
      LogFavLongestL2EmbeddingTweet,
      Model20m145k2020) -> logFavBasedLongestL2Tweet20M145K2020EmbeddingStore,
    // Entity Embeddings
    (FavTfgTopic, Model20m145k2020) -> favBasedTfgTopicEmbedding2020Store,
    (
      LogFavBasedKgoApeTopic,
      Model20m145k2020) -> logFavBasedApeEntity20M145K2020EmbeddingCachedStore,
    // KnownFor Embeddings
    (FavBasedProducer, Model20m145k2020) -> favBasedProducer20M145K2020EmbeddingStore,
    (
      RelaxedAggregatableLogFavBasedProducer,
      Model20m145k2020) -> relaxedLogFavBasedApe20M145K2020EmbeddingCachedStore,
    // InterestedIn Embeddings
    (
      LogFavBasedUserInterestedInFromAPE,
      Model20m145k2020) -> LogFavBasedInterestedInFromAPE20M145K2020Store,
    (
      FollowBasedUserInterestedInFromAPE,
      Model20m145k2020) -> FollowBasedInterestedInFromAPE20M145K2020Store,
    (FavBasedUserInterestedIn, Model20m145kUpdated) -> favBasedUserInterestedIn20M145KUpdatedStore,
    (FavBasedUserInterestedIn, Model20m145k2020) -> favBasedUserInterestedIn20M145K2020Store,
    (LogFavBasedUserInterestedIn, Model20m145k2020) -> logFavBasedUserInterestedIn20M145K2020Store,
    (
      FavBasedUserInterestedInFromPE,
      Model20m145kUpdated) -> favBasedUserInterestedInFromPE20M145KUpdatedStore,
    (FilteredUserInterestedIn, Model20m145kUpdated) -> userInterestedInStore,
    (FilteredUserInterestedIn, Model20m145k2020) -> userInterestedInStore,
    (FilteredUserInterestedInFromPE, Model20m145kUpdated) -> userInterestedInStore,
    (UnfilteredUserInterestedIn, Model20m145kUpdated) -> userInterestedInStore,
    (UnfilteredUserInterestedIn, Model20m145k2020) -> userInterestedInStore,
  )

  val simClustersEmbeddingStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val underlying: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] =
      SimClustersEmbeddingStore.buildWithDecider(
        underlyingStores = underlyingStores,
        decider = rmsDecider.decider,
        statsReceiver = statsReceiver.scope("simClusters_embeddings_store_deciderable")
      )

    val underlyingWithTimeout: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] =
      new ReadableStoreWithTimeout(
        rs = underlying,
        decider = rmsDecider.decider,
        enableTimeoutDeciderKey = DeciderConstants.enableSimClustersEmbeddingStoreTimeouts,
        timeoutValueKey = DeciderConstants.simClustersEmbeddingStoreTimeoutValueMillis,
        timer = timer,
        statsReceiver = statsReceiver.scope("simClusters_embedding_store_timeouts")
      )

    ObservedReadableStore(
      store = underlyingWithTimeout
    )(statsReceiver.scope("simClusters_embeddings_store"))
  }
}
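Most embedding stores in the file above stack the same layers: a Strato or Manhattan backing store, an `ObservedMemcachedReadableStore` (usually with a 12-hour TTL), and an in-process `ObservedCachedReadableStore` with a shorter TTL (usually 6 hours). The following is a minimal, synchronous sketch of that read-through layering in plain Scala. It is an assumption-laden simplification, not the storehaus or Twitter-internal API: `Store` and `TtlCache` are hypothetical names, the clock is injected so expiry can be demonstrated, and stats, LZ4/Thrift serialization, decider gating, and `Future`-based calls are all omitted.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical synchronous store interface (a stand-in for storehaus' ReadableStore).
trait Store[K, V] {
  def get(k: K): Option[V]
}

// Hypothetical read-through cache layer: serves unexpired entries locally, otherwise asks `backing`.
// Entries expire `ttlMillis` after they are written; at most `maxKeys` entries are kept,
// evicting the least recently accessed one (LinkedHashMap in access order).
final class TtlCache[K, V](
    backing: Store[K, V],
    ttlMillis: Long,
    maxKeys: Int,
    now: () => Long)
    extends Store[K, V] {

  private val entries = new JLinkedHashMap[K, (V, Long)](16, 0.75f, true) {
    override protected def removeEldestEntry(eldest: JMap.Entry[K, (V, Long)]): Boolean =
      size() > maxKeys
  }

  // Number of lookups this layer forwarded to its backing store (cache misses).
  var backingReads: Int = 0

  override def get(k: K): Option[V] =
    Option(entries.get(k)).filter { case (_, expiresAt) => now() < expiresAt } match {
      case Some((v, _)) => Some(v)
      case None =>
        backingReads += 1
        val fetched = backing.get(k)
        // This sketch caches only found values; absent keys are re-fetched on every call.
        fetched.foreach(v => entries.put(k, (v, now() + ttlMillis)))
        fetched
    }
}

object LayeredCacheExample {
  def main(args: Array[String]): Unit = {
    val hourMillis = 60L * 60 * 1000
    var clock = 0L

    // Slowest layer with made-up data (stands in for a remote embedding column).
    val remote = new Store[String, Seq[Double]] {
      def get(k: String): Option[Seq[Double]] =
        if (k == "user:1") Some(Seq(0.5, 0.25)) else None
    }
    // 12h shared layer (memcache-like) under a 6h in-process layer, mirroring the TTLs above.
    val shared = new TtlCache(remote, 12 * hourMillis, maxKeys = 1000, now = () => clock)
    val local = new TtlCache(shared, 6 * hourMillis, maxKeys = 100, now = () => clock)

    local.get("user:1") // miss in both layers, fetched from `remote`
    clock += 7 * hourMillis // local entry has expired, shared entry is still fresh
    local.get("user:1")
    println(s"local misses = ${local.backingReads}, shared misses = ${shared.backingReads}")
    // prints "local misses = 2, shared misses = 1"
  }
}
```

After the in-process entry expires, the shared layer still answers without touching `remote`. This is why, in the stores above, the outer (memcache) TTL is the longer of the two.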
@ -0,0 +1,18 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/authentication",
        "finagle/finagle-stats",
        "finatra/inject/inject-core/src/main/scala",
        "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/util",
        "interests-service/thrift/src/main/thrift:thrift-scala",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/common",
        "servo/util",
        "src/scala/com/twitter/storehaus_internal/manhattan",
        "src/scala/com/twitter/storehaus_internal/memcache",
        "src/scala/com/twitter/storehaus_internal/util",
        "strato/src/main/scala/com/twitter/strato/client",
    ],
)
@ -0,0 +1,34 @@
package com.twitter.representation_manager.modules

import com.google.inject.Provides
import com.twitter.finagle.memcached.Client
import javax.inject.Singleton
import com.twitter.conversions.DurationOps._
import com.twitter.inject.TwitterModule
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.storehaus_internal.memcache.MemcacheStore
import com.twitter.storehaus_internal.util.ClientName
import com.twitter.storehaus_internal.util.ZkEndPoint

object CacheModule extends TwitterModule {

  private val cacheDest = flag[String]("cache_module.dest", "Path to memcache service")
  private val timeout = flag[Int]("memcache.timeout", "Memcache client timeout in milliseconds")
  private val retries = flag[Int]("memcache.retries", "Memcache timeout retries")

  @Singleton
  @Provides
  def providesCache(
    serviceIdentifier: ServiceIdentifier,
    stats: StatsReceiver
  ): Client =
    MemcacheStore.memcachedClient(
      name = ClientName("memcache_representation_manager"),
      dest = ZkEndPoint(cacheDest()),
      timeout = timeout().milliseconds,
      retries = retries(),
      statsReceiver = stats.scope("cache_client"),
      serviceIdentifier = serviceIdentifier
    )
}
@ -0,0 +1,40 @@
package com.twitter.representation_manager.modules

import com.google.inject.Provides
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.ThriftMux
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.mtls.client.MtlsStackClient.MtlsThriftMuxClientSyntax
import com.twitter.finagle.mux.ClientDiscardedRequestException
import com.twitter.finagle.service.ReqRep
import com.twitter.finagle.service.ResponseClass
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.finagle.thrift.ClientId
import com.twitter.inject.TwitterModule
import com.twitter.interests.thriftscala.InterestsThriftService
import com.twitter.util.Throw
import javax.inject.Singleton

object InterestsThriftClientModule extends TwitterModule {

  @Singleton
  @Provides
  def providesInterestsThriftClient(
    clientId: ClientId,
    serviceIdentifier: ServiceIdentifier,
    statsReceiver: StatsReceiver
  ): InterestsThriftService.MethodPerEndpoint = {
    ThriftMux.client
      .withClientId(clientId)
      .withMutualTls(serviceIdentifier)
      .withRequestTimeout(450.milliseconds)
      .withStatsReceiver(statsReceiver.scope("InterestsThriftClient"))
      .withResponseClassifier {
        case ReqRep(_, Throw(_: ClientDiscardedRequestException)) => ResponseClass.Ignorable
      }
      .build[InterestsThriftService.MethodPerEndpoint](
        dest = "/s/interests-thrift-service/interests-thrift-service",
        label = "interests_thrift_service"
      )
  }
}
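The `withResponseClassifier` block above classifies `ClientDiscardedRequestException` failures as `ResponseClass.Ignorable`. Requests the caller abandoned are therefore left out of success/failure accounting instead of being counted as server failures. In Finagle, request/response pairs that this partial function does not match fall back to the default classifier. The following plain-Scala sketch models that "custom rule first, then default" composition with `PartialFunction.applyOrElse`. The classification values and the exception are hypothetical stand-ins, not Finagle's `ReqRep`/`ResponseClass` types, and the default is deliberately simplified.

```scala
import scala.util.{Failure, Success, Try}

// Simplified, hypothetical stand-ins for Finagle's ResponseClass values.
sealed trait Classification
case object Succeeded extends Classification
case object Ignorable extends Classification
case object NonRetryableFailure extends Classification

// Hypothetical exception standing in for ClientDiscardedRequestException.
final class DiscardedByClientException extends RuntimeException("discarded by client")

object ResponseClassification {
  type Classifier = PartialFunction[Try[Any], Classification]

  // Custom rule: matches only client-discarded requests.
  val ignoreDiscarded: Classifier = {
    case Failure(_: DiscardedByClientException) => Ignorable
  }

  // Simplified default: any value is a success, any exception a non-retryable failure.
  val default: Classifier = {
    case Success(_) => Succeeded
    case Failure(_) => NonRetryableFailure
  }

  // The custom rule wins where it is defined; everything else uses `default`.
  def classify(custom: Classifier)(result: Try[Any]): Classification =
    custom.applyOrElse(result, default)

  def main(args: Array[String]): Unit = {
    val results: Seq[Try[Any]] = Seq(
      Success("ok"),
      Failure(new DiscardedByClientException),
      Failure(new RuntimeException("boom")))
    // prints Succeeded, Ignorable, NonRetryableFailure (one per line)
    results.foreach(r => println(classify(ignoreDiscarded)(r)))
  }
}
```

Because the custom rule is a partial function, it only has to mention the one case it changes. Every other outcome keeps its default classification.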
@ -0,0 +1,18 @@
package com.twitter.representation_manager.modules

import com.google.inject.Provides
import com.twitter.inject.TwitterModule
import javax.inject.Named
import javax.inject.Singleton

object LegacyRMSConfigModule extends TwitterModule {
  @Singleton
  @Provides
  @Named("cacheHashKeyPrefix")
  def providesCacheHashKeyPrefix: String = "RMS"

  @Singleton
  @Provides
  @Named("useContentRecommenderConfiguration")
  def providesUseContentRecommenderConfiguration: Boolean = false
}
@ -0,0 +1,24 @@
package com.twitter.representation_manager.modules

import com.google.inject.Provides
import javax.inject.Singleton
import com.twitter.inject.TwitterModule
import com.twitter.decider.Decider
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.representation_manager.common.RepresentationManagerDecider
import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams

object StoreModule extends TwitterModule {
  @Singleton
  @Provides
  def providesMhMtlsParams(
    serviceIdentifier: ServiceIdentifier
  ): ManhattanKVClientMtlsParams = ManhattanKVClientMtlsParams(serviceIdentifier)

  @Singleton
  @Provides
  def providesRmsDecider(
    decider: Decider
  ): RepresentationManagerDecider = RepresentationManagerDecider(decider)

}
@ -0,0 +1,13 @@
package com.twitter.representation_manager.modules

import com.google.inject.Provides
import com.twitter.finagle.util.DefaultTimer
import com.twitter.inject.TwitterModule
import com.twitter.util.Timer
import javax.inject.Singleton

object TimerModule extends TwitterModule {
  @Singleton
  @Provides
  def providesTimer: Timer = DefaultTimer
}
@ -0,0 +1,39 @@
package com.twitter.representation_manager.modules

import com.google.inject.Provides
import com.twitter.escherbird.util.uttclient.CacheConfigV2
import com.twitter.escherbird.util.uttclient.CachedUttClientV2
import com.twitter.escherbird.util.uttclient.UttClientCacheConfigsV2
import com.twitter.escherbird.utt.strato.thriftscala.Environment
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.inject.TwitterModule
import com.twitter.strato.client.{Client => StratoClient}
import javax.inject.Singleton

object UttClientModule extends TwitterModule {

  @Singleton
  @Provides
  def providesUttClient(
    stratoClient: StratoClient,
    statsReceiver: StatsReceiver
  ): CachedUttClientV2 = {
    // Cache up to 2^18 - 1 (262143) UTTs; expected to give a ~100% cache hit rate
    val defaultCacheConfigV2: CacheConfigV2 = CacheConfigV2(262143)

    val uttClientCacheConfigsV2: UttClientCacheConfigsV2 = UttClientCacheConfigsV2(
      getTaxonomyConfig = defaultCacheConfigV2,
      getUttTaxonomyConfig = defaultCacheConfigV2,
      getLeafIds = defaultCacheConfigV2,
      getLeafUttEntities = defaultCacheConfigV2
    )

    // CachedUttClient backed by the StratoClient
    new CachedUttClientV2(
      stratoClient = stratoClient,
      env = Environment.Prod,
      cacheConfigs = uttClientCacheConfigsV2,
      statsReceiver = statsReceiver.scope("cached_utt_client")
    )
  }
}
@ -0,0 +1,16 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "content-recommender/server/src/main/scala/com/twitter/contentrecommender:representation-manager-deps",
        "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/util",
        "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/common",
        "src/scala/com/twitter/simclusters_v2/stores",
        "src/scala/com/twitter/simclusters_v2/summingbird/stores",
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
        "storage/clients/manhattan/client/src/main/scala",
        "tweetypie/src/scala/com/twitter/tweetypie/util",
    ],
)
@ -0,0 +1,39 @@
package com.twitter.representation_manager.store

import com.twitter.servo.decider.DeciderKeyEnum

object DeciderConstants {
  // Deciders inherited from CR and RSX and used only in LegacyRMS.
  // Their values are controlled by CR's and RSX's yml files and decider dashboards.
  // We will remove them once the migration is complete.
  val enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore =
    "enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore"

  val enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore =
    "enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore"

  val enablelogFavBased20M145K2020TweetEmbeddingStoreTimeouts =
    "enable_log_fav_based_tweet_embedding_20m145k2020_timeouts"
  val logFavBased20M145K2020TweetEmbeddingStoreTimeoutValueMillis =
    "log_fav_based_tweet_embedding_20m145k2020_timeout_value_millis"

  val enablelogFavBased20M145KUpdatedTweetEmbeddingStoreTimeouts =
    "enable_log_fav_based_tweet_embedding_20m145kUpdated_timeouts"
  val logFavBased20M145KUpdatedTweetEmbeddingStoreTimeoutValueMillis =
    "log_fav_based_tweet_embedding_20m145kUpdated_timeout_value_millis"

  val enableSimClustersEmbeddingStoreTimeouts = "enable_sim_clusters_embedding_store_timeouts"
  val simClustersEmbeddingStoreTimeoutValueMillis =
    "sim_clusters_embedding_store_timeout_value_millis"
}

// Necessary for using servo Gates
object DeciderKey extends DeciderKeyEnum {
  val enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore: Value = Value(
    DeciderConstants.enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore
  )

  val enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore: Value = Value(
    DeciderConstants.enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore
  )
}
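`DeciderKey` wraps the decider-name strings in enumeration values so that servo gates can consume them. Assuming `DeciderKeyEnum` behaves like a standard `scala.Enumeration` (its definition is not part of this diff), the pattern plus a toy percentage gate look roughly like the sketch below. `DeciderKeySketch`, `GateSketch`, the in-memory availability map, and the 0 to 10000 availability scale are all assumptions for illustration, not the servo or decider API.

```scala
// Hypothetical stand-in for servo's DeciderKeyEnum, assumed to behave like scala.Enumeration.
object DeciderKeySketch extends Enumeration {
  // Value(name) keeps the decider name as the value's string form.
  val enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore: Value =
    Value("enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore")

  val enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore: Value =
    Value("enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore")
}

// Hypothetical gate: availability per decider name on an assumed 0..10000 scale,
// and a caller-supplied bucket in the same range (e.g. derived from a request or user id).
object GateSketch {
  def isAvailable(
    availability: Map[String, Int],
    key: DeciderKeySketch.Value,
    bucket: Int
  ): Boolean = {
    require(bucket >= 0 && bucket < 10000, "bucket must be in [0, 10000)")
    // A missing decider counts as fully off.
    bucket < availability.getOrElse(key.toString, 0)
  }
}
```

Because each `Value` reuses the string from `DeciderConstants`, looking a key up by its decider name and referencing the constant directly resolve to the same enumeration value.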
@ -0,0 +1,198 @@
package com.twitter.representation_manager.store

import com.twitter.contentrecommender.store.ApeEntityEmbeddingStore
import com.twitter.contentrecommender.store.InterestsOptOutStore
import com.twitter.contentrecommender.store.SemanticCoreTopicSeedStore
import com.twitter.conversions.DurationOps._
import com.twitter.escherbird.util.uttclient.CachedUttClientV2
import com.twitter.finagle.memcached.Client
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.store.strato.StratoFetchableStore
import com.twitter.frigate.common.util.SeqLongInjection
import com.twitter.hermit.store.common.ObservedCachedReadableStore
import com.twitter.hermit.store.common.ObservedMemcachedReadableStore
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.interests.thriftscala.InterestsThriftService
import com.twitter.representation_manager.common.MemCacheConfig
import com.twitter.representation_manager.common.RepresentationManagerDecider
import com.twitter.simclusters_v2.common.SimClustersEmbedding
import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.EmbeddingType._
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.simclusters_v2.thriftscala.ModelVersion._
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.TopicId
import com.twitter.simclusters_v2.thriftscala.LocaleEntityId
import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding}
import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.{Client => StratoClient}
import com.twitter.tweetypie.util.UserId
import javax.inject.Inject

class TopicSimClustersEmbeddingStore @Inject() (
  stratoClient: StratoClient,
  cacheClient: Client,
  globalStats: StatsReceiver,
  mhMtlsParams: ManhattanKVClientMtlsParams,
  rmsDecider: RepresentationManagerDecider,
  interestService: InterestsThriftService.MethodPerEndpoint,
  uttClient: CachedUttClientV2) {

  private val stats = globalStats.scope(this.getClass.getSimpleName)
  private val interestsOptOutStore = InterestsOptOutStore(interestService)

  /**
   * Note this is NOT an embedding store. It is a list of author account ids we use to represent
   * topics
   */
  private val semanticCoreTopicSeedStore: ReadableStore[
    SemanticCoreTopicSeedStore.Key,
    Seq[UserId]
  ] = {
    /*
      Up to 1000 Long seeds per topic/language = 8,000 bytes (62.5 Kibit) per topic/language (worst case)
      Assume ~10k active topic/languages ~= 80 MB (~640 Mbit) of raw payload (worst case),
      excluding JVM and serialization overhead
     */
    val underlying = new SemanticCoreTopicSeedStore(uttClient, interestsOptOutStore)(
      stats.scope("semantic_core_topic_seed_store"))

    val memcacheStore = ObservedMemcachedReadableStore.fromCacheClient(
      backingStore = underlying,
      cacheClient = cacheClient,
      ttl = 12.hours)(
      valueInjection = SeqLongInjection,
      statsReceiver = stats.scope("topic_producer_seed_store_mem_cache"),
      keyToString = { k => s"tpss:${k.entityId}_${k.languageCode}" }
    )

    ObservedCachedReadableStore.from[SemanticCoreTopicSeedStore.Key, Seq[UserId]](
      store = memcacheStore,
      ttl = 6.hours,
      maxKeys = 20e3.toInt,
      cacheName = "topic_producer_seed_store_cache",
      windowSize = 5000
    )(stats.scope("topic_producer_seed_store_cache"))
  }

  private val favBasedTfgTopicEmbedding20m145k2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      StratoFetchableStore
        .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding](
          stratoClient,
          "recommendations/simclusters_v2/embeddings/favBasedTFGTopic20M145K2020").mapValues(
          embedding => SimClustersEmbedding(embedding, truncate = 50).toThrift)
        .composeKeyMapping[LocaleEntityId] { localeEntityId =>
          SimClustersEmbeddingId(
            FavTfgTopic,
            Model20m145k2020,
            InternalId.LocaleEntityId(localeEntityId))
        }

    buildLocaleEntityIdMemCacheStore(rawStore, FavTfgTopic, Model20m145k2020)
  }

  private val logFavBasedApeEntity20M145K2020EmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val apeStore = StratoFetchableStore
      .withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding](
        stratoClient,
        "recommendations/simclusters_v2/embeddings/logFavBasedAPE20M145K2020")
      .mapValues(embedding => SimClustersEmbedding(embedding, truncate = 50))
      .composeKeyMapping[UserId]({ id =>
        SimClustersEmbeddingId(
          AggregatableLogFavBasedProducer,
          Model20m145k2020,
          InternalId.UserId(id))
      })
    val rawStore = new ApeEntityEmbeddingStore(
      semanticCoreSeedStore = semanticCoreTopicSeedStore,
      aggregatableProducerEmbeddingStore = apeStore,
      statsReceiver = stats.scope("log_fav_based_ape_entity_2020_embedding_store"))
      .mapValues(embedding => SimClustersEmbedding(embedding.toThrift, truncate = 50).toThrift)
      .composeKeyMapping[TopicId] { topicId =>
        SimClustersEmbeddingId(
          LogFavBasedKgoApeTopic,
          Model20m145k2020,
          InternalId.TopicId(topicId))
      }

    buildTopicIdMemCacheStore(rawStore, LogFavBasedKgoApeTopic, Model20m145k2020)
  }

  private def buildTopicIdMemCacheStore(
    rawStore: ReadableStore[TopicId, ThriftSimClustersEmbedding],
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val observedStore: ObservedReadableStore[TopicId, ThriftSimClustersEmbedding] =
      ObservedReadableStore(
        store = rawStore
      )(stats.scope(embeddingType.name).scope(modelVersion.name))

    val storeWithKeyMapping = observedStore.composeKeyMapping[SimClustersEmbeddingId] {
      case SimClustersEmbeddingId(_, _, InternalId.TopicId(topicId)) =>
        topicId
    }

    MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding(
      storeWithKeyMapping,
      cacheClient,
      embeddingType,
      modelVersion,
      stats
    )
  }

  private def buildLocaleEntityIdMemCacheStore(
    rawStore: ReadableStore[LocaleEntityId, ThriftSimClustersEmbedding],
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val observedStore: ObservedReadableStore[LocaleEntityId, ThriftSimClustersEmbedding] =
      ObservedReadableStore(
        store = rawStore
      )(stats.scope(embeddingType.name).scope(modelVersion.name))

    val storeWithKeyMapping = observedStore.composeKeyMapping[SimClustersEmbeddingId] {
      case SimClustersEmbeddingId(_, _, InternalId.LocaleEntityId(localeEntityId)) =>
        localeEntityId
    }

    MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding(
      storeWithKeyMapping,
      cacheClient,
      embeddingType,
      modelVersion,
      stats
    )
  }

  private val underlyingStores: Map[
    (EmbeddingType, ModelVersion),
    ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding]
  ] = Map(
    // Topic Embeddings
    (FavTfgTopic, Model20m145k2020) -> favBasedTfgTopicEmbedding20m145k2020Store,
    (LogFavBasedKgoApeTopic, Model20m145k2020) -> logFavBasedApeEntity20M145K2020EmbeddingStore,
  )

  val topicSimClustersEmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    SimClustersEmbeddingStore.buildWithDecider(
      underlyingStores = underlyingStores,
      decider = rmsDecider.decider,
      statsReceiver = stats
    )
  }

}
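`semanticCoreTopicSeedStore` reads through an in-process cache (6-hour TTL, at most 20,000 keys) in front of memcache (12-hour TTL, keys of the form `tpss:<entityId>_<languageCode>`) in front of `SemanticCoreTopicSeedStore`. The sketch below models that read-through layering synchronously with in-memory maps and an injected clock so the expiry behaviour can be checked. `SeedKey`, `TtlCache`, and `TwoTierSeedStore` are hypothetical names; the real stores are asynchronous `ReadableStore`s with their own eviction logic (the `windowSize` parameter, for example, is not modeled here).

```scala
import scala.collection.mutable

// Hypothetical key mirroring the two fields used by keyToString above.
final case class SeedKey(entityId: Long, languageCode: String)

// Hypothetical TTL cache; `now` is injected so expiry is deterministic in tests.
// When over `maxKeys`, the oldest insertion is dropped (FIFO, not LRU).
final class TtlCache[K, V](ttlMillis: Long, maxKeys: Int, now: () => Long) {
  private val entries = mutable.LinkedHashMap.empty[K, (V, Long)] // key -> (value, expiresAt)

  def get(key: K): Option[V] = entries.get(key) match {
    case Some((value, expiresAt)) if now() < expiresAt => Some(value)
    case Some(_) =>
      entries.remove(key) // expired
      None
    case None => None
  }

  def put(key: K, value: V): Unit = {
    entries.remove(key) // re-insert so the entry moves to the end of the insertion order
    entries.put(key, (value, now() + ttlMillis))
    while (entries.size > maxKeys) entries.remove(entries.head._1)
  }
}

// Read-through lookup: in-process tier, then the memcache-like tier, then the backing store.
final class TwoTierSeedStore(backing: SeedKey => Option[Seq[Long]], now: () => Long) {
  private val hourMillis = 60L * 60L * 1000L
  private val remote = new TtlCache[String, Seq[Long]](12 * hourMillis, Int.MaxValue, now)
  private val local = new TtlCache[SeedKey, Seq[Long]](6 * hourMillis, 20000, now)
  var backingCalls = 0

  // Same key layout as keyToString in semanticCoreTopicSeedStore.
  def remoteKey(k: SeedKey): String = s"tpss:${k.entityId}_${k.languageCode}"

  def get(k: SeedKey): Option[Seq[Long]] =
    local.get(k).orElse {
      val fromRemote = remote.get(remoteKey(k)).orElse {
        backingCalls += 1
        val fetched = backing(k)
        fetched.foreach(v => remote.put(remoteKey(k), v))
        fetched
      }
      fromRemote.foreach(v => local.put(k, v))
      fromRemote
    }
}
```

Because the in-process TTL is half the memcache TTL, an entry that expires from the local tier can be refilled from memcache without another call to the backing store, as long as the memcache copy is still live.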
@ -0,0 +1,141 @@
package com.twitter.representation_manager.store

import com.twitter.finagle.memcached.Client
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.representation_manager.common.MemCacheConfig
import com.twitter.representation_manager.common.RepresentationManagerDecider
import com.twitter.simclusters_v2.common.SimClustersEmbedding
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore
import com.twitter.simclusters_v2.summingbird.stores.PersistentTweetEmbeddingStore
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.EmbeddingType._
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.simclusters_v2.thriftscala.ModelVersion._
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding}
import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams
import com.twitter.storehaus.ReadableStore
import javax.inject.Inject

class TweetSimClustersEmbeddingStore @Inject() (
  cacheClient: Client,
  globalStats: StatsReceiver,
  mhMtlsParams: ManhattanKVClientMtlsParams,
  rmsDecider: RepresentationManagerDecider) {

  private val stats = globalStats.scope(this.getClass.getSimpleName)

  val logFavBasedLongestL2Tweet20M145KUpdatedEmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      PersistentTweetEmbeddingStore
        .longestL2NormTweetEmbeddingStoreManhattan(
          mhMtlsParams,
          PersistentTweetEmbeddingStore.LogFavBased20m145kUpdatedDataset,
          stats
        ).mapValues(_.toThrift)

    buildMemCacheStore(rawStore, LogFavLongestL2EmbeddingTweet, Model20m145kUpdated)
  }

  val logFavBasedLongestL2Tweet20M145K2020EmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      PersistentTweetEmbeddingStore
        .longestL2NormTweetEmbeddingStoreManhattan(
          mhMtlsParams,
          PersistentTweetEmbeddingStore.LogFavBased20m145k2020Dataset,
          stats
        ).mapValues(_.toThrift)

    buildMemCacheStore(rawStore, LogFavLongestL2EmbeddingTweet, Model20m145k2020)
  }

  val logFavBased20M145KUpdatedTweetEmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      PersistentTweetEmbeddingStore
        .mostRecentTweetEmbeddingStoreManhattan(
          mhMtlsParams,
          PersistentTweetEmbeddingStore.LogFavBased20m145kUpdatedDataset,
          stats
        ).mapValues(_.toThrift)

    buildMemCacheStore(rawStore, LogFavBasedTweet, Model20m145kUpdated)
  }

  val logFavBased20M145K2020TweetEmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      PersistentTweetEmbeddingStore
        .mostRecentTweetEmbeddingStoreManhattan(
          mhMtlsParams,
          PersistentTweetEmbeddingStore.LogFavBased20m145k2020Dataset,
          stats
        ).mapValues(_.toThrift)

    buildMemCacheStore(rawStore, LogFavBasedTweet, Model20m145k2020)
  }

  private def buildMemCacheStore(
    rawStore: ReadableStore[TweetId, ThriftSimClustersEmbedding],
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val observedStore: ObservedReadableStore[TweetId, ThriftSimClustersEmbedding] =
      ObservedReadableStore(
        store = rawStore
      )(stats.scope(embeddingType.name).scope(modelVersion.name))

    val storeWithKeyMapping = observedStore.composeKeyMapping[SimClustersEmbeddingId] {
      case SimClustersEmbeddingId(_, _, InternalId.TweetId(tweetId)) =>
        tweetId
    }

    MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding(
      storeWithKeyMapping,
      cacheClient,
      embeddingType,
      modelVersion,
      stats
    )
  }

  private val underlyingStores: Map[
    (EmbeddingType, ModelVersion),
    ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding]
  ] = Map(
    // Tweet Embeddings
    (LogFavBasedTweet, Model20m145kUpdated) -> logFavBased20M145KUpdatedTweetEmbeddingStore,
    (LogFavBasedTweet, Model20m145k2020) -> logFavBased20M145K2020TweetEmbeddingStore,
    (
      LogFavLongestL2EmbeddingTweet,
      Model20m145kUpdated) -> logFavBasedLongestL2Tweet20M145KUpdatedEmbeddingStore,
    (
      LogFavLongestL2EmbeddingTweet,
      Model20m145k2020) -> logFavBasedLongestL2Tweet20M145K2020EmbeddingStore,
  )

  val tweetSimClustersEmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    SimClustersEmbeddingStore.buildWithDecider(
      underlyingStores = underlyingStores,
      decider = rmsDecider.decider,
      statsReceiver = stats
    )
  }

}
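Each Tweet store above is registered in `underlyingStores` under its `(EmbeddingType, ModelVersion)` pair, and `buildMemCacheStore` uses `composeKeyMapping` to pull the `TweetId` out of a `SimClustersEmbeddingId`. The plain-Scala sketch below mimics that dispatch with synchronous functions. The ADTs, `EmbeddingRouter`, and its methods are hypothetical simplifications, not the thrift types or `SimClustersEmbeddingStore.buildWithDecider` (which also takes a decider).

```scala
// Hypothetical, simplified stand-ins for the thrift types used above.
sealed trait EmbeddingType
case object LogFavBasedTweet extends EmbeddingType
case object LogFavLongestL2EmbeddingTweet extends EmbeddingType

sealed trait ModelVersion
case object Model20m145kUpdated extends ModelVersion
case object Model20m145k2020 extends ModelVersion

sealed trait InternalId
final case class TweetIdInternal(tweetId: Long) extends InternalId
final case class UserIdInternal(userId: Long) extends InternalId

final case class EmbeddingId(
  embeddingType: EmbeddingType,
  modelVersion: ModelVersion,
  internalId: InternalId)

object EmbeddingRouter {
  type Embedding = Map[Int, Double] // clusterId -> score

  // Key mapping: only tweet ids are understood; any other InternalId yields None.
  def tweetKeyed(byTweetId: Long => Option[Embedding]): EmbeddingId => Option[Embedding] =
    id =>
      id.internalId match {
        case TweetIdInternal(tweetId) => byTweetId(tweetId)
        case _ => None
      }

  // Dispatch on (embeddingType, modelVersion); unregistered pairs return None.
  def route(
    stores: Map[(EmbeddingType, ModelVersion), EmbeddingId => Option[Embedding]]
  ): EmbeddingId => Option[Embedding] =
    id => stores.get((id.embeddingType, id.modelVersion)).flatMap(store => store(id))
}
```

A pattern-matching anonymous function such as `{ case SimClustersEmbeddingId(_, _, InternalId.TweetId(tweetId)) => tweetId }` throws `MatchError` when given an id it does not match; the sketch returns `None` for such ids instead.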
@ -0,0 +1,602 @@
|
|||||||
|
package com.twitter.representation_manager.store
|
||||||
|
|
||||||
|
import com.twitter.contentrecommender.twistly
|
||||||
|
import com.twitter.finagle.memcached.Client
|
||||||
|
import com.twitter.finagle.stats.StatsReceiver
|
||||||
|
import com.twitter.frigate.common.store.strato.StratoFetchableStore
|
||||||
|
import com.twitter.hermit.store.common.ObservedReadableStore
|
||||||
|
import com.twitter.representation_manager.common.MemCacheConfig
|
||||||
|
import com.twitter.representation_manager.common.RepresentationManagerDecider
|
||||||
|
import com.twitter.simclusters_v2.common.ModelVersions
|
||||||
|
import com.twitter.simclusters_v2.common.SimClustersEmbedding
|
||||||
|
import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore
|
||||||
|
import com.twitter.simclusters_v2.summingbird.stores.ProducerClusterEmbeddingReadableStores
|
||||||
|
import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore
|
||||||
|
import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore.getStore
|
||||||
|
import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore.modelVersionToDatasetMap
|
||||||
|
import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore.knownModelVersions
|
||||||
|
import com.twitter.simclusters_v2.summingbird.stores.UserInterestedInReadableStore.toSimClustersEmbedding
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.ClustersUserIsInterestedIn
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.EmbeddingType._
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.InternalId
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.ModelVersion
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.ModelVersion._
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
|
||||||
|
import com.twitter.simclusters_v2.thriftscala.{SimClustersEmbedding => ThriftSimClustersEmbedding}
|
||||||
|
import com.twitter.storage.client.manhattan.kv.ManhattanKVClientMtlsParams
|
||||||
|
import com.twitter.storehaus.ReadableStore
|
||||||
|
import com.twitter.storehaus_internal.manhattan.Apollo
|
||||||
|
import com.twitter.storehaus_internal.manhattan.ManhattanCluster
|
||||||
|
import com.twitter.strato.client.{Client => StratoClient}
|
||||||
|
import com.twitter.strato.thrift.ScroogeConvImplicits._
|
||||||
|
import com.twitter.tweetypie.util.UserId
|
||||||
|
import com.twitter.util.Future
|
||||||
|
import javax.inject.Inject
|
||||||
|
|
||||||
|
class UserSimClustersEmbeddingStore @Inject() (
|
||||||
|
stratoClient: StratoClient,
|
||||||
|
cacheClient: Client,
|
||||||
|
globalStats: StatsReceiver,
|
||||||
|
mhMtlsParams: ManhattanKVClientMtlsParams,
|
||||||
|
rmsDecider: RepresentationManagerDecider) {
|
||||||
|
|
||||||
|
private val stats = globalStats.scope(this.getClass.getSimpleName)
|
||||||
|
|
||||||
|
private val favBasedProducer20M145KUpdatedEmbeddingStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
val rawStore = ProducerClusterEmbeddingReadableStores
|
||||||
|
.getProducerTopKSimClustersEmbeddingsStore(
|
||||||
|
mhMtlsParams
|
||||||
|
).mapValues { topSimClustersWithScore =>
|
||||||
|
ThriftSimClustersEmbedding(topSimClustersWithScore.topClusters)
|
||||||
|
}.composeKeyMapping[SimClustersEmbeddingId] {
|
||||||
|
case SimClustersEmbeddingId(_, _, InternalId.UserId(userId)) =>
|
||||||
|
userId
|
||||||
|
}
|
||||||
|
|
||||||
|
buildMemCacheStore(rawStore, FavBasedProducer, Model20m145kUpdated)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val favBasedProducer20M145K2020EmbeddingStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
val rawStore = ProducerClusterEmbeddingReadableStores
|
||||||
|
.getProducerTopKSimClusters2020EmbeddingsStore(
|
||||||
|
mhMtlsParams
|
||||||
|
).mapValues { topSimClustersWithScore =>
|
||||||
|
ThriftSimClustersEmbedding(topSimClustersWithScore.topClusters)
|
||||||
|
}.composeKeyMapping[SimClustersEmbeddingId] {
|
||||||
|
case SimClustersEmbeddingId(_, _, InternalId.UserId(userId)) =>
|
||||||
|
userId
|
||||||
|
}
|
||||||
|
|
||||||
|
buildMemCacheStore(rawStore, FavBasedProducer, Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val followBasedProducer20M145K2020EmbeddingStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
val rawStore = ProducerClusterEmbeddingReadableStores
|
||||||
|
.getProducerTopKSimClustersEmbeddingsByFollowStore(
|
||||||
|
mhMtlsParams
|
||||||
|
).mapValues { topSimClustersWithScore =>
|
||||||
|
ThriftSimClustersEmbedding(topSimClustersWithScore.topClusters)
|
||||||
|
}.composeKeyMapping[SimClustersEmbeddingId] {
|
||||||
|
case SimClustersEmbeddingId(_, _, InternalId.UserId(userId)) =>
|
||||||
|
userId
|
||||||
|
}
|
||||||
|
|
||||||
|
buildMemCacheStore(rawStore, FollowBasedProducer, Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val logFavBasedApe20M145K2020EmbeddingStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
val rawStore = StratoFetchableStore
|
||||||
|
.withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding](
|
||||||
|
stratoClient,
|
||||||
|
"recommendations/simclusters_v2/embeddings/logFavBasedAPE20M145K2020")
|
||||||
|
.mapValues(embedding => SimClustersEmbedding(embedding, truncate = 50).toThrift)
|
||||||
|
|
||||||
|
buildMemCacheStore(rawStore, AggregatableLogFavBasedProducer, Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val rawRelaxedLogFavBasedApe20M145K2020EmbeddingStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
ThriftSimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
StratoFetchableStore
|
||||||
|
.withUnitView[SimClustersEmbeddingId, ThriftSimClustersEmbedding](
|
||||||
|
stratoClient,
|
||||||
|
"recommendations/simclusters_v2/embeddings/logFavBasedAPERelaxedFavEngagementThreshold20M145K2020")
|
||||||
|
.mapValues(embedding => SimClustersEmbedding(embedding, truncate = 50).toThrift)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val relaxedLogFavBasedApe20M145K2020EmbeddingStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
buildMemCacheStore(
|
||||||
|
rawRelaxedLogFavBasedApe20M145K2020EmbeddingStore,
|
||||||
|
RelaxedAggregatableLogFavBasedProducer,
|
||||||
|
Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val relaxedLogFavBasedApe20m145kUpdatedEmbeddingStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
val rawStore = rawRelaxedLogFavBasedApe20M145K2020EmbeddingStore
|
||||||
|
.composeKeyMapping[SimClustersEmbeddingId] {
|
||||||
|
case SimClustersEmbeddingId(
|
||||||
|
RelaxedAggregatableLogFavBasedProducer,
|
||||||
|
Model20m145kUpdated,
|
||||||
|
internalId) =>
|
||||||
|
SimClustersEmbeddingId(
|
||||||
|
RelaxedAggregatableLogFavBasedProducer,
|
||||||
|
Model20m145k2020,
|
||||||
|
internalId)
|
||||||
|
}
|
||||||
|
|
||||||
|
buildMemCacheStore(rawStore, RelaxedAggregatableLogFavBasedProducer, Model20m145kUpdated)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val logFavBasedInterestedInFromAPE20M145K2020Store: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
buildUserInterestedInStore(
|
||||||
|
UserInterestedInReadableStore.defaultIIAPESimClustersEmbeddingStoreWithMtls,
|
||||||
|
LogFavBasedUserInterestedInFromAPE,
|
||||||
|
Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val followBasedInterestedInFromAPE20M145K2020Store: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
buildUserInterestedInStore(
|
||||||
|
UserInterestedInReadableStore.defaultIIAPESimClustersEmbeddingStoreWithMtls,
|
||||||
|
FollowBasedUserInterestedInFromAPE,
|
||||||
|
Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val favBasedUserInterestedIn20M145KUpdatedStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
buildUserInterestedInStore(
|
||||||
|
UserInterestedInReadableStore.defaultSimClustersEmbeddingStoreWithMtls,
|
||||||
|
FavBasedUserInterestedIn,
|
||||||
|
Model20m145kUpdated)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val favBasedUserInterestedIn20M145K2020Store: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
buildUserInterestedInStore(
|
||||||
|
UserInterestedInReadableStore.defaultSimClustersEmbeddingStoreWithMtls,
|
||||||
|
FavBasedUserInterestedIn,
|
||||||
|
Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val followBasedUserInterestedIn20M145K2020Store: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
buildUserInterestedInStore(
|
||||||
|
UserInterestedInReadableStore.defaultSimClustersEmbeddingStoreWithMtls,
|
||||||
|
FollowBasedUserInterestedIn,
|
||||||
|
Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val logFavBasedUserInterestedIn20M145K2020Store: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
buildUserInterestedInStore(
|
||||||
|
UserInterestedInReadableStore.defaultSimClustersEmbeddingStoreWithMtls,
|
||||||
|
LogFavBasedUserInterestedIn,
|
||||||
|
Model20m145k2020)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val favBasedUserInterestedInFromPE20M145KUpdatedStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
SimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
buildUserInterestedInStore(
|
||||||
|
UserInterestedInReadableStore.defaultIIPESimClustersEmbeddingStoreWithMtls,
|
||||||
|
FavBasedUserInterestedInFromPE,
|
||||||
|
Model20m145kUpdated)
|
||||||
|
}
|
||||||
|
|
||||||
|
private val twistlyUserInterestedInStore: ReadableStore[
|
||||||
|
SimClustersEmbeddingId,
|
||||||
|
ThriftSimClustersEmbedding
|
||||||
|
] = {
|
||||||
|
val interestedIn20M145KUpdatedStore = {
|
||||||
|
UserInterestedInReadableStore.defaultStoreWithMtls(
|
||||||
|
mhMtlsParams,
|
||||||
|
modelVersion = ModelVersions.Model20M145KUpdated
|
||||||
|
)
|
||||||
|
}
|
||||||
|
val interestedIn20M145K2020Store = {
|
||||||
|
UserInterestedInReadableStore.defaultStoreWithMtls(
|
||||||
|
mhMtlsParams,
|
||||||
|
modelVersion = ModelVersions.Model20M145K2020
|
||||||
|
)
|
||||||
|
}
|
||||||
|
val interestedInFromPE20M145KUpdatedStore = {
|
||||||
|
UserInterestedInReadableStore.defaultIIPEStoreWithMtls(
|
||||||
|
mhMtlsParams,
|
||||||
|
modelVersion = ModelVersions.Model20M145KUpdated)
|
||||||
|
}
|
||||||
|
val simClustersInterestedInStore: ReadableStore[
|
||||||
|
(UserId, ModelVersion),
|
||||||
|
ClustersUserIsInterestedIn
|
||||||
|
] = {
|
||||||
|
new ReadableStore[(UserId, ModelVersion), ClustersUserIsInterestedIn] {
|
||||||
|
override def get(k: (UserId, ModelVersion)): Future[Option[ClustersUserIsInterestedIn]] = {
|
||||||
|
k match {
|
||||||
|
case (userId, Model20m145kUpdated) =>
|
||||||
|
interestedIn20M145KUpdatedStore.get(userId)
|
||||||
|
case (userId, Model20m145k2020) =>
|
||||||
|
interestedIn20M145K2020Store.get(userId)
|
||||||
|
case _ =>
|
||||||
|
Future.None
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
val simClustersInterestedInFromProducerEmbeddingsStore: ReadableStore[
|
||||||
|
(UserId, ModelVersion),
|
||||||
|
ClustersUserIsInterestedIn
|
||||||
|
] = {
|
||||||
|
new ReadableStore[(UserId, ModelVersion), ClustersUserIsInterestedIn] {
|
||||||
|
override def get(k: (UserId, ModelVersion)): Future[Option[ClustersUserIsInterestedIn]] = {
|
||||||
|
k match {
|
||||||
|
case (userId, ModelVersion.Model20m145kUpdated) =>
|
||||||
|
interestedInFromPE20M145KUpdatedStore.get(userId)
|
||||||
|
case _ =>
|
||||||
|
Future.None
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
new twistly.interestedin.EmbeddingStore(
|
||||||
|
interestedInStore = simClustersInterestedInStore,
|
||||||
|
interestedInFromProducerEmbeddingStore = simClustersInterestedInFromProducerEmbeddingsStore,
|
||||||
|
statsReceiver = stats
|
||||||
|
).mapValues(_.toThrift)
|
||||||
|
}

  private val userNextInterestedIn20m145k2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    buildUserInterestedInStore(
      UserInterestedInReadableStore.defaultNextInterestedInStoreWithMtls,
      UserNextInterestedIn,
      Model20m145k2020)
  }

  private val filteredUserInterestedIn20m145kUpdatedStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    buildMemCacheStore(twistlyUserInterestedInStore, FilteredUserInterestedIn, Model20m145kUpdated)
  }

  private val filteredUserInterestedIn20m145k2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    buildMemCacheStore(twistlyUserInterestedInStore, FilteredUserInterestedIn, Model20m145k2020)
  }

  private val filteredUserInterestedInFromPE20m145kUpdatedStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    buildMemCacheStore(
      twistlyUserInterestedInStore,
      FilteredUserInterestedInFromPE,
      Model20m145kUpdated)
  }

  private val unfilteredUserInterestedIn20m145kUpdatedStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    buildMemCacheStore(
      twistlyUserInterestedInStore,
      UnfilteredUserInterestedIn,
      Model20m145kUpdated)
  }

  private val unfilteredUserInterestedIn20m145k2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    buildMemCacheStore(twistlyUserInterestedInStore, UnfilteredUserInterestedIn, Model20m145k2020)
  }

  // [Experimental] User InterestedIn, generated by aggregating IIAPE embeddings from AddressBook

  private val logFavBasedInterestedMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val datasetName = "addressbook_sims_embedding_iiape_maxpooling"
    val appId = "wtf_embedding_apollo"
    buildUserInterestedInStoreGeneric(
      simClustersEmbeddingStoreWithMtls,
      LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020,
      datasetName = datasetName,
      appId = appId,
      manhattanCluster = Apollo
    )
  }

  private val logFavBasedInterestedAverageAddressBookFromIIAPE20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val datasetName = "addressbook_sims_embedding_iiape_average"
    val appId = "wtf_embedding_apollo"
    buildUserInterestedInStoreGeneric(
      simClustersEmbeddingStoreWithMtls,
      LogFavBasedUserInterestedAverageAddressBookFromIIAPE,
      Model20m145k2020,
      datasetName = datasetName,
      appId = appId,
      manhattanCluster = Apollo
    )
  }

  private val logFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val datasetName = "addressbook_sims_embedding_iiape_booktype_maxpooling"
    val appId = "wtf_embedding_apollo"
    buildUserInterestedInStoreGeneric(
      simClustersEmbeddingStoreWithMtls,
      LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020,
      datasetName = datasetName,
      appId = appId,
      manhattanCluster = Apollo
    )
  }

  private val logFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val datasetName = "addressbook_sims_embedding_iiape_largestdim_maxpooling"
    val appId = "wtf_embedding_apollo"
    buildUserInterestedInStoreGeneric(
      simClustersEmbeddingStoreWithMtls,
      LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020,
      datasetName = datasetName,
      appId = appId,
      manhattanCluster = Apollo
    )
  }

  private val logFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val datasetName = "addressbook_sims_embedding_iiape_louvain_maxpooling"
    val appId = "wtf_embedding_apollo"
    buildUserInterestedInStoreGeneric(
      simClustersEmbeddingStoreWithMtls,
      LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020,
      datasetName = datasetName,
      appId = appId,
      manhattanCluster = Apollo
    )
  }

  private val logFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE20M145K2020Store: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val datasetName = "addressbook_sims_embedding_iiape_connected_maxpooling"
    val appId = "wtf_embedding_apollo"
    buildUserInterestedInStoreGeneric(
      simClustersEmbeddingStoreWithMtls,
      LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020,
      datasetName = datasetName,
      appId = appId,
      manhattanCluster = Apollo
    )
  }

  /**
   * Helper to build a readable store for UserInterestedIn embeddings from:
   * 1. a storeFunc from UserInterestedInReadableStore
   * 2. an EmbeddingType
   * 3. a ModelVersion
   * The raw store's values are converted to Thrift, the store is instrumented with stats
   * scoped by embedding type and model version, and the result is cached via MemCacheConfig.
   */
  private def buildUserInterestedInStore(
    storeFunc: (ManhattanKVClientMtlsParams, EmbeddingType, ModelVersion) => ReadableStore[
      SimClustersEmbeddingId,
      SimClustersEmbedding
    ],
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion
  ): ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore = storeFunc(mhMtlsParams, embeddingType, modelVersion)
      .mapValues(_.toThrift)
    val observedStore = ObservedReadableStore(
      store = rawStore
    )(stats.scope(embeddingType.name).scope(modelVersion.name))

    MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding(
      observedStore,
      cacheClient,
      embeddingType,
      modelVersion,
      stats
    )
  }

  private def buildUserInterestedInStoreGeneric(
    storeFunc: (ManhattanKVClientMtlsParams, EmbeddingType, ModelVersion, String, String,
      ManhattanCluster) => ReadableStore[
      SimClustersEmbeddingId,
      SimClustersEmbedding
    ],
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion,
    datasetName: String,
    appId: String,
    manhattanCluster: ManhattanCluster
  ): ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    val rawStore =
      storeFunc(mhMtlsParams, embeddingType, modelVersion, datasetName, appId, manhattanCluster)
        .mapValues(_.toThrift)
    val observedStore = ObservedReadableStore(
      store = rawStore
    )(stats.scope(embeddingType.name).scope(modelVersion.name))

    MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding(
      observedStore,
      cacheClient,
      embeddingType,
      modelVersion,
      stats
    )
  }

  private def simClustersEmbeddingStoreWithMtls(
    mhMtlsParams: ManhattanKVClientMtlsParams,
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion,
    datasetName: String,
    appId: String,
    manhattanCluster: ManhattanCluster
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {

    if (!modelVersionToDatasetMap.contains(ModelVersions.toKnownForModelVersion(modelVersion))) {
      throw new IllegalArgumentException(
        "Unknown model version: " + modelVersion + ". Known model versions: " + knownModelVersions)
    }
    getStore(appId, mhMtlsParams, datasetName, manhattanCluster)
      .composeKeyMapping[SimClustersEmbeddingId] {
        case SimClustersEmbeddingId(theEmbeddingType, theModelVersion, InternalId.UserId(userId))
            if theEmbeddingType == embeddingType && theModelVersion == modelVersion =>
          userId
      }.mapValues(toSimClustersEmbedding(_, embeddingType))
  }

  private def buildMemCacheStore(
    rawStore: ReadableStore[SimClustersEmbeddingId, ThriftSimClustersEmbedding],
    embeddingType: EmbeddingType,
    modelVersion: ModelVersion
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val observedStore = ObservedReadableStore(
      store = rawStore
    )(stats.scope(embeddingType.name).scope(modelVersion.name))

    MemCacheConfig.buildMemCacheStoreForSimClustersEmbedding(
      observedStore,
      cacheClient,
      embeddingType,
      modelVersion,
      stats
    )
  }

  private val underlyingStores: Map[
    (EmbeddingType, ModelVersion),
    ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding]
  ] = Map(
    // KnownFor Embeddings
    (FavBasedProducer, Model20m145kUpdated) -> favBasedProducer20M145KUpdatedEmbeddingStore,
    (FavBasedProducer, Model20m145k2020) -> favBasedProducer20M145K2020EmbeddingStore,
    (FollowBasedProducer, Model20m145k2020) -> followBasedProducer20M145K2020EmbeddingStore,
    (AggregatableLogFavBasedProducer, Model20m145k2020) -> logFavBasedApe20M145K2020EmbeddingStore,
    (
      RelaxedAggregatableLogFavBasedProducer,
      Model20m145kUpdated) -> relaxedLogFavBasedApe20m145kUpdatedEmbeddingStore,
    (
      RelaxedAggregatableLogFavBasedProducer,
      Model20m145k2020) -> relaxedLogFavBasedApe20M145K2020EmbeddingStore,
    // InterestedIn Embeddings
    (
      LogFavBasedUserInterestedInFromAPE,
      Model20m145k2020) -> logFavBasedInterestedInFromAPE20M145K2020Store,
    (
      FollowBasedUserInterestedInFromAPE,
      Model20m145k2020) -> followBasedInterestedInFromAPE20M145K2020Store,
    (FavBasedUserInterestedIn, Model20m145kUpdated) -> favBasedUserInterestedIn20M145KUpdatedStore,
    (FavBasedUserInterestedIn, Model20m145k2020) -> favBasedUserInterestedIn20M145K2020Store,
    (FollowBasedUserInterestedIn, Model20m145k2020) -> followBasedUserInterestedIn20M145K2020Store,
    (LogFavBasedUserInterestedIn, Model20m145k2020) -> logFavBasedUserInterestedIn20M145K2020Store,
    (
      FavBasedUserInterestedInFromPE,
      Model20m145kUpdated) -> favBasedUserInterestedInFromPE20M145KUpdatedStore,
    (FilteredUserInterestedIn, Model20m145kUpdated) -> filteredUserInterestedIn20m145kUpdatedStore,
    (FilteredUserInterestedIn, Model20m145k2020) -> filteredUserInterestedIn20m145k2020Store,
    (
      FilteredUserInterestedInFromPE,
      Model20m145kUpdated) -> filteredUserInterestedInFromPE20m145kUpdatedStore,
    (
      UnfilteredUserInterestedIn,
      Model20m145kUpdated) -> unfilteredUserInterestedIn20m145kUpdatedStore,
    (UnfilteredUserInterestedIn, Model20m145k2020) -> unfilteredUserInterestedIn20m145k2020Store,
    (UserNextInterestedIn, Model20m145k2020) -> userNextInterestedIn20m145k2020Store,
    (
      LogFavBasedUserInterestedMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> logFavBasedInterestedMaxpoolingAddressBookFromIIAPE20M145K2020Store,
    (
      LogFavBasedUserInterestedAverageAddressBookFromIIAPE,
      Model20m145k2020) -> logFavBasedInterestedAverageAddressBookFromIIAPE20M145K2020Store,
    (
      LogFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> logFavBasedUserInterestedBooktypeMaxpoolingAddressBookFromIIAPE20M145K2020Store,
    (
      LogFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> logFavBasedUserInterestedLargestDimMaxpoolingAddressBookFromIIAPE20M145K2020Store,
    (
      LogFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> logFavBasedUserInterestedLouvainMaxpoolingAddressBookFromIIAPE20M145K2020Store,
    (
      LogFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE,
      Model20m145k2020) -> logFavBasedUserInterestedConnectedMaxpoolingAddressBookFromIIAPE20M145K2020Store,
  )

  val userSimClustersEmbeddingStore: ReadableStore[
    SimClustersEmbeddingId,
    SimClustersEmbedding
  ] = {
    SimClustersEmbeddingStore.buildWithDecider(
      underlyingStores = underlyingStores,
      decider = rmsDecider.decider,
      statsReceiver = stats
    )
  }

}
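The `underlyingStores` map and `SimClustersEmbeddingStore.buildWithDecider` route each `SimClustersEmbeddingId` to the store registered for its `(EmbeddingType, ModelVersion)` pair. The sketch below models that routing with plain Scala collections so it runs without storehaus or Finagle. Everything in it is hypothetical: `EmbeddingId`, `SyncStore`, `UserKeyedStore`, and `DispatchingStore` are stand-ins, lookups are synchronous `Option`s instead of `Future`s, and decider gating is reduced to a set of enabled pairs. It also mirrors the guard in `simClustersEmbeddingStoreWithMtls`, where a per-pair store only answers for ids whose embedding type and model version match its own.

```scala
// Hypothetical, simplified model of how underlyingStores routes lookups.
// Not the storehaus ReadableStore API: lookups are synchronous and return Option.
object StoreRoutingSketch {
  // Stand-in for SimClustersEmbeddingId(embeddingType, modelVersion, InternalId.UserId(userId)).
  final case class EmbeddingId(embeddingType: String, modelVersion: String, userId: Long)

  // Sparse embedding: clusterId -> score.
  type Embedding = Map[Int, Double]

  trait SyncStore {
    def get(id: EmbeddingId): Option[Embedding]
  }

  // Mirrors the guard in simClustersEmbeddingStoreWithMtls: an id is mapped to its
  // userId only when its type and version match the ones this store was built for.
  final class UserKeyedStore(
      embeddingType: String,
      modelVersion: String,
      data: Map[Long, Embedding])
      extends SyncStore {
    def get(id: EmbeddingId): Option[Embedding] =
      if (id.embeddingType == embeddingType && id.modelVersion == modelVersion) data.get(id.userId)
      else None
  }

  // Routes each id to the store registered for its (embeddingType, modelVersion) pair.
  // `enabled` is a stand-in for decider gating: a disabled pair behaves like an empty store.
  final class DispatchingStore(
      stores: Map[(String, String), SyncStore],
      enabled: Set[(String, String)])
      extends SyncStore {
    def get(id: EmbeddingId): Option[Embedding] = {
      val key = (id.embeddingType, id.modelVersion)
      if (enabled.contains(key)) stores.get(key).flatMap(_.get(id)) else None
    }
  }
}
```

In this sketch a mismatched id simply yields `None`. The original `composeKeyMapping` block has no fallback case, so it relies on ids being routed to it only for its own pair.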
18
representation-manager/server/src/main/thrift/BUILD
Normal file
@ -0,0 +1,18 @@
create_thrift_libraries(
    base_name = "thrift",
    sources = [
        "com/twitter/representation_manager/service.thrift",
    ],
    platform = "java8",
    tags = [
        "bazel-compatible",
    ],
    dependency_roots = [
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift",
    ],
    generate_languages = [
        "java",
        "scala",
        "strato",
    ],
)
@ -0,0 +1,14 @@
namespace java com.twitter.representation_manager.thriftjava
#@namespace scala com.twitter.representation_manager.thriftscala
#@namespace strato com.twitter.representation_manager

include "com/twitter/simclusters_v2/online_store.thrift"
include "com/twitter/simclusters_v2/identifier.thrift"

/**
  * A uniform column view for all kinds of SimClusters-based embeddings.
  **/
struct SimClustersEmbeddingView {
  1: required identifier.EmbeddingType embeddingType
  2: required online_store.ModelVersion modelVersion
}(persisted = 'false', hasPersonalData = 'false')
1
representation-scorer/BUILD.bazel
Normal file
@ -0,0 +1 @@
# This prevents SQ query from grabbing //:all since it traverses up once to find a BUILD
5
representation-scorer/README.md
Normal file
@ -0,0 +1,5 @@
# Representation Scorer #

**Representation Scorer** (RSX) serves as a centralized scoring system, offering SimClusters or other embedding-based scoring solutions as machine learning features.

The Representation Scorer fetches user behavior data from the User Signal Service (USS) and embeddings from the Representation Manager (RMS). It then calculates both pairwise and listwise features, which are used at various stages, including candidate retrieval and ranking.
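The README does not define "listwise"; the RSX docs mention, as an in-development use case, User-Tweet scores based on aggregations of similarities with a user's recent faves, retweets, and follows. A minimal sketch of the distinction, assuming sparse SimClusters-style embeddings stored as `Map[clusterId, weight]`: a pairwise feature scores one candidate against one entity, while listwise features summarize a candidate's scores against a list of entities. The object name, the dot-product similarity, and the max/mean aggregations are illustrative assumptions, not necessarily what RSX computes.

```scala
// Illustrative sketch: pairwise vs listwise features over sparse embeddings.
// Assumption: embeddings are Map[clusterId, weight]; the aggregations are examples only.
object ListwiseFeaturesSketch {
  type Embedding = Map[Int, Double]

  // Pairwise feature: one score for one (source, candidate) pair.
  // Iterates over the smaller map and looks up keys in the larger one.
  def dot(a: Embedding, b: Embedding): Double = {
    val (small, large) = if (a.size <= b.size) (a, b) else (b, a)
    small.iterator.map { case (k, v) => v * large.getOrElse(k, 0.0) }.sum
  }

  // Listwise features: summarize a candidate's pairwise scores against a list of
  // entities, e.g. embeddings of Tweets the user recently engaged with.
  final case class ListwiseScores(max: Double, mean: Double, count: Int)

  def listwise(candidate: Embedding, history: Seq[Embedding]): ListwiseScores = {
    val scores = history.map(dot(candidate, _))
    if (scores.isEmpty) ListwiseScores(0.0, 0.0, 0)
    else ListwiseScores(scores.max, scores.sum / scores.size, scores.size)
  }
}
```

Returning a fixed zero result for an empty history is a design choice of the sketch; a production feature would more likely mark the value as missing.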
8
representation-scorer/bin/canary-check.sh
Executable file
@ -0,0 +1,8 @@
#!/bin/bash

export CANARY_CHECK_ROLE="representation-scorer"
export CANARY_CHECK_NAME="representation-scorer"
export CANARY_CHECK_INSTANCES="0-19"

python3 relevance-platform/tools/canary_check.py "$@"
4
representation-scorer/bin/deploy.sh
Executable file
@ -0,0 +1,4 @@
#!/usr/bin/env bash

JOB=representation-scorer bazel run --ui_event_filters=-info,-stdout,-stderr --noshow_progress \
  //relevance-platform/src/main/python/deploy -- "$@"
66
representation-scorer/bin/remote-debug-tunnel.sh
Executable file
@ -0,0 +1,66 @@
#!/bin/bash

set -o nounset
set -eu

DC="atla"
ROLE="$USER"
SERVICE="representation-scorer"
INSTANCE="0"
KEY="$DC/$ROLE/devel/$SERVICE/$INSTANCE"

while test $# -gt 0; do
  case "$1" in
    -h|--help)
      echo "$0 Set up an ssh tunnel for $SERVICE remote debugging and disable aurora health checks"
      echo " "
      echo "See representation-scorer/README.md for details of how to use this script, and go/remote-debug for"
      echo "general information about remote debugging in Aurora"
      echo " "
      echo "Default instance if called with no args:"
      echo " $KEY"
      echo " "
      echo "Positional args:"
      echo " $0 [datacentre] [role] [service_name] [instance]"
      echo " "
      echo "Options:"
      echo " -h, --help show brief help"
      exit 0
      ;;
    *)
      break
      ;;
  esac
done

if [ -n "${1-}" ]; then
  DC="$1"
fi

if [ -n "${2-}" ]; then
  ROLE="$2"
fi

if [ -n "${3-}" ]; then
  SERVICE="$3"
fi

if [ -n "${4-}" ]; then
  INSTANCE="$4"
fi

KEY="$DC/$ROLE/devel/$SERVICE/$INSTANCE"
read -p "Set up remote debugger tunnel for $KEY? (y/n) " -r CONFIRM
if [[ ! $CONFIRM =~ ^[Yy]$ ]]; then
  echo "Exiting, tunnel not created"
  exit 1
fi

echo "Disabling health check and opening tunnel. Exit with control-c when you're finished"
CMD="aurora task ssh $KEY -c 'touch .healthchecksnooze' && aurora task ssh $KEY -L '5005:debug' --ssh-options '-N -S none -v '"

echo "Running $CMD"
eval "$CMD"
39
representation-scorer/docs/index.rst
Normal file
@ -0,0 +1,39 @@
Representation Scorer (RSX)
###########################

Overview
========

Representation Scorer (RSX) is a StratoFed service that serves scores for pairs of entities (User, Tweet, Topic...) based on some representation of those entities. For example, it serves User-Tweet scores based on the cosine similarity of their SimClusters embeddings. It aims to provide these scores with low latency and at high scale, to support applications such as scoring for ANN candidate generation and feature hydration via feature store.


Current use cases
-----------------

RSX currently serves traffic for the following use cases:

- User-Tweet similarity scores for Home ranking, using SimClusters embedding dot product
- Topic-Tweet similarity scores for topical tweet candidate generation and topic social proof, using SimClusters embedding cosine similarity and CERTO scores
- Tweet-Tweet and User-Tweet similarity scores for ANN candidate generation, using SimClusters embedding cosine similarity
- (in development) User-Tweet similarity scores for Home ranking, based on various aggregations of similarities with recent faves, retweets and follows performed by the user

Getting Started
===============

Fetching scores
---------------

Scores are served from the ``recommendations/representation_scorer/score`` column.

Using RSX for your application
------------------------------

RSX may be a good fit for your application if you need scores based on combinations of SimClusters embeddings for core nouns. We also plan to support other embeddings and scoring approaches in the future.

.. toctree::
   :maxdepth: 2
   :hidden:

   index
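The use cases listed in the RSX docs mix two similarity measures on SimClusters embeddings: a raw dot product (Home ranking) and cosine similarity (topical and ANN candidate generation). The practical difference is that cosine similarity divides out embedding magnitude, while the dot product also rewards embeddings with large weights. A minimal sketch, assuming sparse embeddings stored as `Map[clusterId, weight]`; `SimilaritySketch` and its functions are hypothetical names, not RSX's implementation.

```scala
// Illustrative sketch of dot product vs cosine similarity on sparse embeddings.
// Assumption: embeddings are Map[clusterId, weight].
object SimilaritySketch {
  type Embedding = Map[Int, Double]

  // Sum of products over clusters present in `a`; clusters missing from `b` contribute 0.
  def dot(a: Embedding, b: Embedding): Double =
    a.iterator.map { case (k, v) => v * b.getOrElse(k, 0.0) }.sum

  def l2Norm(a: Embedding): Double = math.sqrt(a.valuesIterator.map(v => v * v).sum)

  // Cosine similarity; defined as 0.0 when either embedding is empty or all-zero.
  def cosine(a: Embedding, b: Embedding): Double = {
    val denom = l2Norm(a) * l2Norm(b)
    if (denom == 0.0) 0.0 else dot(a, b) / denom
  }
}
```

For example, scaling a Tweet embedding by 10 multiplies its dot product with a user embedding by 10 but leaves the cosine similarity unchanged, which is why cosine is the more natural choice when candidates should be compared by direction rather than by popularity-driven magnitude.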
22
representation-scorer/server/BUILD
Normal file
@ -0,0 +1,22 @@
jvm_binary(
    name = "bin",
    basename = "representation-scorer",
    main = "com.twitter.representationscorer.RepresentationScorerFedServerMain",
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finatra/inject/inject-logback/src/main/scala",
        "loglens/loglens-logback/src/main/scala/com/twitter/loglens/logback",
        "representation-scorer/server/src/main/resources",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer",
        "twitter-server/logback-classic/src/main/scala",
    ],
)

# Aurora Workflows build phase convention requires a jvm_app named with ${project-name}-app
jvm_app(
    name = "representation-scorer-app",
    archive = "zip",
    binary = ":bin",
    tags = ["bazel-compatible"],
)
9
representation-scorer/server/src/main/resources/BUILD
Normal file
@ -0,0 +1,9 @@
resources(
    sources = [
        "*.xml",
        "*.yml",
        "com/twitter/slo/slo.json",
        "config/*.yml",
    ],
    tags = ["bazel-compatible"],
)
@ -0,0 +1,55 @@
{
  "servers": [
    {
      "name": "strato",
      "indicators": [
        {
          "id": "success_rate_3m",
          "indicator_type": "SuccessRateIndicator",
          "duration": 3,
          "duration_unit": "MINUTES"
        }, {
          "id": "latency_3m_p99",
          "indicator_type": "LatencyIndicator",
          "duration": 3,
          "duration_unit": "MINUTES",
          "percentile": 0.99
        }
      ],
      "objectives": [
        {
          "indicator": "success_rate_3m",
          "objective_type": "SuccessRateObjective",
          "operator": ">=",
          "threshold": 0.995
        },
        {
          "indicator": "latency_3m_p99",
          "objective_type": "LatencyObjective",
          "operator": "<=",
          "threshold": 50
        }
      ],
      "long_term_objectives": [
        {
          "id": "success_rate_28_days",
          "objective_type": "SuccessRateObjective",
          "operator": ">=",
          "threshold": 0.993,
          "duration": 28,
          "duration_unit": "DAYS"
        },
        {
          "id": "latency_p99_28_days",
          "objective_type": "LatencyObjective",
          "operator": "<=",
          "threshold": 60,
          "duration": 28,
          "duration_unit": "DAYS",
          "percentile": 0.99
        }
      ]
    }
  ],
  "@version": 1
}
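The SLO file pairs short-window objectives (3-minute success rate of at least 0.995 and p99 latency of at most 50) with 28-day objectives (0.993 and 60). The file does not state the latency unit; the sketch below assumes milliseconds. It shows how observed values could be checked against these thresholds and what a success-rate objective implies as an error budget. `SloSketch`, `Objective`, `meets`, and `violations` are hypothetical names, not part of any Twitter SLO tooling.

```scala
// Illustrative sketch: checking observed metrics against the slo.json thresholds.
// Assumption: latency thresholds are in milliseconds.
object SloSketch {
  sealed trait Op
  case object AtLeast extends Op // ">="
  case object AtMost extends Op  // "<="

  final case class Objective(id: String, op: Op, threshold: Double)

  def meets(o: Objective, observed: Double): Boolean = o.op match {
    case AtLeast => observed >= o.threshold
    case AtMost  => observed <= o.threshold
  }

  // Thresholds copied from slo.json.
  val shortTerm: Seq[Objective] = Seq(
    Objective("success_rate_3m", AtLeast, 0.995),
    Objective("latency_3m_p99", AtMost, 50.0)
  )
  val longTerm: Seq[Objective] = Seq(
    Objective("success_rate_28_days", AtLeast, 0.993),
    Objective("latency_p99_28_days", AtMost, 60.0)
  )

  // Fraction of requests allowed to fail under a success-rate objective.
  def errorBudget(successRateThreshold: Double): Double = 1.0 - successRateThreshold

  // Ids of objectives violated by the observed values; objectives with no observation are skipped.
  def violations(objectives: Seq[Objective], observed: Map[String, Double]): Seq[String] =
    objectives.collect { case o if observed.get(o.id).exists(v => !meets(o, v)) => o.id }
}
```

Under these thresholds the 28-day success-rate objective leaves an error budget of 0.7% of requests, looser than the 0.5% implied by the 3-minute objective.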
@ -0,0 +1,155 @@
enableLogFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore:
  comment: "Percentage of requests (0% to 100%) that use the non-empty logFavBasedApeEntity20M145KUpdatedEmbeddingCachedStore. At 0, all requests use an EMPTY readable store."
  default_availability: 0

enableLogFavBasedApeEntity20M145K2020EmbeddingCachedStore:
  comment: "Percentage of requests (0% to 100%) that use the non-empty logFavBasedApeEntity20M145K2020EmbeddingCachedStore. At 0, all requests use an EMPTY readable store."
  default_availability: 0

representation-scorer_forward_dark_traffic:
  comment: "Defines the percentage of traffic to forward to diffy-proxy. Set to 0 to disable dark traffic forwarding"
  default_availability: 0

"representation-scorer_load_shed_non_prod_callers":
  comment: "Discard traffic from all non-prod callers"
  default_availability: 0

enable_log_fav_based_tweet_embedding_20m145k2020_timeouts:
  comment: "If enabled, sets a timeout on calls to the logFavBased20M145K2020TweetEmbeddingStore"
  default_availability: 0

log_fav_based_tweet_embedding_20m145k2020_timeout_value_millis:
  comment: "The availability of this decider is read as the timeout (in milliseconds) for calls to the logFavBased20M145K2020TweetEmbeddingStore, e.g. an availability of 1.50% means 150ms. Only applied if enable_log_fav_based_tweet_embedding_20m145k2020_timeouts is enabled"
  default_availability: 2000

enable_log_fav_based_tweet_embedding_20m145kUpdated_timeouts:
  comment: "If enabled, sets a timeout on calls to the logFavBased20M145KUpdatedTweetEmbeddingStore"
  default_availability: 0

log_fav_based_tweet_embedding_20m145kUpdated_timeout_value_millis:
  comment: "The availability of this decider is read as the timeout (in milliseconds) for calls to the logFavBased20M145KUpdatedTweetEmbeddingStore, e.g. an availability of 1.50% means 150ms. Only applied if enable_log_fav_based_tweet_embedding_20m145kUpdated_timeouts is enabled"
  default_availability: 2000

enable_cluster_tweet_index_store_timeouts:
  comment: "If enabled, sets a timeout on calls to the ClusterTweetIndexStore"
  default_availability: 0

cluster_tweet_index_store_timeout_value_millis:
  comment: "The availability of this decider is read as the timeout (in milliseconds) for calls to the ClusterTweetIndexStore, e.g. an availability of 1.50% means 150ms. Only applied if enable_cluster_tweet_index_store_timeouts is enabled"
  default_availability: 2000

representation_scorer_fetch_signal_share:
  comment: "If enabled, fetches share signals from USS"
  default_availability: 0

representation_scorer_fetch_signal_reply:
  comment: "If enabled, fetches reply signals from USS"
  default_availability: 0

representation_scorer_fetch_signal_original_tweet:
  comment: "If enabled, fetches original tweet signals from USS"
  default_availability: 0

representation_scorer_fetch_signal_video_playback:
  comment: "If enabled, fetches video playback signals from USS"
  default_availability: 0

representation_scorer_fetch_signal_block:
  comment: "If enabled, fetches account block signals from USS"
  default_availability: 0

representation_scorer_fetch_signal_mute:
  comment: "If enabled, fetches account mute signals from USS"
  default_availability: 0

representation_scorer_fetch_signal_report:
  comment: "If enabled, fetches tweet report signals from USS"
  default_availability: 0

representation_scorer_fetch_signal_dont_like:
  comment: "If enabled, fetches tweet don't like signals from USS"
  default_availability: 0

representation_scorer_fetch_signal_see_fewer:
  comment: "If enabled, fetches tweet see fewer signals from USS"
  default_availability: 0

# To create a new decider, add it here in the same format with the caller's details: "representation-scorer_load_shed_by_caller_id_twtr:svc:{{role}}:{{name}}:{{environment}}:{{cluster}}"
# All the deciders below are generated by this script - ./strato/bin/fed deciders ./ --service-role=representation-scorer --service-name=representation-scorer
# If you run the script and paste its output, add only the prod deciders here. Non-prod callers are handled by representation-scorer_load_shed_non_prod_callers

"representation-scorer_load_shed_by_caller_id_all":
  comment: "Reject all traffic from caller id: all"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-canary:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-canary:prod:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-canary:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-canary:prod:pdxa"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice-send:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice-send:prod:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:prod:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:prod:pdxa"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:staging:atla":
  comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:staging:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:frigate:frigate-pushservice:staging:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:frigate:frigate-pushservice:staging:pdxa"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:home-scorer:home-scorer:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:home-scorer:home-scorer:prod:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:home-scorer:home-scorer:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:home-scorer:home-scorer:prod:pdxa"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoapi:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoapi:prod:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:stratostore:stratoserver:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:stratostore:stratoserver:prod:pdxa"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:timelinescorer:timelinescorer:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:timelinescorer:timelinescorer:prod:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:timelinescorer:timelinescorer:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:timelinescorer:timelinescorer:prod:pdxa"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:topic-social-proof:topic-social-proof:prod:atla":
  comment: "Reject all traffic from caller id: twtr:svc:topic-social-proof:topic-social-proof:prod:atla"
  default_availability: 0

"representation-scorer_load_shed_by_caller_id_twtr:svc:topic-social-proof:topic-social-proof:prod:pdxa":
  comment: "Reject all traffic from caller id: twtr:svc:topic-social-proof:topic-social-proof:prod:pdxa"
  default_availability: 0

"enable_sim_clusters_embedding_store_timeouts":
  comment: "If enabled, sets a timeout on calls to the SimClustersEmbeddingStore"
  default_availability: 10000

sim_clusters_embedding_store_timeout_value_millis:
  comment: "The availability of this decider is read as the timeout (in milliseconds) for calls to the SimClustersEmbeddingStore, e.g. an availability of 1.50% means 150ms. Only applied if enable_sim_clusters_embedding_store_timeouts is enabled"
  default_availability: 2000
|
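The two timeout deciders above work together: one switches the store timeout on, the other carries the timeout value. The sketch below assumes the usual decider convention that an availability is an integer in [0, 10000] displayed as a percentage with two decimals (10000 is 100.00%), which matches the "1.50% is 150ms" example. `StoreTimeoutSketch` and its methods are hypothetical names, not part of representation-scorer, and the enable decider is simplified to a plain Boolean.

```scala
import scala.concurrent.duration._

// Hypothetical helper illustrating how the timeout deciders are read; not the service's code.
object StoreTimeoutSketch {
  // Assumed convention: availabilities are integers in [0, 10000]; 10000 is shown as 100.00%.
  val MaxAvailability: Int = 10000

  // The percentage a decider dashboard would display for a raw availability value.
  def displayedPercent(raw: Int): BigDecimal = BigDecimal(raw) / 100

  // The raw value of sim_clusters_embedding_store_timeout_value_millis is read directly
  // as milliseconds, and only when the enable decider is on.
  def storeTimeout(timeoutsEnabled: Boolean, rawValue: Int): Option[FiniteDuration] =
    if (timeoutsEnabled) Some(rawValue.millis) else None
}
```

With the defaults above (enable decider at 10000, value decider at 2000), this reading gives a 2000 ms timeout, which a dashboard would display as 20.00%.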
165
representation-scorer/server/src/main/resources/logback.xml
Normal file
@ -0,0 +1,165 @@
<configuration>
  <shutdownHook class="ch.qos.logback.core.hook.DelayingShutdownHook"/>

  <!-- ===================================================== -->
  <!-- Service Config -->
  <!-- ===================================================== -->
  <property name="DEFAULT_SERVICE_PATTERN"
            value="%-16X{traceId} %-12X{clientId:--} %-16X{method} %-25logger{0} %msg"/>

  <property name="DEFAULT_ACCESS_PATTERN"
            value="%msg"/>

  <!-- ===================================================== -->
  <!-- Common Config -->
  <!-- ===================================================== -->

  <!-- JUL/JDK14 to Logback bridge -->
  <contextListener class="ch.qos.logback.classic.jul.LevelChangePropagator">
    <resetJUL>true</resetJUL>
  </contextListener>

  <!-- ====================================================================================== -->
  <!-- NOTE: The following appenders use a simple TimeBasedRollingPolicy configuration. -->
  <!-- You may want to consider using a more advanced SizeAndTimeBasedRollingPolicy. -->
  <!-- See: https://logback.qos.ch/manual/appenders.html#SizeAndTimeBasedRollingPolicy -->
  <!-- ====================================================================================== -->

  <!-- Service Log (rollover daily, keep maximum of 21 days of gzip compressed logs) -->
  <appender name="SERVICE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.service.output}</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- daily rollover -->
      <fileNamePattern>${log.service.output}.%d.gz</fileNamePattern>
      <!-- the maximum total size of all the log files -->
      <totalSizeCap>3GB</totalSizeCap>
      <!-- keep maximum 21 days' worth of history -->
      <maxHistory>21</maxHistory>
      <cleanHistoryOnStart>true</cleanHistoryOnStart>
    </rollingPolicy>
    <encoder>
      <pattern>%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n</pattern>
    </encoder>
  </appender>

  <!-- Access Log (rollover daily, keep maximum of 7 days of gzip compressed logs) -->
  <appender name="ACCESS" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.access.output}</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- daily rollover -->
      <fileNamePattern>${log.access.output}.%d.gz</fileNamePattern>
      <!-- the maximum total size of all the log files -->
      <totalSizeCap>100MB</totalSizeCap>
      <!-- keep maximum 7 days' worth of history -->
      <maxHistory>7</maxHistory>
      <cleanHistoryOnStart>true</cleanHistoryOnStart>
    </rollingPolicy>
    <encoder>
      <pattern>${DEFAULT_ACCESS_PATTERN}%n</pattern>
    </encoder>
  </appender>

  <!-- LogLens -->
  <appender name="LOGLENS" class="com.twitter.loglens.logback.LoglensAppender">
    <mdcAdditionalContext>true</mdcAdditionalContext>
    <category>${log.lens.category}</category>
    <index>${log.lens.index}</index>
    <tag>${log.lens.tag}/service</tag>
    <encoder>
      <pattern>%msg</pattern>
    </encoder>
  </appender>

  <!-- LogLens Access -->
  <appender name="LOGLENS-ACCESS" class="com.twitter.loglens.logback.LoglensAppender">
    <mdcAdditionalContext>true</mdcAdditionalContext>
    <category>${log.lens.category}</category>
    <index>${log.lens.index}</index>
    <tag>${log.lens.tag}/access</tag>
    <encoder>
      <pattern>%msg</pattern>
    </encoder>
  </appender>

  <!-- Pipeline Execution Logs -->
  <appender name="ALLOW-LISTED-PIPELINE-EXECUTIONS" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>allow_listed_pipeline_executions.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- daily rollover -->
      <fileNamePattern>allow_listed_pipeline_executions.log.%d.gz</fileNamePattern>
      <!-- the maximum total size of all the log files -->
      <totalSizeCap>100MB</totalSizeCap>
      <!-- keep maximum 7 days' worth of history -->
      <maxHistory>7</maxHistory>
      <cleanHistoryOnStart>true</cleanHistoryOnStart>
    </rollingPolicy>
    <encoder>
      <pattern>%date %.-3level ${DEFAULT_SERVICE_PATTERN}%n</pattern>
    </encoder>
  </appender>

  <!-- ===================================================== -->
  <!-- Primary Async Appenders -->
  <!-- ===================================================== -->

  <property name="async_queue_size" value="${queue.size:-50000}"/>
  <property name="async_max_flush_time" value="${max.flush.time:-0}"/>

  <appender name="ASYNC-SERVICE" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="SERVICE"/>
  </appender>

  <appender name="ASYNC-ACCESS" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="ACCESS"/>
  </appender>

  <appender name="ASYNC-ALLOW-LISTED-PIPELINE-EXECUTIONS" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="ALLOW-LISTED-PIPELINE-EXECUTIONS"/>
  </appender>

  <appender name="ASYNC-LOGLENS" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="LOGLENS"/>
  </appender>

  <appender name="ASYNC-LOGLENS-ACCESS" class="com.twitter.inject.logback.AsyncAppender">
    <queueSize>${async_queue_size}</queueSize>
    <maxFlushTime>${async_max_flush_time}</maxFlushTime>
    <appender-ref ref="LOGLENS-ACCESS"/>
  </appender>

  <!-- ===================================================== -->
  <!-- Package Config -->
  <!-- ===================================================== -->

  <!-- Per-Package Config -->
  <logger name="com.twitter" level="INHERITED"/>
  <logger name="com.twitter.wilyns" level="INHERITED"/>
  <logger name="com.twitter.configbus.client.file" level="INHERITED"/>
  <logger name="com.twitter.finagle.mux" level="INHERITED"/>
  <logger name="com.twitter.finagle.serverset2" level="INHERITED"/>
  <logger name="com.twitter.logging.ScribeHandler" level="INHERITED"/>
  <logger name="com.twitter.zookeeper.client.internal" level="INHERITED"/>

  <!-- Root Config -->
  <!-- For all logs except access logs, messages below the log_level level (INFO unless set) are dropped by default. This can be overridden in the per-package loggers, and dynamically in the admin panel of individual instances. -->
  <root level="${log_level:-INFO}">
    <appender-ref ref="ASYNC-SERVICE"/>
    <appender-ref ref="ASYNC-LOGLENS"/>
  </root>

  <!-- Access Logging -->
  <!-- Access logs are turned off by default -->
  <logger name="com.twitter.finatra.thrift.filters.AccessLoggingFilter" level="OFF" additivity="false">
    <appender-ref ref="ASYNC-ACCESS"/>
    <appender-ref ref="ASYNC-LOGLENS-ACCESS"/>
  </logger>

</configuration>
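Several values in this file use logback's `${name:-default}` substitution (for example `${queue.size:-50000}` and `${log_level:-INFO}`): a property set at launch wins, and an unset one falls back to the inline default. The following is a simplified sketch of that lookup against a single property map. Real logback consults the local, context, system-property and environment scopes in turn and supports nesting, which this sketch omits; the `LogbackSubstSketch` name is hypothetical.

```scala
import scala.util.matching.Regex

// Hypothetical, simplified model of logback variable substitution (one lookup map, no nesting).
object LogbackSubstSketch {
  // Matches ${name} or ${name:-default}; group 2 is null when no default is given.
  private val Variable: Regex = """\$\{([^}:]+)(?::-([^}]*))?\}""".r

  def resolve(template: String, props: Map[String, String]): String =
    Variable.replaceAllIn(template, m => {
      val value = props
        .get(m.group(1))
        .orElse(Option(m.group(2))) // fall back to the inline default, if any
        .getOrElse(m.group(1) + "_IS_UNDEFINED") // logback's marker for an unset variable
      Regex.quoteReplacement(value)
    })
}
```

For instance, starting the service without `-Dqueue.size` leaves `async_queue_size` at 50000, while `-Dqueue.size=1000` would shrink every async appender's queue.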
@ -0,0 +1,13 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finagle-internal/slo/src/main/scala/com/twitter/finagle/slo",
        "finatra/inject/inject-thrift-client",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/columns",
        "strato/src/main/scala/com/twitter/strato/fed",
        "strato/src/main/scala/com/twitter/strato/fed/server",
        "twitter-server-internal/src/main/scala",
    ],
)
@ -0,0 +1,38 @@
package com.twitter.representationscorer

import com.google.inject.Module
import com.twitter.inject.thrift.modules.ThriftClientIdModule
import com.twitter.representationscorer.columns.ListScoreColumn
import com.twitter.representationscorer.columns.ScoreColumn
import com.twitter.representationscorer.columns.SimClustersRecentEngagementSimilarityColumn
import com.twitter.representationscorer.columns.SimClustersRecentEngagementSimilarityUserTweetEdgeColumn
import com.twitter.representationscorer.modules.CacheModule
import com.twitter.representationscorer.modules.EmbeddingStoreModule
import com.twitter.representationscorer.modules.RMSConfigModule
import com.twitter.representationscorer.modules.TimerModule
import com.twitter.representationscorer.twistlyfeatures.UserSignalServiceRecentEngagementsClientModule
import com.twitter.strato.fed._
import com.twitter.strato.fed.server._

object RepresentationScorerFedServerMain extends RepresentationScorerFedServer

trait RepresentationScorerFedServer extends StratoFedServer {
  override def dest: String = "/s/representation-scorer/representation-scorer"
  override val modules: Seq[Module] =
    Seq(
      CacheModule,
      ThriftClientIdModule,
      UserSignalServiceRecentEngagementsClientModule,
      TimerModule,
      RMSConfigModule,
      EmbeddingStoreModule
    )

  override def columns: Seq[Class[_ <: StratoFed.Column]] =
    Seq(
      classOf[ListScoreColumn],
      classOf[ScoreColumn],
      classOf[SimClustersRecentEngagementSimilarityUserTweetEdgeColumn],
      classOf[SimClustersRecentEngagementSimilarityColumn]
    )
}
@ -0,0 +1,16 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "content-recommender/thrift/src/main/thrift:thrift-scala",
        "finatra/inject/inject-core/src/main/scala",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/common",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/modules",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/twistlyfeatures",
        "representation-scorer/server/src/main/thrift:thrift-scala",
        "strato/src/main/scala/com/twitter/strato/fed",
        "strato/src/main/scala/com/twitter/strato/fed/server",
    ],
)
@ -0,0 +1,13 @@
package com.twitter.representationscorer.columns

import com.twitter.strato.config.{ContactInfo => StratoContactInfo}

object Info {
  val contactInfo: StratoContactInfo = StratoContactInfo(
    description = "Please contact Relevance Platform team for more details",
    contactEmail = "no-reply@twitter.com",
    ldapGroup = "representation-scorer-admins",
    jiraProject = "JIRA",
    links = Seq("http://go.twitter.biz/rsx-runbook")
  )
}
@ -0,0 +1,116 @@
package com.twitter.representationscorer.columns

import com.twitter.representationscorer.thriftscala.ListScoreId
import com.twitter.representationscorer.thriftscala.ListScoreResponse
import com.twitter.representationscorer.scorestore.ScoreStore
import com.twitter.representationscorer.thriftscala.ScoreResult
import com.twitter.simclusters_v2.common.SimClustersEmbeddingId.LongInternalId
import com.twitter.simclusters_v2.common.SimClustersEmbeddingId.LongSimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.Score
import com.twitter.simclusters_v2.thriftscala.ScoreId
import com.twitter.simclusters_v2.thriftscala.ScoreInternalId
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingPairScoreId
import com.twitter.stitch
import com.twitter.stitch.Stitch
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.Policy
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.data.Lifecycle
import com.twitter.strato.fed._
import com.twitter.strato.thrift.ScroogeConv
import com.twitter.util.Future
import com.twitter.util.Return
import com.twitter.util.Throw
import javax.inject.Inject

class ListScoreColumn @Inject() (scoreStore: ScoreStore)
    extends StratoFed.Column("recommendations/representation_scorer/listScore")
    with StratoFed.Fetch.Stitch {

  override val policy: Policy = Common.rsxReadPolicy

  override type Key = ListScoreId
  override type View = Unit
  override type Value = ListScoreResponse

  override val keyConv: Conv[Key] = ScroogeConv.fromStruct[ListScoreId]
  override val viewConv: Conv[View] = Conv.ofType
  override val valueConv: Conv[Value] = ScroogeConv.fromStruct[ListScoreResponse]

  override val contactInfo: ContactInfo = Info.contactInfo

  override val metadata: OpMetadata = OpMetadata(
    lifecycle = Some(Lifecycle.Production),
    description = Some(
      PlainText(
        "Scoring for multiple candidate entities against a single target entity"
      ))
  )

  override def fetch(key: Key, view: View): Stitch[Result[Value]] = {

    val target = SimClustersEmbeddingId(
      embeddingType = key.targetEmbeddingType,
      modelVersion = key.modelVersion,
      internalId = key.targetId
    )
    val scoreIds = key.candidateIds.map { candidateId =>
      val candidate = SimClustersEmbeddingId(
        embeddingType = key.candidateEmbeddingType,
        modelVersion = key.modelVersion,
        internalId = candidateId
      )
      ScoreId(
        algorithm = key.algorithm,
        internalId = ScoreInternalId.SimClustersEmbeddingPairScoreId(
          SimClustersEmbeddingPairScoreId(target, candidate)
        )
      )
    }

    Stitch
      .callFuture {
        val (keys: Iterable[ScoreId], vals: Iterable[Future[Option[Score]]]) =
          scoreStore.uniformScoringStore.multiGet(scoreIds.toSet).unzip
        val results: Future[Iterable[Option[Score]]] = Future.collectToTry(vals.toSeq) map {
          tryOptVals =>
            tryOptVals map {
              case Return(Some(v)) => Some(v)
              case Return(None) => None
              case Throw(_) => None
            }
        }
        val scoreMap: Future[Map[Long, Double]] = results.map { scores =>
          keys
            .zip(scores).collect {
              case (
                    ScoreId(
                      _,
                      ScoreInternalId.SimClustersEmbeddingPairScoreId(
                        SimClustersEmbeddingPairScoreId(
                          _,
                          LongSimClustersEmbeddingId(candidateId)))),
                    Some(score)) =>
                (candidateId, score.score)
            }.toMap
        }
        scoreMap
      }
      .map { (scores: Map[Long, Double]) =>
        val orderedScores = key.candidateIds.collect {
          case LongInternalId(id) => ScoreResult(scores.get(id))
          case _ =>
            // This will return None scores for candidates which don't have Long ids, but that's fine:
            // at the moment we're only scoring for Tweets
            ScoreResult(None)
        }
        found(ListScoreResponse(orderedScores))
      }
      .handle {
        case stitch.NotFound => missing
      }
  }
}
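`ListScoreColumn.fetch` pairs the single target with every candidate, issues one `multiGet`, turns each per-key failure into a missing score via `Future.collectToTry`, and then re-emits scores in the caller's candidate order. Below is a minimal sketch of that pattern, assuming the standard library's `scala.concurrent.Future` in place of Twitter's `Future` and `Stitch`, and a hypothetical `PairId` standing in for `ScoreId`; `transform(Success(_))` (Scala 2.12+) plays the role of `collectToTry`.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Success, Try}

// Hypothetical stand-in for the ListScoreColumn fetch logic, using stdlib futures.
object ListScoreSketch {
  // Simplified stand-in for ScoreId(algorithm, SimClustersEmbeddingPairScoreId(target, candidate)).
  final case class PairId(target: Long, candidate: Long)

  def listScore(
    target: Long,
    candidates: Seq[Long],
    multiGet: Set[PairId] => Map[PairId, Future[Option[Double]]]
  )(implicit ec: ExecutionContext): Future[Seq[Option[Double]]] = {
    val (keys, vals) = multiGet(candidates.map(PairId(target, _)).toSet).toSeq.unzip
    // Like collectToTry: each lookup becomes a Try, so one failure does not fail the batch.
    val tries: Future[Seq[Try[Option[Double]]]] =
      Future.sequence(vals.map(f => f.transform(t => Success(t))))
    tries.map { results =>
      val scoreMap: Map[Long, Double] = keys.zip(results).collect {
        case (PairId(_, candidate), Success(Some(score))) => candidate -> score
      }.toMap
      // Restore the caller's order; missing or failed candidates map to None.
      candidates.map(scoreMap.get)
    }
  }
}
```

Keying the intermediate map by candidate id (rather than relying on `multiGet` ordering) is what lets the column return results aligned with `key.candidateIds`, even though the store sees an unordered `Set`.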
@ -0,0 +1,48 @@
package com.twitter.representationscorer.columns

import com.twitter.contentrecommender.thriftscala.ScoringResponse
import com.twitter.representationscorer.scorestore.ScoreStore
import com.twitter.simclusters_v2.thriftscala.ScoreId
import com.twitter.stitch
import com.twitter.stitch.Stitch
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.Policy
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Lifecycle
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.fed._
import com.twitter.strato.thrift.ScroogeConv
import javax.inject.Inject

class ScoreColumn @Inject() (scoreStore: ScoreStore)
    extends StratoFed.Column("recommendations/representation_scorer/score")
    with StratoFed.Fetch.Stitch {

  override val policy: Policy = Common.rsxReadPolicy

  override type Key = ScoreId
  override type View = Unit
  override type Value = ScoringResponse

  override val keyConv: Conv[Key] = ScroogeConv.fromStruct[ScoreId]
  override val viewConv: Conv[View] = Conv.ofType
  override val valueConv: Conv[Value] = ScroogeConv.fromStruct[ScoringResponse]

  override val contactInfo: ContactInfo = Info.contactInfo

  override val metadata: OpMetadata = OpMetadata(
    lifecycle = Some(Lifecycle.Production),
    description = Some(PlainText(
      "The Uniform Scoring Endpoint in Representation Scorer for the Content-Recommender." +
        " TDD: http://go/representation-scorer-tdd Guideline: http://go/uniform-scoring-guideline"))
  )

  override def fetch(key: Key, view: View): Stitch[Result[Value]] =
    scoreStore
      .uniformScoringStoreStitch(key)
      .map(score => found(ScoringResponse(Some(score))))
      .handle {
        case stitch.NotFound => missing
      }
}
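Each column's `fetch` maps a successful lookup to `found(...)` and converts only a `stitch.NotFound` failure into `missing`; any other exception still fails the fetch. A minimal sketch of that error-handling shape, assuming hypothetical `Found`/`Missing`/`NotFound` types as stand-ins for Strato's result values and Stitch's `NotFound`, and `scala.util.Try` in place of `Stitch`:

```scala
import scala.util.Try

// Hypothetical stand-ins for Strato's fetch results and Stitch's NotFound.
object FetchResultSketch {
  sealed trait FetchResult[+A]
  final case class Found[A](value: A) extends FetchResult[A]
  case object Missing extends FetchResult[Nothing]
  case object NotFound extends Exception("not found")

  // Mirrors `.map(v => found(v)).handle { case stitch.NotFound => missing }`:
  // only NotFound is converted; other failures propagate as a Failure.
  def fetch[A](lookup: => A): Try[FetchResult[A]] =
    Try[FetchResult[A]](Found(lookup)).recover { case NotFound => Missing }
}
```

Keeping the partial function narrow means a backend outage surfaces as an error to the caller instead of being silently reported as "no score".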
@ -0,0 +1,52 @@
package com.twitter.representationscorer.columns

import com.twitter.representationscorer.common.TweetId
import com.twitter.representationscorer.common.UserId
import com.twitter.representationscorer.thriftscala.RecentEngagementSimilaritiesResponse
import com.twitter.representationscorer.twistlyfeatures.Scorer
import com.twitter.stitch
import com.twitter.stitch.Stitch
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.Policy
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.data.Lifecycle
import com.twitter.strato.fed._
import com.twitter.strato.thrift.ScroogeConv
import javax.inject.Inject

class SimClustersRecentEngagementSimilarityColumn @Inject() (scorer: Scorer)
    extends StratoFed.Column(
      "recommendations/representation_scorer/simClustersRecentEngagementSimilarity")
    with StratoFed.Fetch.Stitch {

  override val policy: Policy = Common.rsxReadPolicy

  override type Key = (UserId, Seq[TweetId])
  override type View = Unit
  override type Value = RecentEngagementSimilaritiesResponse

  override val keyConv: Conv[Key] = Conv.ofType[(Long, Seq[Long])]
  override val viewConv: Conv[View] = Conv.ofType
  override val valueConv: Conv[Value] =
    ScroogeConv.fromStruct[RecentEngagementSimilaritiesResponse]

  override val contactInfo: ContactInfo = Info.contactInfo

  override val metadata: OpMetadata = OpMetadata(
    lifecycle = Some(Lifecycle.Production),
    description = Some(
      PlainText(
        "User-Tweet scores based on the user's recent engagements for multiple tweets."
      ))
  )

  override def fetch(key: Key, view: View): Stitch[Result[Value]] =
    scorer
      .get(key._1, key._2)
      .map(results => found(RecentEngagementSimilaritiesResponse(results)))
      .handle {
        case stitch.NotFound => missing
      }
}
@ -0,0 +1,52 @@
package com.twitter.representationscorer.columns

import com.twitter.representationscorer.common.TweetId
import com.twitter.representationscorer.common.UserId
import com.twitter.representationscorer.thriftscala.SimClustersRecentEngagementSimilarities
import com.twitter.representationscorer.twistlyfeatures.Scorer
import com.twitter.stitch
import com.twitter.stitch.Stitch
import com.twitter.strato.catalog.OpMetadata
import com.twitter.strato.config.ContactInfo
import com.twitter.strato.config.Policy
import com.twitter.strato.data.Conv
import com.twitter.strato.data.Description.PlainText
import com.twitter.strato.data.Lifecycle
import com.twitter.strato.fed._
import com.twitter.strato.thrift.ScroogeConv
import javax.inject.Inject

class SimClustersRecentEngagementSimilarityUserTweetEdgeColumn @Inject() (scorer: Scorer)
    extends StratoFed.Column(
      "recommendations/representation_scorer/simClustersRecentEngagementSimilarity.UserTweetEdge")
    with StratoFed.Fetch.Stitch {

  override val policy: Policy = Common.rsxReadPolicy

  override type Key = (UserId, TweetId)
  override type View = Unit
  override type Value = SimClustersRecentEngagementSimilarities

  override val keyConv: Conv[Key] = Conv.ofType[(Long, Long)]
  override val viewConv: Conv[View] = Conv.ofType
  override val valueConv: Conv[Value] =
    ScroogeConv.fromStruct[SimClustersRecentEngagementSimilarities]

  override val contactInfo: ContactInfo = Info.contactInfo

  override val metadata: OpMetadata = OpMetadata(
    lifecycle = Some(Lifecycle.Production),
    description = Some(
      PlainText(
        "User-Tweet scores based on the user's recent engagements"
      ))
  )

  override def fetch(key: Key, view: View): Stitch[Result[Value]] =
    scorer
      .get(key._1, key._2)
      .map(found(_))
      .handle {
        case stitch.NotFound => missing
      }
}
@ -0,0 +1,9 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "decider/src/main/scala",
        "src/scala/com/twitter/simclusters_v2/common",
    ],
)
@ -0,0 +1,7 @@
package com.twitter.representationscorer

object DeciderConstants {
  val enableSimClustersEmbeddingStoreTimeouts = "enable_sim_clusters_embedding_store_timeouts"
  val simClustersEmbeddingStoreTimeoutValueMillis =
    "sim_clusters_embedding_store_timeout_value_millis"
}
@ -0,0 +1,27 @@
package com.twitter.representationscorer.common

import com.twitter.decider.Decider
import com.twitter.decider.RandomRecipient
import com.twitter.decider.Recipient
import com.twitter.simclusters_v2.common.DeciderGateBuilderWithIdHashing
import javax.inject.Inject
import javax.inject.Singleton

@Singleton
case class RepresentationScorerDecider @Inject() (decider: Decider) {

  val deciderGateBuilder = new DeciderGateBuilderWithIdHashing(decider)

  def isAvailable(feature: String, recipient: Option[Recipient]): Boolean = {
    decider.isAvailable(feature, recipient)
  }

  /**
   * When useRandomRecipient is set to false, the decider is either completely on or off.
   * When useRandomRecipient is set to true, the decider is on for the specified % of traffic.
   */
  def isAvailable(feature: String, useRandomRecipient: Boolean = true): Boolean = {
    if (useRandomRecipient) isAvailable(feature, Some(RandomRecipient))
    else isAvailable(feature, None)
  }
}
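The two `isAvailable` overloads differ only in whether a recipient is supplied. The sketch below models an availability as an integer in [0, 10000] (10000 = 100%) and a random recipient as a uniform draw, so a partial value gates roughly that fraction of calls. How `com.twitter.decider` treats a partial value when no recipient is given is not shown in this code; the sketch assumes "off", which is only one reading of the doc comment's "completely on or off". `DeciderGateSketch` is a hypothetical name.

```scala
import scala.util.Random

// Hypothetical model of decider gating; not the com.twitter.decider implementation.
object DeciderGateSketch {
  val MaxAvailability: Int = 10000 // 10000 == 100.00%

  def isAvailable(availability: Int, useRandomRecipient: Boolean, rng: Random): Boolean =
    if (availability <= 0) false
    else if (availability >= MaxAvailability) true
    // Uniform draw in [0, 10000): true for about availability / 10000 of calls.
    else if (useRandomRecipient) rng.nextInt(MaxAvailability) < availability
    // Assumption: a partial availability reads as "off" when there is no recipient.
    else false
}
```

Under this model, `enable_sim_clusters_embedding_store_timeouts` at its default of 10000 is on for every call regardless of the recipient.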
@ -0,0 +1,6 @@
package com.twitter.representationscorer

package object common {
  type UserId = Long
  type TweetId = Long
}
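Because `UserId` and `TweetId` are plain type aliases for `Long`, the column keys `(UserId, Seq[TweetId])` and `(UserId, TweetId)` are exactly `(Long, Seq[Long])` and `(Long, Long)`, which is why `Conv.ofType[(Long, Seq[Long])]` type-checks as a `Conv[Key]`. The standalone sketch below uses local copies of the aliases to show that they document intent but add no compile-time separation; a value class (or a Scala 3 opaque type) would be needed for that. `AliasSketch` is a hypothetical name.

```scala
// Standalone copies of the aliases from the `common` package object.
object AliasSketch {
  type UserId = Long
  type TweetId = Long

  // The batch column's key type, written with the aliases.
  type BatchKey = (UserId, Seq[TweetId])

  // A plain (Long, Seq[Long]) value is a BatchKey, and vice versa, with no conversion.
  val key: BatchKey = (12L, Seq(34L, 56L))
  val asLongs: (Long, Seq[Long]) = key

  // Aliases do not stop the two ids from being mixed up: this swap compiles.
  def swapped(user: UserId, tweet: TweetId): (UserId, TweetId) = (tweet, user)
}
```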
@ -0,0 +1,19 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "finagle-internal/mtls/src/main/scala/com/twitter/finagle/mtls/authentication",
        "finagle/finagle-stats",
        "finatra/inject/inject-core/src/main/scala",
        "representation-manager/client/src/main/scala/com/twitter/representation_manager",
        "representation-manager/client/src/main/scala/com/twitter/representation_manager/config",
        "representation-manager/server/src/main/scala/com/twitter/representation_manager/migration",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/common",
        "servo/util",
        "src/scala/com/twitter/simclusters_v2/stores",
        "src/scala/com/twitter/storehaus_internal/memcache",
        "src/scala/com/twitter/storehaus_internal/util",
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
    ],
)
@ -0,0 +1,34 @@
package com.twitter.representationscorer.modules

import com.google.inject.Provides
import com.twitter.finagle.memcached.Client
import javax.inject.Singleton
import com.twitter.conversions.DurationOps._
import com.twitter.inject.TwitterModule
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.storehaus_internal.memcache.MemcacheStore
import com.twitter.storehaus_internal.util.ClientName
import com.twitter.storehaus_internal.util.ZkEndPoint

object CacheModule extends TwitterModule {

  private val cacheDest = flag[String]("cache_module.dest", "Path to memcache service")
  private val timeout = flag[Int]("memcache.timeout", "Memcache client timeout, in milliseconds")
  private val retries = flag[Int]("memcache.retries", "Memcache timeout retries")

  @Singleton
  @Provides
  def providesCache(
    serviceIdentifier: ServiceIdentifier,
    stats: StatsReceiver
  ): Client =
    MemcacheStore.memcachedClient(
      name = ClientName("memcache_representation_manager"),
      dest = ZkEndPoint(cacheDest()),
      timeout = timeout().milliseconds,
      retries = retries(),
      statsReceiver = stats.scope("cache_client"),
      serviceIdentifier = serviceIdentifier
    )
}
@ -0,0 +1,100 @@
package com.twitter.representationscorer.modules

import com.google.inject.Provides
import com.twitter.decider.Decider
import com.twitter.finagle.memcached.{Client => MemcachedClient}
import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.finagle.thrift.ClientId
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.inject.TwitterModule
import com.twitter.relevance_platform.common.readablestore.ReadableStoreWithTimeout
import com.twitter.representation_manager.migration.LegacyRMS
import com.twitter.representationscorer.DeciderConstants
import com.twitter.simclusters_v2.common.SimClustersEmbedding
import com.twitter.simclusters_v2.stores.SimClustersEmbeddingStore
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.EmbeddingType._
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.simclusters_v2.thriftscala.ModelVersion._
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.storehaus.ReadableStore
import com.twitter.util.Timer
import javax.inject.Singleton

object EmbeddingStoreModule extends TwitterModule {
  @Singleton
  @Provides
  def providesEmbeddingStore(
    memCachedClient: MemcachedClient,
    serviceIdentifier: ServiceIdentifier,
    clientId: ClientId,
    timer: Timer,
    decider: Decider,
    stats: StatsReceiver
  ): ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
    val cacheHashKeyPrefix: String = "RMS"
    val embeddingStoreClient = new LegacyRMS(
      serviceIdentifier,
      memCachedClient,
      stats,
      decider,
      clientId,
      timer,
      cacheHashKeyPrefix
    )

    val underlyingStores: Map[
      (EmbeddingType, ModelVersion),
      ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding]
    ] = Map(
      // Tweet Embeddings
      (
        LogFavBasedTweet,
        Model20m145k2020) -> embeddingStoreClient.logFavBased20M145K2020TweetEmbeddingStore,
      (
        LogFavLongestL2EmbeddingTweet,
        Model20m145k2020) -> embeddingStoreClient.logFavBasedLongestL2Tweet20M145K2020EmbeddingStore,
      // InterestedIn Embeddings
      (
        LogFavBasedUserInterestedInFromAPE,
        Model20m145k2020) -> embeddingStoreClient.LogFavBasedInterestedInFromAPE20M145K2020Store,
      (
        FavBasedUserInterestedIn,
        Model20m145k2020) -> embeddingStoreClient.favBasedUserInterestedIn20M145K2020Store,
      // Author Embeddings
      (
        FavBasedProducer,
        Model20m145k2020) -> embeddingStoreClient.favBasedProducer20M145K2020EmbeddingStore,
      // Entity Embeddings
      (
        LogFavBasedKgoApeTopic,
        Model20m145k2020) -> embeddingStoreClient.logFavBasedApeEntity20M145K2020EmbeddingCachedStore,
      (FavTfgTopic, Model20m145k2020) -> embeddingStoreClient.favBasedTfgTopicEmbedding2020Store,
    )

    val simClustersEmbeddingStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] = {
      val underlying: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] =
        SimClustersEmbeddingStore.buildWithDecider(
          underlyingStores = underlyingStores,
          decider = decider,
          statsReceiver = stats.scope("simClusters_embeddings_store_deciderable")
        )

      val underlyingWithTimeout: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding] =
|
||||||
|
new ReadableStoreWithTimeout(
|
||||||
|
rs = underlying,
|
||||||
|
decider = decider,
|
||||||
|
enableTimeoutDeciderKey = DeciderConstants.enableSimClustersEmbeddingStoreTimeouts,
|
||||||
|
timeoutValueKey = DeciderConstants.simClustersEmbeddingStoreTimeoutValueMillis,
|
||||||
|
timer = timer,
|
||||||
|
statsReceiver = stats.scope("simClusters_embedding_store_timeouts")
|
||||||
|
)
|
||||||
|
|
||||||
|
ObservedReadableStore(
|
||||||
|
store = underlyingWithTimeout
|
||||||
|
)(stats.scope("simClusters_embeddings_store"))
|
||||||
|
}
|
||||||
|
simClustersEmbeddingStore
|
||||||
|
}
|
||||||
|
}
|
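EmbeddingStoreModule registers one underlying store per (EmbeddingType, ModelVersion) pair and hands that map to `SimClustersEmbeddingStore.buildWithDecider`. The sketch below illustrates the routing idea with plain Scala types. `EmbeddingKey`, `RoutingStore`, and the string-valued embedding types are hypothetical stand-ins, and the assumption that an unregistered pair yields `None` (rather than an error) is ours, not confirmed by the source. Decider gating and timeouts are omitted.

```scala
// Hypothetical, self-contained sketch of routing embedding lookups by
// (embeddingType, modelVersion). Not the simclusters_v2 implementation.
object EmbeddingRoutingSketch {
  // Stand-in for SimClustersEmbeddingId: which embedding, which model, which entity.
  final case class EmbeddingKey(embeddingType: String, modelVersion: String, entityId: Long)

  // Stand-in for SimClustersEmbedding: sparse clusterId -> weight.
  type Embedding = Map[Int, Double]

  // Each underlying "store" is just a synchronous lookup function in this sketch.
  type Store = Long => Option[Embedding]

  final class RoutingStore(underlying: Map[(String, String), Store]) {
    // Route to the store registered for this (type, version) pair.
    // Unregistered pairs yield None (assumed behavior for this sketch).
    def get(key: EmbeddingKey): Option[Embedding] =
      underlying
        .get((key.embeddingType, key.modelVersion))
        .flatMap(store => store(key.entityId))
  }
}
```

Keying the map on the pair keeps each embedding family independently swappable, which is also what lets the real module wrap every underlying store with the same decider, timeout, and stats layers.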
@ -0,0 +1,63 @@
package com.twitter.representationscorer.modules

import com.google.inject.Provides
import com.twitter.conversions.DurationOps._
import com.twitter.inject.TwitterModule
import com.twitter.representation_manager.config.ClientConfig
import com.twitter.representation_manager.config.EnabledInMemoryCacheParams
import com.twitter.representation_manager.config.InMemoryCacheParams
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.EmbeddingType._
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.simclusters_v2.thriftscala.ModelVersion._
import javax.inject.Singleton

object RMSConfigModule extends TwitterModule {
  def getCacheName(embeddingType: EmbeddingType, modelVersion: ModelVersion): String =
    s"${embeddingType.name}_${modelVersion.name}_in_mem_cache"

  @Singleton
  @Provides
  def providesRMSClientConfig: ClientConfig = {
    val cacheParamsMap: Map[
      (EmbeddingType, ModelVersion),
      InMemoryCacheParams
    ] = Map(
      // Tweet Embeddings
      (LogFavBasedTweet, Model20m145k2020) -> EnabledInMemoryCacheParams(
        ttl = 10.minutes,
        maxKeys = 1048575, // 800MB
        cacheName = getCacheName(LogFavBasedTweet, Model20m145k2020)),
      (LogFavLongestL2EmbeddingTweet, Model20m145k2020) -> EnabledInMemoryCacheParams(
        ttl = 5.minutes,
        maxKeys = 1048575, // 800MB
        cacheName = getCacheName(LogFavLongestL2EmbeddingTweet, Model20m145k2020)),
      // User - KnownFor Embeddings
      (FavBasedProducer, Model20m145k2020) -> EnabledInMemoryCacheParams(
        ttl = 1.day,
        maxKeys = 500000, // 400MB
        cacheName = getCacheName(FavBasedProducer, Model20m145k2020)),
      // User - InterestedIn Embeddings
      (LogFavBasedUserInterestedInFromAPE, Model20m145k2020) -> EnabledInMemoryCacheParams(
        ttl = 6.hours,
        maxKeys = 262143,
        cacheName = getCacheName(LogFavBasedUserInterestedInFromAPE, Model20m145k2020)),
      (FavBasedUserInterestedIn, Model20m145k2020) -> EnabledInMemoryCacheParams(
        ttl = 6.hours,
        maxKeys = 262143,
        cacheName = getCacheName(FavBasedUserInterestedIn, Model20m145k2020)),
      // Topic Embeddings
      (FavTfgTopic, Model20m145k2020) -> EnabledInMemoryCacheParams(
        ttl = 12.hours,
        maxKeys = 262143, // 200MB
        cacheName = getCacheName(FavTfgTopic, Model20m145k2020)),
      (LogFavBasedKgoApeTopic, Model20m145k2020) -> EnabledInMemoryCacheParams(
        ttl = 6.hours,
        maxKeys = 262143,
        cacheName = getCacheName(LogFavBasedKgoApeTopic, Model20m145k2020)),
    )

    new ClientConfig(inMemCacheParamsOverrides = cacheParamsMap)
  }

}
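Each `EnabledInMemoryCacheParams` entry pairs a TTL with a `maxKeys` bound. The size comments imply roughly 800 bytes per cached embedding (1048575 keys for about 800MB, 262143 keys for about 200MB); that figure is inferred from the comments, not measured. As a hedged illustration of what the two knobs mean, here is a minimal cache with per-entry expiry and a key limit. `TtlLruCache` and its injectable clock are hypothetical, and LRU eviction is our assumption: the source does not show how representation-manager actually evicts entries.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import java.util.Map.Entry

// Hypothetical sketch: LRU eviction once maxKeys is exceeded, plus a TTL checked on read.
// The clock is injectable so that expiry can be tested deterministically.
final class TtlLruCache[K, V](ttlMillis: Long, maxKeys: Int, clock: () => Long) {
  private final case class Timestamped(value: V, insertedAt: Long)

  // accessOrder = true keeps iteration order least-recently-used first.
  private val entries = new JLinkedHashMap[K, Timestamped](16, 0.75f, true) {
    override def removeEldestEntry(eldest: Entry[K, Timestamped]): Boolean =
      this.size() > maxKeys // evict the LRU entry after an insert overflows the bound
  }

  def put(key: K, value: V): Unit = {
    entries.put(key, Timestamped(value, clock()))
    ()
  }

  def get(key: K): Option[V] =
    Option(entries.get(key)) match {
      case Some(t) if clock() - t.insertedAt < ttlMillis => Some(t.value)
      case Some(_) =>
        entries.remove(key) // expired: drop it so the caller refetches from the backing store
        None
      case None => None
    }

  def keyCount: Int = entries.size()
}
```

A short TTL (5 to 10 minutes for Tweet embeddings) bounds staleness for fast-changing entities, while slow-moving producer embeddings can tolerate a one-day TTL.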
@ -0,0 +1,13 @@
package com.twitter.representationscorer.modules

import com.google.inject.Provides
import com.twitter.finagle.util.DefaultTimer
import com.twitter.inject.TwitterModule
import com.twitter.util.Timer
import javax.inject.Singleton

object TimerModule extends TwitterModule {
  @Singleton
  @Provides
  def providesTimer: Timer = DefaultTimer
}
@ -0,0 +1,19 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "frigate/frigate-common/src/main/scala/com/twitter/frigate/common/util",
        "hermit/hermit-core/src/main/scala/com/twitter/hermit/store/common",
        "relevance-platform/src/main/scala/com/twitter/relevance_platform/common/injection",
        "representation-manager/client/src/main/scala/com/twitter/representation_manager",
        "representation-manager/client/src/main/scala/com/twitter/representation_manager/config",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/common",
        "src/scala/com/twitter/simclusters_v2/score",
        "src/scala/com/twitter/topic_recos/common",
        "src/scala/com/twitter/topic_recos/stores",
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift-scala",
        "src/thrift/com/twitter/topic_recos:topic_recos-thrift-scala",
        "stitch/stitch-storehaus",
    ],
)
@ -0,0 +1,168 @@
package com.twitter.representationscorer.scorestore

import com.twitter.bijection.scrooge.BinaryScalaCodec
import com.twitter.conversions.DurationOps._
import com.twitter.finagle.memcached.Client
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.hashing.KeyHasher
import com.twitter.hermit.store.common.ObservedCachedReadableStore
import com.twitter.hermit.store.common.ObservedMemcachedReadableStore
import com.twitter.hermit.store.common.ObservedReadableStore
import com.twitter.relevance_platform.common.injection.LZ4Injection
import com.twitter.simclusters_v2.common.SimClustersEmbedding
import com.twitter.simclusters_v2.score.ScoreFacadeStore
import com.twitter.simclusters_v2.score.SimClustersEmbeddingPairScoreStore
import com.twitter.simclusters_v2.thriftscala.EmbeddingType.FavTfgTopic
import com.twitter.simclusters_v2.thriftscala.EmbeddingType.LogFavBasedKgoApeTopic
import com.twitter.simclusters_v2.thriftscala.EmbeddingType.LogFavBasedTweet
import com.twitter.simclusters_v2.thriftscala.ModelVersion.Model20m145kUpdated
import com.twitter.simclusters_v2.thriftscala.Score
import com.twitter.simclusters_v2.thriftscala.ScoreId
import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.stitch.storehaus.StitchOfReadableStore
import com.twitter.storehaus.ReadableStore
import com.twitter.strato.client.{Client => StratoClient}
import com.twitter.topic_recos.stores.CertoTweetTopicScoresStore
import javax.inject.Inject
import javax.inject.Singleton

@Singleton()
class ScoreStore @Inject() (
  simClustersEmbeddingStore: ReadableStore[SimClustersEmbeddingId, SimClustersEmbedding],
  stratoClient: StratoClient,
  representationScorerCacheClient: Client,
  stats: StatsReceiver) {

  private val keyHasher = KeyHasher.FNV1A_64
  private val statsReceiver = stats.scope("score_store")

  /** ** Score Store *****/
  private val simClustersEmbeddingCosineSimilarityScoreStore =
    ObservedReadableStore(
      SimClustersEmbeddingPairScoreStore
        .buildCosineSimilarityStore(simClustersEmbeddingStore)
        .toThriftStore
    )(statsReceiver.scope("simClusters_embedding_cosine_similarity_score_store"))

  private val simClustersEmbeddingDotProductScoreStore =
    ObservedReadableStore(
      SimClustersEmbeddingPairScoreStore
        .buildDotProductStore(simClustersEmbeddingStore)
        .toThriftStore
    )(statsReceiver.scope("simClusters_embedding_dot_product_score_store"))

  private val simClustersEmbeddingJaccardSimilarityScoreStore =
    ObservedReadableStore(
      SimClustersEmbeddingPairScoreStore
        .buildJaccardSimilarityStore(simClustersEmbeddingStore)
        .toThriftStore
    )(statsReceiver.scope("simClusters_embedding_jaccard_similarity_score_store"))

  private val simClustersEmbeddingEuclideanDistanceScoreStore =
    ObservedReadableStore(
      SimClustersEmbeddingPairScoreStore
        .buildEuclideanDistanceStore(simClustersEmbeddingStore)
        .toThriftStore
    )(statsReceiver.scope("simClusters_embedding_euclidean_distance_score_store"))

  private val simClustersEmbeddingManhattanDistanceScoreStore =
    ObservedReadableStore(
      SimClustersEmbeddingPairScoreStore
        .buildManhattanDistanceStore(simClustersEmbeddingStore)
        .toThriftStore
    )(statsReceiver.scope("simClusters_embedding_manhattan_distance_score_store"))

  private val simClustersEmbeddingLogCosineSimilarityScoreStore =
    ObservedReadableStore(
      SimClustersEmbeddingPairScoreStore
        .buildLogCosineSimilarityStore(simClustersEmbeddingStore)
        .toThriftStore
    )(statsReceiver.scope("simClusters_embedding_log_cosine_similarity_score_store"))

  private val simClustersEmbeddingExpScaledCosineSimilarityScoreStore =
    ObservedReadableStore(
      SimClustersEmbeddingPairScoreStore
        .buildExpScaledCosineSimilarityStore(simClustersEmbeddingStore)
        .toThriftStore
    )(statsReceiver.scope("simClusters_embedding_exp_scaled_cosine_similarity_score_store"))

  // Use the default setting
  private val topicTweetRankingScoreStore =
    TopicTweetRankingScoreStore.buildTopicTweetRankingStore(
      FavTfgTopic,
      LogFavBasedKgoApeTopic,
      LogFavBasedTweet,
      Model20m145kUpdated,
      consumerEmbeddingMultiplier = 1.0,
      producerEmbeddingMultiplier = 1.0
    )

  private val topicTweetsCortexThresholdStore = TopicTweetsCosineSimilarityAggregateStore(
    TopicTweetsCosineSimilarityAggregateStore.DefaultScoreKeys,
    statsReceiver.scope("topic_tweets_cortex_threshold_store")
  )

  val topicTweetCertoScoreStore: ObservedCachedReadableStore[ScoreId, Score] = {
    val underlyingStore = ObservedReadableStore(
      TopicTweetCertoScoreStore(CertoTweetTopicScoresStore.prodStore(stratoClient))
    )(statsReceiver.scope("topic_tweet_certo_score_store"))

    val memcachedStore = ObservedMemcachedReadableStore
      .fromCacheClient(
        backingStore = underlyingStore,
        cacheClient = representationScorerCacheClient,
        ttl = 10.minutes
      )(
        valueInjection = LZ4Injection.compose(BinaryScalaCodec(Score)),
        statsReceiver = statsReceiver.scope("topic_tweet_certo_store_memcache"),
        keyToString = { k: ScoreId =>
          s"certocs:${keyHasher.hashKey(k.toString.getBytes)}"
        }
      )

    ObservedCachedReadableStore.from[ScoreId, Score](
      memcachedStore,
      ttl = 5.minutes,
      maxKeys = 1000000,
      cacheName = "topic_tweet_certo_store_cache",
      windowSize = 10000L
    )(statsReceiver.scope("topic_tweet_certo_store_cache"))
  }

  val uniformScoringStore: ReadableStore[ScoreId, Score] =
    ScoreFacadeStore.buildWithMetrics(
      readableStores = Map(
        ScoringAlgorithm.PairEmbeddingCosineSimilarity ->
          simClustersEmbeddingCosineSimilarityScoreStore,
        ScoringAlgorithm.PairEmbeddingDotProduct ->
          simClustersEmbeddingDotProductScoreStore,
        ScoringAlgorithm.PairEmbeddingJaccardSimilarity ->
          simClustersEmbeddingJaccardSimilarityScoreStore,
        ScoringAlgorithm.PairEmbeddingEuclideanDistance ->
          simClustersEmbeddingEuclideanDistanceScoreStore,
        ScoringAlgorithm.PairEmbeddingManhattanDistance ->
          simClustersEmbeddingManhattanDistanceScoreStore,
        ScoringAlgorithm.PairEmbeddingLogCosineSimilarity ->
          simClustersEmbeddingLogCosineSimilarityScoreStore,
        ScoringAlgorithm.PairEmbeddingExpScaledCosineSimilarity ->
          simClustersEmbeddingExpScaledCosineSimilarityScoreStore,
        // Certo normalized cosine score between topic-tweet pairs
        ScoringAlgorithm.CertoNormalizedCosineScore -> topicTweetCertoScoreStore,
        // Certo normalized dot-product score between topic-tweet pairs
        ScoringAlgorithm.CertoNormalizedDotProductScore -> topicTweetCertoScoreStore
      ),
      aggregatedStores = Map(
        ScoringAlgorithm.WeightedSumTopicTweetRanking ->
          topicTweetRankingScoreStore,
        ScoringAlgorithm.CortexTopicTweetLabel ->
          topicTweetsCortexThresholdStore,
      ),
      statsReceiver = stats
    )

  val uniformScoringStoreStitch: ScoreId => com.twitter.stitch.Stitch[Score] =
    StitchOfReadableStore(uniformScoringStore)
}
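The Certo memcached layer keys each entry as `certocs:` followed by a 64-bit hash of `ScoreId.toString`, computed with `KeyHasher.FNV1A_64`. Assuming that hasher is the standard FNV-1a 64-bit function (the offset basis and prime below are the published FNV constants), the key derivation can be sketched as follows. `CertoCacheKeySketch` is a hypothetical name, and this sketch encodes the string as UTF-8, whereas the original calls `getBytes` with the platform default charset.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical sketch of the memcached key scheme, assuming standard FNV-1a (64-bit).
object CertoCacheKeySketch {
  // 0xcbf29ce484222325 does not fit in a signed Long literal, so parse it as unsigned.
  private val OffsetBasis: Long = java.lang.Long.parseUnsignedLong("cbf29ce484222325", 16)
  private val Prime: Long = 0x100000001b3L

  // FNV-1a: XOR each unsigned byte into the hash, then multiply by the prime.
  // Long multiplication wraps, which gives the required arithmetic mod 2^64.
  def fnv1a64(bytes: Array[Byte]): Long =
    bytes.foldLeft(OffsetBasis) { (hash, b) => (hash ^ (b & 0xff).toLong) * Prime }

  // Mirrors s"certocs:${keyHasher.hashKey(k.toString.getBytes)}" for an arbitrary key string.
  def cacheKey(scoreIdString: String): String =
    s"certocs:${fnv1a64(scoreIdString.getBytes(StandardCharsets.UTF_8))}"
}
```

Hashing the `toString` form keeps memcached keys short and fixed-length regardless of how large the Thrift `ScoreId` is; the `certocs:` prefix namespaces these entries within the shared cache cluster.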
@ -0,0 +1,106 @@
package com.twitter.representationscorer.scorestore

import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.thriftscala.ScoreInternalId.GenericPairScoreId
import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm.CertoNormalizedDotProductScore
import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm.CertoNormalizedCosineScore
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.TopicId
import com.twitter.simclusters_v2.thriftscala.{Score => ThriftScore}
import com.twitter.simclusters_v2.thriftscala.{ScoreId => ThriftScoreId}
import com.twitter.storehaus.FutureOps
import com.twitter.storehaus.ReadableStore
import com.twitter.topic_recos.thriftscala.Scores
import com.twitter.topic_recos.thriftscala.TopicToScores
import com.twitter.util.Future

/**
 * Score store to get Certo <topic, tweet> scores.
 * Currently, the store supports two scoring algorithms (i.e., two types of Certo scores):
 * 1. CertoNormalizedDotProductScore
 * 2. CertoNormalizedCosineScore
 * Querying with the corresponding scoring algorithm returns the corresponding Certo score.
 */
case class TopicTweetCertoScoreStore(certoStratoStore: ReadableStore[TweetId, TopicToScores])
    extends ReadableStore[ThriftScoreId, ThriftScore] {

  override def multiGet[K1 <: ThriftScoreId](ks: Set[K1]): Map[K1, Future[Option[ThriftScore]]] = {
    // Collect the Tweet side of every generic pair. Pairs without a Tweet id are skipped here
    // (they resolve to None below) instead of throwing a MatchError.
    val tweetIds: Set[TweetId] =
      ks.map(_.internalId).flatMap {
        case GenericPairScoreId(scoreId) =>
          (scoreId.id1, scoreId.id2) match {
            case (InternalId.TweetId(tweetId), _) => Some(tweetId)
            case (_, InternalId.TweetId(tweetId)) => Some(tweetId)
            case _ => None
          }
        case _ => None
      }

    val result = for {
      certoScores <- Future.collect(certoStratoStore.multiGet(tweetIds))
    } yield {
      ks.map { k =>
        (k.algorithm, k.internalId) match {
          case (CertoNormalizedDotProductScore, GenericPairScoreId(scoreId)) =>
            (scoreId.id1, scoreId.id2) match {
              case (InternalId.TweetId(tweetId), InternalId.TopicId(topicId)) =>
                (
                  k,
                  extractScore(
                    tweetId,
                    topicId,
                    certoScores,
                    _.followerL2NormalizedDotProduct8HrHalfLife))
              case (InternalId.TopicId(topicId), InternalId.TweetId(tweetId)) =>
                (
                  k,
                  extractScore(
                    tweetId,
                    topicId,
                    certoScores,
                    _.followerL2NormalizedDotProduct8HrHalfLife))
              case _ => (k, None)
            }
          case (CertoNormalizedCosineScore, GenericPairScoreId(scoreId)) =>
            (scoreId.id1, scoreId.id2) match {
              case (InternalId.TweetId(tweetId), InternalId.TopicId(topicId)) =>
                (
                  k,
                  extractScore(
                    tweetId,
                    topicId,
                    certoScores,
                    _.followerL2NormalizedCosineSimilarity8HrHalfLife))
              case (InternalId.TopicId(topicId), InternalId.TweetId(tweetId)) =>
                (
                  k,
                  extractScore(
                    tweetId,
                    topicId,
                    certoScores,
                    _.followerL2NormalizedCosineSimilarity8HrHalfLife))
              case _ => (k, None)
            }
          case _ => (k, None)
        }
      }.toMap
    }
    FutureOps.liftValues(ks, result)
  }

  /**
   * Given tweetToCertoScores, extract a specific Certo score between the given tweetId and
   * topicId. The Certo score of interest is selected by scoreExtractor.
   */
  def extractScore(
    tweetId: TweetId,
    topicId: TopicId,
    tweetToCertoScores: Map[TweetId, Option[TopicToScores]],
    scoreExtractor: Scores => Double
  ): Option[ThriftScore] = {
    tweetToCertoScores.get(tweetId).flatMap {
      case Some(topicToScores) =>
        topicToScores.topicToScores.flatMap(_.get(topicId).map(scoreExtractor).map(ThriftScore(_)))
      // The Tweet was fetched but has no Certo scores: default to 0.0.
      case _ => Some(ThriftScore(0.0))
    }
  }
}
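`extractScore` distinguishes four outcomes: a Tweet that was never fetched yields `None`, a fetched Tweet with no Certo scores yields `Some(0.0)`, a fetched Tweet whose scores include the topic yields that topic's score, and a fetched Tweet whose scores lack the topic yields `None`. The sketch below reproduces that decision table with simplified, hypothetical stand-ins (`Scores`, `TopicToScores`, topic ids as `Long`) for the Thrift types, returning a bare `Double` instead of `ThriftScore`.

```scala
// Hypothetical stand-ins for the topic_recos Thrift structs; topic ids simplified to Long.
object CertoExtractSketch {
  final case class Scores(
    followerL2NormalizedDotProduct8HrHalfLife: Double,
    followerL2NormalizedCosineSimilarity8HrHalfLife: Double)

  final case class TopicToScores(topicToScores: Option[Map[Long, Scores]])

  def extractScore(
    tweetId: Long,
    topicId: Long,
    tweetToCertoScores: Map[Long, Option[TopicToScores]],
    scoreExtractor: Scores => Double
  ): Option[Double] =
    tweetToCertoScores.get(tweetId).flatMap {
      // Scores were fetched: pick the requested topic, if present.
      case Some(topicToScores) =>
        topicToScores.topicToScores.flatMap(_.get(topicId).map(scoreExtractor))
      // The Tweet was looked up but has no Certo scores: default to 0.0.
      case None => Some(0.0)
    }
}
```

Passing the field accessor as `scoreExtractor` is what lets one lookup path serve both Certo algorithms; only the selected field differs.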
@ -0,0 +1,48 @@
package com.twitter.representationscorer.scorestore

import com.twitter.simclusters_v2.score.WeightedSumAggregatedScoreStore
import com.twitter.simclusters_v2.score.WeightedSumAggregatedScoreStore.WeightedSumAggregatedScoreParameter
import com.twitter.simclusters_v2.thriftscala.{EmbeddingType, ModelVersion, ScoringAlgorithm}

object TopicTweetRankingScoreStore {
  val producerEmbeddingScoreMultiplier = 1.0
  val consumerEmbeddingScoreMultiplier = 1.0

  /**
   * Builds the scoring store for topic-tweet ranking using the default multipliers.
   * To compare rankings across different multipliers, register a new ScoringAlgorithm
   * and have upstream callers select it via params.
   */
  def buildTopicTweetRankingStore(
    consumerEmbeddingType: EmbeddingType,
    producerEmbeddingType: EmbeddingType,
    tweetEmbeddingType: EmbeddingType,
    modelVersion: ModelVersion,
    consumerEmbeddingMultiplier: Double = consumerEmbeddingScoreMultiplier,
    producerEmbeddingMultiplier: Double = producerEmbeddingScoreMultiplier
  ): WeightedSumAggregatedScoreStore = {
    WeightedSumAggregatedScoreStore(
      List(
        WeightedSumAggregatedScoreParameter(
          ScoringAlgorithm.PairEmbeddingCosineSimilarity,
          consumerEmbeddingMultiplier,
          WeightedSumAggregatedScoreStore.genericPairScoreIdToSimClustersEmbeddingPairScoreId(
            consumerEmbeddingType,
            tweetEmbeddingType,
            modelVersion
          )
        ),
        WeightedSumAggregatedScoreParameter(
          ScoringAlgorithm.PairEmbeddingCosineSimilarity,
          producerEmbeddingMultiplier,
          WeightedSumAggregatedScoreStore.genericPairScoreIdToSimClustersEmbeddingPairScoreId(
            producerEmbeddingType,
            tweetEmbeddingType,
            modelVersion
          )
        )
      )
    )
  }

}
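`buildTopicTweetRankingStore` combines two cosine similarities, (consumer topic embedding, Tweet embedding) and (producer topic embedding, Tweet embedding), each scaled by its multiplier. Assuming `WeightedSumAggregatedScoreStore` sums `multiplier * score` over its parameters (the source does not show that class), the ranking score reduces to the arithmetic below. The sparse-map embedding representation and the object and method names are hypothetical.

```scala
// Hypothetical sketch: weighted sum of two cosine similarities over sparse embeddings.
object TopicTweetRankingSketch {
  type SparseEmbedding = Map[Int, Double] // clusterId -> weight

  def cosine(a: SparseEmbedding, b: SparseEmbedding): Double = {
    // Only clusters present in `a` can contribute to the dot product.
    val dot = a.iterator.map { case (k, v) => v * b.getOrElse(k, 0.0) }.sum
    val normA = math.sqrt(a.values.map(v => v * v).sum)
    val normB = math.sqrt(b.values.map(v => v * v).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // score = wConsumer * cos(consumerTopic, tweet) + wProducer * cos(producerTopic, tweet)
  def rankingScore(
    consumerTopic: SparseEmbedding,
    producerTopic: SparseEmbedding,
    tweet: SparseEmbedding,
    consumerMultiplier: Double = 1.0,
    producerMultiplier: Double = 1.0
  ): Double =
    consumerMultiplier * cosine(consumerTopic, tweet) +
      producerMultiplier * cosine(producerTopic, tweet)
}
```

With both multipliers at their default of 1.0 (as in `ScoreStore`), the two topic embeddings contribute equally; the doc comment's advice to register a new `ScoringAlgorithm` for other weightings keeps each weighting separately addressable by callers.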
@ -0,0 +1,148 @@
package com.twitter.representationscorer.scorestore

import com.twitter.finagle.stats.StatsReceiver
import com.twitter.frigate.common.util.StatsUtil
import com.twitter.representationscorer.scorestore.TopicTweetsCosineSimilarityAggregateStore.ScoreKey
import com.twitter.simclusters_v2.common.TweetId
import com.twitter.simclusters_v2.score.AggregatedScoreStore
import com.twitter.simclusters_v2.thriftscala.ScoreInternalId.GenericPairScoreId
import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm.CortexTopicTweetLabel
import com.twitter.simclusters_v2.thriftscala.{
  EmbeddingType,
  InternalId,
  ModelVersion,
  ScoreInternalId,
  ScoringAlgorithm,
  SimClustersEmbeddingId,
  TopicId,
  Score => ThriftScore,
  ScoreId => ThriftScoreId,
  SimClustersEmbeddingPairScoreId => ThriftSimClustersEmbeddingPairScoreId
}
import com.twitter.storehaus.ReadableStore
import com.twitter.topic_recos.common.Configs.{DefaultModelVersion, MinCosineSimilarityScore}
import com.twitter.topic_recos.common._
import com.twitter.util.Future

/**
 * Calculates the cosine similarity scores of arbitrary combinations of TopicEmbeddings and
 * TweetEmbeddings.
 * The class has two uses:
 * 1. Internal use. TSP calls this store to fetch the raw scores for a (topic, tweet) pair across
 * all available embedding types. All scores are calculated here, so the caller can do filtering
 * and score caching on its side. This makes it possible to DDG different embedding scores.
 *
 * 2. External calls from Cortex. We return true (1.0) for a given (topic, tweet) pair if its
 * cosine similarity passes the threshold for any of the embedding types.
 * The expected input type is
 * ScoreId(
 *   CortexTopicTweetLabel,
 *   GenericPairScoreId(TopicId, TweetId)
 * )
 */
case class TopicTweetsCosineSimilarityAggregateStore(
  scoreKeys: Seq[ScoreKey],
  statsReceiver: StatsReceiver)
    extends AggregatedScoreStore {

  def toCortexScore(scoresMap: Map[ScoreKey, Double]): Double = {
    val passThreshold = scoresMap.exists {
      case (_, score) => score >= MinCosineSimilarityScore
    }
    if (passThreshold) 1.0 else 0.0
  }

  /**
   * To be called by Cortex through the Unified Score API ONLY. Calculates the scores for all
   * configured embedding combinations of the (topic, tweet) pair and returns 1.0 if any of them
   * passes the minimum threshold.
   *
   * Expects ScoreId(CortexTopicTweetLabel, GenericPairScoreId(TopicId, TweetId)) as input; the
   * two ids may appear in either order.
   */
  override def get(k: ThriftScoreId): Future[Option[ThriftScore]] = {
    StatsUtil.trackOptionStats(statsReceiver) {
      (k.algorithm, k.internalId) match {
        case (CortexTopicTweetLabel, GenericPairScoreId(genericPairScoreId)) =>
          (genericPairScoreId.id1, genericPairScoreId.id2) match {
            case (InternalId.TopicId(topicId), InternalId.TweetId(tweetId)) =>
              TopicTweetsCosineSimilarityAggregateStore
                .getRawScoresMap(topicId, tweetId, scoreKeys, scoreFacadeStore)
                .map { scoresMap => Some(ThriftScore(toCortexScore(scoresMap))) }
            case (InternalId.TweetId(tweetId), InternalId.TopicId(topicId)) =>
              TopicTweetsCosineSimilarityAggregateStore
                .getRawScoresMap(topicId, tweetId, scoreKeys, scoreFacadeStore)
                .map { scoresMap => Some(ThriftScore(toCortexScore(scoresMap))) }
            case _ =>
              // Do not accept other InternalId combinations
              Future.None
          }
        case _ =>
          // Do not accept other Id types for now
          Future.None
      }
    }
  }
}

object TopicTweetsCosineSimilarityAggregateStore {

  val TopicEmbeddingTypes: Seq[EmbeddingType] =
    Seq(
      EmbeddingType.FavTfgTopic,
      EmbeddingType.LogFavBasedKgoApeTopic
    )

  // Add new Tweet embedding types here to test their performance.
  val TweetEmbeddingTypes: Seq[EmbeddingType] = Seq(EmbeddingType.LogFavBasedTweet)

  val ModelVersions: Seq[ModelVersion] =
    Seq(DefaultModelVersion)

  val DefaultScoreKeys: Seq[ScoreKey] = {
    for {
      modelVersion <- ModelVersions
      topicEmbeddingType <- TopicEmbeddingTypes
      tweetEmbeddingType <- TweetEmbeddingTypes
    } yield {
      ScoreKey(
        topicEmbeddingType = topicEmbeddingType,
        tweetEmbeddingType = tweetEmbeddingType,
        modelVersion = modelVersion
      )
    }
  }
  case class ScoreKey(
    topicEmbeddingType: EmbeddingType,
    tweetEmbeddingType: EmbeddingType,
    modelVersion: ModelVersion)

  def getRawScoresMap(
    topicId: TopicId,
    tweetId: TweetId,
    scoreKeys: Seq[ScoreKey],
    uniformScoringStore: ReadableStore[ThriftScoreId, ThriftScore]
  ): Future[Map[ScoreKey, Double]] = {
    val scoresMapFut = scoreKeys.map { key =>
      val scoreInternalId = ScoreInternalId.SimClustersEmbeddingPairScoreId(
        ThriftSimClustersEmbeddingPairScoreId(
          buildTopicEmbedding(topicId, key.topicEmbeddingType, key.modelVersion),
          SimClustersEmbeddingId(
            key.tweetEmbeddingType,
            key.modelVersion,
            InternalId.TweetId(tweetId))
        ))
      val scoreFut = uniformScoringStore
        .get(
          ThriftScoreId(
            algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity, // Hard code as cosine sim
            internalId = scoreInternalId
          ))
      key -> scoreFut
    }.toMap

    Future
      .collect(scoresMapFut).map(_.collect {
        case (key, Some(ThriftScore(score))) =>
          (key, score)
      })
  }
}
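For Cortex, the aggregate store enumerates every (model version, topic embedding type, Tweet embedding type) combination in `DefaultScoreKeys`, fetches one cosine similarity per combination, and emits 1.0 if any of them reaches `MinCosineSimilarityScore`. The sketch below restates that logic over plain strings. The threshold's actual value is not shown in the source, so it is a parameter here, and the object name is hypothetical.

```scala
// Hypothetical sketch of DefaultScoreKeys enumeration and the Cortex 0/1 label.
object CortexLabelSketch {
  final case class ScoreKey(
    topicEmbeddingType: String,
    tweetEmbeddingType: String,
    modelVersion: String)

  // Cross product in the same nesting order as DefaultScoreKeys.
  def scoreKeys(
    modelVersions: Seq[String],
    topicTypes: Seq[String],
    tweetTypes: Seq[String]
  ): Seq[ScoreKey] =
    for {
      modelVersion <- modelVersions
      topicType <- topicTypes
      tweetType <- tweetTypes
    } yield ScoreKey(topicType, tweetType, modelVersion)

  // 1.0 if any embedding combination passes the threshold. Combinations whose score was
  // missing never enter the map (getRawScoresMap drops them), so they cannot pass.
  def toCortexScore(scores: Map[ScoreKey, Double], minCosineSimilarity: Double): Double =
    if (scores.exists { case (_, score) => score >= minCosineSimilarity }) 1.0 else 0.0
}
```

With the defaults in the source (one model version, two topic embedding types, one Tweet embedding type), each Cortex request fans out to two cosine-similarity lookups.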
@ -0,0 +1,20 @@
scala_library(
    compiler_option_sets = ["fatal_warnings"],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "3rdparty/jvm/com/github/ben-manes/caffeine",
        "finatra/inject/inject-core/src/main/scala",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/common",
        "representation-scorer/server/src/main/scala/com/twitter/representationscorer/scorestore",
        "representation-scorer/server/src/main/thrift:thrift-scala",
        "src/thrift/com/twitter/twistly:twistly-scala",
        "stitch/stitch-core",
        "stitch/stitch-core:cache",
        "strato/config/columns/recommendations/twistly:twistly-strato-client",
        "strato/config/columns/recommendations/user-signal-service:user-signal-service-strato-client",
        "strato/src/main/scala/com/twitter/strato/client",
        "user-signal-service/thrift/src/main/thrift:thrift-scala",
        "util/util-core",
    ],
)
@ -0,0 +1,65 @@
package com.twitter.representationscorer.twistlyfeatures

import com.twitter.conversions.DurationOps._
import com.twitter.util.Duration
import com.twitter.util.Time

case class Engagements(
  favs7d: Seq[UserSignal] = Nil,
  retweets7d: Seq[UserSignal] = Nil,
  follows30d: Seq[UserSignal] = Nil,
  shares7d: Seq[UserSignal] = Nil,
  replies7d: Seq[UserSignal] = Nil,
  originalTweets7d: Seq[UserSignal] = Nil,
  videoPlaybacks7d: Seq[UserSignal] = Nil,
  block30d: Seq[UserSignal] = Nil,
  mute30d: Seq[UserSignal] = Nil,
  report30d: Seq[UserSignal] = Nil,
  dontlike30d: Seq[UserSignal] = Nil,
  seeFewer30d: Seq[UserSignal] = Nil) {

  import Engagements._

  private val now = Time.now
  private val oneDayAgo = (now - OneDaySpan).inMillis
  private val sevenDaysAgo = (now - SevenDaysSpan).inMillis

  // All target ids from the signals, grouped by type (Tweet ids and author ids)
  val tweetIds: Seq[Long] =
    (favs7d ++ retweets7d ++ shares7d
      ++ replies7d ++ originalTweets7d ++ videoPlaybacks7d
      ++ report30d ++ dontlike30d ++ seeFewer30d)
      .map(_.targetId)
  val authorIds: Seq[Long] = (follows30d ++ block30d ++ mute30d).map(_.targetId)

  // Tweet signals
  val dontlike7d: Seq[UserSignal] = dontlike30d.filter(_.timestamp > sevenDaysAgo)
  val seeFewer7d: Seq[UserSignal] = seeFewer30d.filter(_.timestamp > sevenDaysAgo)
  val report7d: Seq[UserSignal] = report30d.filter(_.timestamp > sevenDaysAgo)

  val favs1d: Seq[UserSignal] = favs7d.filter(_.timestamp > oneDayAgo)
  val retweets1d: Seq[UserSignal] = retweets7d.filter(_.timestamp > oneDayAgo)
  val shares1d: Seq[UserSignal] = shares7d.filter(_.timestamp > oneDayAgo)
  val replies1d: Seq[UserSignal] = replies7d.filter(_.timestamp > oneDayAgo)
  val originalTweets1d: Seq[UserSignal] = originalTweets7d.filter(_.timestamp > oneDayAgo)
  val videoPlaybacks1d: Seq[UserSignal] = videoPlaybacks7d.filter(_.timestamp > oneDayAgo)
  val dontlike1d: Seq[UserSignal] = dontlike7d.filter(_.timestamp > oneDayAgo)
  val seeFewer1d: Seq[UserSignal] = seeFewer7d.filter(_.timestamp > oneDayAgo)
  val report1d: Seq[UserSignal] = report7d.filter(_.timestamp > oneDayAgo)

  // User (account) signals
  val follows7d: Seq[UserSignal] = follows30d.filter(_.timestamp > sevenDaysAgo)
  val block7d: Seq[UserSignal] = block30d.filter(_.timestamp > sevenDaysAgo)
  val mute7d: Seq[UserSignal] = mute30d.filter(_.timestamp > sevenDaysAgo)

  val block1d: Seq[UserSignal] = block7d.filter(_.timestamp > oneDayAgo)
  val mute1d: Seq[UserSignal] = mute7d.filter(_.timestamp > oneDayAgo)
}

object Engagements {
  val OneDaySpan: Duration = 1.days
  val SevenDaysSpan: Duration = 7.days
  val ThirtyDaysSpan: Duration = 30.days
}

case class UserSignal(targetId: Long, timestamp: Long)
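`Engagements` derives its shorter windows by filtering the longer ones against cutoffs computed once from `Time.now`: 30-day lists are narrowed to 7 days, and 7-day lists to 1 day. A minimal sketch of that nesting, assuming plain epoch-millisecond timestamps and a caller-supplied `nowMillis` in place of `com.twitter.util.Time` so the cutoff is deterministic; `WindowSketch` and `Signal` are hypothetical names.

```scala
object WindowSketch {
  // Mirrors UserSignal: the engaged entity's id and the engagement time in epoch millis.
  final case class Signal(targetId: Long, timestamp: Long)

  val OneDayMillis: Long = 24L * 60 * 60 * 1000
  val SevenDaysMillis: Long = 7 * OneDayMillis

  // Keeps only signals strictly newer than the cutoff, preserving input order.
  def newerThan(signals: Seq[Signal], cutoffMillis: Long): Seq[Signal] =
    signals.filter(_.timestamp > cutoffMillis)

  // Narrows a 30-day list to 7 days, then the 7-day list to 1 day, the same way
  // Engagements derives block7d from block30d and block1d from block7d.
  def windows(signals30d: Seq[Signal], nowMillis: Long): (Seq[Signal], Seq[Signal]) = {
    val signals7d = newerThan(signals30d, nowMillis - SevenDaysMillis)
    val signals1d = newerThan(signals7d, nowMillis - OneDayMillis)
    (signals7d, signals1d)
  }
}
```

Because each shorter window is filtered from the next longer one, a signal can only appear in the 1-day list if it is already in the 7-day list, and the strict `>` comparison excludes a signal that lands exactly on a cutoff.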
@ -0,0 +1,3 @@
package com.twitter.representationscorer.twistlyfeatures

case class ScoreResult(id: Long, score: Option[Double])
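`ScoreResult.score` is an `Option[Double]` so that a pair the score store has no entry for (lifted to `None` by `liftNotFoundToOption` in `Scorer`) stays distinguishable from a genuine score of `0.0`. `Scorer` indexes the results by id with `groupBy(_.id)` and then looks up each engaged entity. A minimal standalone sketch of that lookup, assuming a local copy of the case class; `ScoreLookupSketch` and `scoresFor` are hypothetical names.

```scala
object ScoreLookupSketch {
  // Same shape as ScoreResult: score is None when the store had no entry for the pair.
  final case class ScoreResult(id: Long, score: Option[Double])

  // Indexes results by id (as Scorer does with groupBy(_.id)) and returns the
  // defined scores for the given engaged ids, in engagement order.
  // Ids with no result, or with a None score, contribute nothing.
  def scoresFor(engagedIds: Seq[Long], results: Seq[ScoreResult]): Seq[Double] = {
    val byId: Map[Long, Seq[ScoreResult]] = results.groupBy(_.id)
    engagedIds.flatMap(id => byId.getOrElse(id, Seq.empty)).flatMap(_.score)
  }
}
```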
@ -0,0 +1,474 @@
package com.twitter.representationscorer.twistlyfeatures

import com.twitter.finagle.stats.Counter
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.representationscorer.common.TweetId
import com.twitter.representationscorer.common.UserId
import com.twitter.representationscorer.scorestore.ScoreStore
import com.twitter.representationscorer.thriftscala.SimClustersRecentEngagementSimilarities
import com.twitter.simclusters_v2.thriftscala.EmbeddingType
import com.twitter.simclusters_v2.thriftscala.InternalId
import com.twitter.simclusters_v2.thriftscala.ModelVersion
import com.twitter.simclusters_v2.thriftscala.ScoreId
import com.twitter.simclusters_v2.thriftscala.ScoreInternalId
import com.twitter.simclusters_v2.thriftscala.ScoringAlgorithm
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingId
import com.twitter.simclusters_v2.thriftscala.SimClustersEmbeddingPairScoreId
import com.twitter.stitch.Stitch
import javax.inject.Inject

class Scorer @Inject() (
  fetchEngagementsFromUSS: Long => Stitch[Engagements],
  scoreStore: ScoreStore,
  stats: StatsReceiver) {

  import Scorer._

  private val scoreStats = stats.scope("score")
  private val scoreCalculationStats = scoreStats.scope("calculation")
  private val scoreResultStats = scoreStats.scope("result")

  private val scoresNonEmptyCounter = scoreResultStats.scope("all").counter("nonEmpty")
  private val scoresNonZeroCounter = scoreResultStats.scope("all").counter("nonZero")

  private val tweetScoreStats = scoreCalculationStats.scope("tweetScore").stat("latency")
  private val userScoreStats = scoreCalculationStats.scope("userScore").stat("latency")

  private val favNonZero = scoreResultStats.scope("favs").counter("nonZero")
  private val favNonEmpty = scoreResultStats.scope("favs").counter("nonEmpty")

  private val retweetsNonZero = scoreResultStats.scope("retweets").counter("nonZero")
  private val retweetsNonEmpty = scoreResultStats.scope("retweets").counter("nonEmpty")

  private val followsNonZero = scoreResultStats.scope("follows").counter("nonZero")
  private val followsNonEmpty = scoreResultStats.scope("follows").counter("nonEmpty")

  private val sharesNonZero = scoreResultStats.scope("shares").counter("nonZero")
  private val sharesNonEmpty = scoreResultStats.scope("shares").counter("nonEmpty")

  private val repliesNonZero = scoreResultStats.scope("replies").counter("nonZero")
  private val repliesNonEmpty = scoreResultStats.scope("replies").counter("nonEmpty")

  private val originalTweetsNonZero = scoreResultStats.scope("originalTweets").counter("nonZero")
  private val originalTweetsNonEmpty = scoreResultStats.scope("originalTweets").counter("nonEmpty")

  private val videoViewsNonZero = scoreResultStats.scope("videoViews").counter("nonZero")
  private val videoViewsNonEmpty = scoreResultStats.scope("videoViews").counter("nonEmpty")

  private val blockNonZero = scoreResultStats.scope("block").counter("nonZero")
  private val blockNonEmpty = scoreResultStats.scope("block").counter("nonEmpty")

  private val muteNonZero = scoreResultStats.scope("mute").counter("nonZero")
  private val muteNonEmpty = scoreResultStats.scope("mute").counter("nonEmpty")

  private val reportNonZero = scoreResultStats.scope("report").counter("nonZero")
  private val reportNonEmpty = scoreResultStats.scope("report").counter("nonEmpty")

  private val dontlikeNonZero = scoreResultStats.scope("dontlike").counter("nonZero")
  private val dontlikeNonEmpty = scoreResultStats.scope("dontlike").counter("nonEmpty")

  private val seeFewerNonZero = scoreResultStats.scope("seeFewer").counter("nonZero")
  private val seeFewerNonEmpty = scoreResultStats.scope("seeFewer").counter("nonEmpty")

  private def getTweetScores(
    candidateTweetId: TweetId,
    sourceTweetIds: Seq[TweetId]
  ): Stitch[Seq[ScoreResult]] = {
    val getScoresStitch = Stitch.traverse(sourceTweetIds) { sourceTweetId =>
      scoreStore
        .uniformScoringStoreStitch(getTweetScoreId(sourceTweetId, candidateTweetId))
        .liftNotFoundToOption
        .map(score => ScoreResult(sourceTweetId, score.map(_.score)))
    }

    Stitch.time(getScoresStitch).flatMap {
      case (tryResult, duration) =>
        tweetScoreStats.add(duration.inMillis)
        Stitch.const(tryResult)
    }
  }

  private def getUserScores(
    tweetId: TweetId,
    authorIds: Seq[UserId]
  ): Stitch[Seq[ScoreResult]] = {
    val getScoresStitch = Stitch.traverse(authorIds) { authorId =>
      scoreStore
        .uniformScoringStoreStitch(getAuthorScoreId(authorId, tweetId))
        .liftNotFoundToOption
        .map(score => ScoreResult(authorId, score.map(_.score)))
    }

    Stitch.time(getScoresStitch).flatMap {
      case (tryResult, duration) =>
        userScoreStats.add(duration.inMillis)
        Stitch.const(tryResult)
    }
  }

  /**
   * Get the [[SimClustersRecentEngagementSimilarities]] result containing the similarity
   * features for the given (userId, tweetId) pair.
   */
  def get(
    userId: UserId,
    tweetId: TweetId
  ): Stitch[SimClustersRecentEngagementSimilarities] = {
    get(userId, Seq(tweetId)).map(x => x.head)
  }

  /**
   * Get a list of [[SimClustersRecentEngagementSimilarities]] results containing the similarity
   * features for the given Tweets, relative to the given user's recent engagements.
   * The results are guaranteed to have the same size and order as `tweetIds`.
   */
  def get(
    userId: UserId,
    tweetIds: Seq[TweetId]
  ): Stitch[Seq[SimClustersRecentEngagementSimilarities]] = {
    fetchEngagementsFromUSS(userId)
      .flatMap(engagements => {
        // For each Tweet in the request, compute the similarity scores between it
        // and the user signals fetched from USS.
        Stitch
          .join(
            Stitch.traverse(tweetIds)(id => getTweetScores(id, engagements.tweetIds)),
            Stitch.traverse(tweetIds)(id => getUserScores(id, engagements.authorIds)),
          )
          .map {
            case (tweetScoresSeq, userScoreSeq) =>
              // Both sequences come from traversing tweetIds, so they have the same size and
              // order; a missing score is returned as None rather than dropping an entry.
              (tweetScoresSeq, userScoreSeq).zipped.map { (tweetScores, userScores) =>
                computeSimilarityScoresPerTweet(
                  engagements,
                  tweetScores.groupBy(_.id),
                  userScores.groupBy(_.id))
              }
          }
      })
  }

  /**
   * Computes the [[SimClustersRecentEngagementSimilarities]] for one candidate Tweet,
   * using its Tweet-Tweet scores (`tweetScores`) and author-Tweet scores (`authorScores`)
   * together with the user signals in [[Engagements]].
   */
  private def computeSimilarityScoresPerTweet(
    engagements: Engagements,
    tweetScores: Map[TweetId, Seq[ScoreResult]],
    authorScores: Map[UserId, Seq[ScoreResult]]
  ): SimClustersRecentEngagementSimilarities = {
    // Defined scores for signals whose targets are Tweets.
    def tweetSignalScores(signals: Seq[UserSignal]): Seq[Double] =
      signals.flatMap(s => tweetScores.getOrElse(s.targetId, Nil)).flatMap(_.score)

    // Defined scores for signals whose targets are accounts (follow, block, mute).
    def authorSignalScores(signals: Seq[UserSignal]): Seq[Double] =
      signals.flatMap(s => authorScores.getOrElse(s.targetId, Nil)).flatMap(_.score)

    val favs7d = tweetSignalScores(engagements.favs7d)
    val favs1d = tweetSignalScores(engagements.favs1d)
    val retweets7d = tweetSignalScores(engagements.retweets7d)
    val retweets1d = tweetSignalScores(engagements.retweets1d)
    val follows30d = authorSignalScores(engagements.follows30d)
    val follows7d = authorSignalScores(engagements.follows7d)
    val shares7d = tweetSignalScores(engagements.shares7d)
    val shares1d = tweetSignalScores(engagements.shares1d)
    val replies7d = tweetSignalScores(engagements.replies7d)
    val replies1d = tweetSignalScores(engagements.replies1d)
    val originalTweets7d = tweetSignalScores(engagements.originalTweets7d)
    val originalTweets1d = tweetSignalScores(engagements.originalTweets1d)
    val videoViews7d = tweetSignalScores(engagements.videoPlaybacks7d)
    val videoViews1d = tweetSignalScores(engagements.videoPlaybacks1d)

    // Block and mute signals target accounts (they feed Engagements.authorIds),
    // so their scores are looked up in authorScores, not tweetScores.
    val block30d = authorSignalScores(engagements.block30d)
    val block7d = authorSignalScores(engagements.block7d)
    val block1d = authorSignalScores(engagements.block1d)
    val mute30d = authorSignalScores(engagements.mute30d)
    val mute7d = authorSignalScores(engagements.mute7d)
    val mute1d = authorSignalScores(engagements.mute1d)

    val report30d = tweetSignalScores(engagements.report30d)
    val report7d = tweetSignalScores(engagements.report7d)
    val report1d = tweetSignalScores(engagements.report1d)
    val dontlike30d = tweetSignalScores(engagements.dontlike30d)
    val dontlike7d = tweetSignalScores(engagements.dontlike7d)
    val dontlike1d = tweetSignalScores(engagements.dontlike1d)
    val seeFewer30d = tweetSignalScores(engagements.seeFewer30d)
    val seeFewer7d = tweetSignalScores(engagements.seeFewer7d)
    val seeFewer1d = tweetSignalScores(engagements.seeFewer1d)

    val result = SimClustersRecentEngagementSimilarities(
      fav1dLast10Max = max(favs1d),
      fav1dLast10Avg = avg(favs1d),
      fav7dLast10Max = max(favs7d),
      fav7dLast10Avg = avg(favs7d),
      retweet1dLast10Max = max(retweets1d),
      retweet1dLast10Avg = avg(retweets1d),
      retweet7dLast10Max = max(retweets7d),
      retweet7dLast10Avg = avg(retweets7d),
      follow7dLast10Max = max(follows7d),
      follow7dLast10Avg = avg(follows7d),
      follow30dLast10Max = max(follows30d),
      follow30dLast10Avg = avg(follows30d),
      share1dLast10Max = max(shares1d),
      share1dLast10Avg = avg(shares1d),
      share7dLast10Max = max(shares7d),
      share7dLast10Avg = avg(shares7d),
      reply1dLast10Max = max(replies1d),
      reply1dLast10Avg = avg(replies1d),
      reply7dLast10Max = max(replies7d),
      reply7dLast10Avg = avg(replies7d),
      originalTweet1dLast10Max = max(originalTweets1d),
      originalTweet1dLast10Avg = avg(originalTweets1d),
      originalTweet7dLast10Max = max(originalTweets7d),
      originalTweet7dLast10Avg = avg(originalTweets7d),
      videoPlayback1dLast10Max = max(videoViews1d),
      videoPlayback1dLast10Avg = avg(videoViews1d),
      videoPlayback7dLast10Max = max(videoViews7d),
      videoPlayback7dLast10Avg = avg(videoViews7d),
      block1dLast10Max = max(block1d),
      block1dLast10Avg = avg(block1d),
      block7dLast10Max = max(block7d),
      block7dLast10Avg = avg(block7d),
      block30dLast10Max = max(block30d),
      block30dLast10Avg = avg(block30d),
      mute1dLast10Max = max(mute1d),
      mute1dLast10Avg = avg(mute1d),
      mute7dLast10Max = max(mute7d),
      mute7dLast10Avg = avg(mute7d),
      mute30dLast10Max = max(mute30d),
      mute30dLast10Avg = avg(mute30d),
      report1dLast10Max = max(report1d),
      report1dLast10Avg = avg(report1d),
      report7dLast10Max = max(report7d),
      report7dLast10Avg = avg(report7d),
      report30dLast10Max = max(report30d),
      report30dLast10Avg = avg(report30d),
      dontlike1dLast10Max = max(dontlike1d),
      dontlike1dLast10Avg = avg(dontlike1d),
      dontlike7dLast10Max = max(dontlike7d),
      dontlike7dLast10Avg = avg(dontlike7d),
      dontlike30dLast10Max = max(dontlike30d),
      dontlike30dLast10Avg = avg(dontlike30d),
      seeFewer1dLast10Max = max(seeFewer1d),
      seeFewer1dLast10Avg = avg(seeFewer1d),
      seeFewer7dLast10Max = max(seeFewer7d),
      seeFewer7dLast10Avg = avg(seeFewer7d),
      seeFewer30dLast10Max = max(seeFewer30d),
      seeFewer30dLast10Avg = avg(seeFewer30d),
    )
    trackStats(result)
    result
  }

  private def trackStats(result: SimClustersRecentEngagementSimilarities): Unit = {
    val scores = Seq(
      result.fav7dLast10Max,
      result.retweet7dLast10Max,
      result.follow30dLast10Max,
      result.share1dLast10Max,
      result.share7dLast10Max,
      result.reply7dLast10Max,
      result.originalTweet7dLast10Max,
      result.videoPlayback7dLast10Max,
      result.block30dLast10Max,
      result.mute30dLast10Max,
      result.report30dLast10Max,
      result.dontlike30dLast10Max,
      result.seeFewer30dLast10Max
    )

    val nonEmpty = scores.exists(_.isDefined)
    val nonZero = scores.exists { case Some(score) if score > 0 => true; case _ => false }

    if (nonEmpty) {
      scoresNonEmptyCounter.incr()
    }

    if (nonZero) {
      scoresNonZeroCounter.incr()
    }

    // We use the largest window of a given type of score,
    // because the largest window is inclusive of smaller windows.
    trackSignalStats(favNonEmpty, favNonZero, result.fav7dLast10Avg)
    trackSignalStats(retweetsNonEmpty, retweetsNonZero, result.retweet7dLast10Avg)
    trackSignalStats(followsNonEmpty, followsNonZero, result.follow30dLast10Avg)
    trackSignalStats(sharesNonEmpty, sharesNonZero, result.share7dLast10Avg)
    trackSignalStats(repliesNonEmpty, repliesNonZero, result.reply7dLast10Avg)
    trackSignalStats(originalTweetsNonEmpty, originalTweetsNonZero, result.originalTweet7dLast10Avg)
    trackSignalStats(videoViewsNonEmpty, videoViewsNonZero, result.videoPlayback7dLast10Avg)
    trackSignalStats(blockNonEmpty, blockNonZero, result.block30dLast10Avg)
    trackSignalStats(muteNonEmpty, muteNonZero, result.mute30dLast10Avg)
    trackSignalStats(reportNonEmpty, reportNonZero, result.report30dLast10Avg)
    trackSignalStats(dontlikeNonEmpty, dontlikeNonZero, result.dontlike30dLast10Avg)
    trackSignalStats(seeFewerNonEmpty, seeFewerNonZero, result.seeFewer30dLast10Avg)
  }

  private def trackSignalStats(nonEmpty: Counter, nonZero: Counter, score: Option[Double]): Unit = {
    if (score.nonEmpty) {
      nonEmpty.incr()

      if (score.get > 0)
        nonZero.incr()
    }
  }
}

object Scorer {
  def avg(s: Traversable[Double]): Option[Double] =
    if (s.isEmpty) None else Some(s.sum / s.size)

  // Folds from 0.0, so the result is floored at 0.0 if every score is negative.
  def max(s: Traversable[Double]): Option[Double] =
    if (s.isEmpty) None
    else Some(s.foldLeft(0.0D) { (currMax, score) => math.max(currMax, score) })

  private def getAuthorScoreId(
    userId: UserId,
    tweetId: TweetId
  ) = {
    ScoreId(
      algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity,
      internalId = ScoreInternalId.SimClustersEmbeddingPairScoreId(
        SimClustersEmbeddingPairScoreId(
          SimClustersEmbeddingId(
            internalId = InternalId.UserId(userId),
            modelVersion = ModelVersion.Model20m145k2020,
            embeddingType = EmbeddingType.FavBasedProducer
          ),
          SimClustersEmbeddingId(
            internalId = InternalId.TweetId(tweetId),
            modelVersion = ModelVersion.Model20m145k2020,
            embeddingType = EmbeddingType.LogFavBasedTweet
          )
        ))
    )
  }

  private def getTweetScoreId(
    sourceTweetId: TweetId,
    candidateTweetId: TweetId
  ) = {
    ScoreId(
      algorithm = ScoringAlgorithm.PairEmbeddingCosineSimilarity,
      internalId = ScoreInternalId.SimClustersEmbeddingPairScoreId(
        SimClustersEmbeddingPairScoreId(
          SimClustersEmbeddingId(
            internalId = InternalId.TweetId(sourceTweetId),
            modelVersion = ModelVersion.Model20m145k2020,
            embeddingType = EmbeddingType.LogFavLongestL2EmbeddingTweet
          ),
          SimClustersEmbeddingId(
            internalId = InternalId.TweetId(candidateTweetId),
            modelVersion = ModelVersion.Model20m145k2020,
            embeddingType = EmbeddingType.LogFavBasedTweet
          )
        ))
    )
  }
}
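The `Last10` features reduce each window's defined scores (at most `EngagementsToScore`, i.e. 10, signals per type) to an average and a maximum, returning `None` for an empty window so that "no engagements" stays distinct from "zero similarity". `trackStats` then counts a feature as non-empty when it is defined and non-zero when it is positive. A minimal standalone sketch of those reductions, using `Seq[Double]` instead of `Traversable` (deprecated from Scala 2.13); `AggregateSketch` and `statFlags` are hypothetical names.

```scala
object AggregateSketch {
  // Average of the defined scores in one window; None when the window is empty.
  def avg(s: Seq[Double]): Option[Double] =
    if (s.isEmpty) None else Some(s.sum / s.size)

  // Maximum, folded from 0.0 like Scorer.max, so an all-negative window yields 0.0.
  def max(s: Seq[Double]): Option[Double] =
    if (s.isEmpty) None else Some(s.foldLeft(0.0)((acc, x) => math.max(acc, x)))

  // The (nonEmpty, nonZero) flags that trackSignalStats counts for one feature value.
  def statFlags(score: Option[Double]): (Boolean, Boolean) =
    (score.isDefined, score.exists(_ > 0))
}
```

Starting the fold at `0.0` only matters if cosine scores can be negative; `s.max` would return the true maximum in that case.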
@ -0,0 +1,155 @@
package com.twitter.representationscorer.twistlyfeatures

import com.twitter.decider.SimpleRecipient
import com.twitter.finagle.stats.Stat
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.representationscorer.common._
import com.twitter.representationscorer.twistlyfeatures.Engagements._
import com.twitter.simclusters_v2.common.SimClustersEmbeddingId.LongInternalId
import com.twitter.stitch.Stitch
import com.twitter.strato.generated.client.recommendations.user_signal_service.SignalsClientColumn
import com.twitter.strato.generated.client.recommendations.user_signal_service.SignalsClientColumn.Value
import com.twitter.usersignalservice.thriftscala.BatchSignalRequest
import com.twitter.usersignalservice.thriftscala.SignalRequest
import com.twitter.usersignalservice.thriftscala.SignalType
import com.twitter.util.Time
import scala.collection.mutable.ArrayBuffer
import com.twitter.usersignalservice.thriftscala.ClientIdentifier

class UserSignalServiceRecentEngagementsClient(
  stratoClient: SignalsClientColumn,
  decider: RepresentationScorerDecider,
  stats: StatsReceiver) {

  import UserSignalServiceRecentEngagementsClient._

  private val signalStats = stats.scope("user-signal-service", "signal")
  private val signalTypeStats: Map[SignalType, Stat] =
    SignalType.list.map(s => (s, signalStats.scope(s.name).stat("size"))).toMap

  def get(userId: UserId): Stitch[Engagements] = {
    val request = buildRequest(userId)
    stratoClient.fetcher.fetch(request).map(_.v).lowerFromOption().map { response =>
      val now = Time.now
      val sevenDaysAgo = now - SevenDaysSpan
      val thirtyDaysAgo = now - ThirtyDaysSpan

      Engagements(
        favs7d = getUserSignals(response, SignalType.TweetFavorite, sevenDaysAgo),
        retweets7d = getUserSignals(response, SignalType.Retweet, sevenDaysAgo),
        follows30d = getUserSignals(response, SignalType.AccountFollowWithDelay, thirtyDaysAgo),
        shares7d = getUserSignals(response, SignalType.TweetShareV1, sevenDaysAgo),
        replies7d = getUserSignals(response, SignalType.Reply, sevenDaysAgo),
        originalTweets7d = getUserSignals(response, SignalType.OriginalTweet, sevenDaysAgo),
        videoPlaybacks7d =
          getUserSignals(response, SignalType.VideoView90dPlayback50V1, sevenDaysAgo),
        block30d = getUserSignals(response, SignalType.AccountBlock, thirtyDaysAgo),
        mute30d = getUserSignals(response, SignalType.AccountMute, thirtyDaysAgo),
        report30d = getUserSignals(response, SignalType.TweetReport, thirtyDaysAgo),
        dontlike30d = getUserSignals(response, SignalType.TweetDontLike, thirtyDaysAgo),
        seeFewer30d = getUserSignals(response, SignalType.TweetSeeFewer, thirtyDaysAgo),
      )
    }
  }

  private def getUserSignals(
    response: Value,
    signalType: SignalType,
    earliestValidTimestamp: Time
  ): Seq[UserSignal] = {
    val signals = response.signalResponse
      .getOrElse(signalType, Seq.empty)
      .view
      .filter(_.timestamp > earliestValidTimestamp.inMillis)
      .map(s => s.targetInternalId.collect { case LongInternalId(id) => (id, s.timestamp) })
      .collect { case Some((id, engagedAt)) => UserSignal(id, engagedAt) }
      .take(EngagementsToScore)
      .force

    signalTypeStats(signalType).add(signals.size)
    signals
  }

  private def buildRequest(userId: Long) = {
    val recipient = Some(SimpleRecipient(userId))

    // Signals RSX always fetches
    val requestSignals = ArrayBuffer(
      SignalRequestFav,
      SignalRequestRetweet,
      SignalRequestFollow
    )

    // Signals under experimentation. We use individual deciders to disable them if necessary.
    // If experiments are successful, they will become permanent.
    if (decider.isAvailable(FetchSignalShareDeciderKey, recipient))
      requestSignals.append(SignalRequestShare)

    if (decider.isAvailable(FetchSignalReplyDeciderKey, recipient))
      requestSignals.append(SignalRequestReply)

    if (decider.isAvailable(FetchSignalOriginalTweetDeciderKey, recipient))
      requestSignals.append(SignalRequestOriginalTweet)

    if (decider.isAvailable(FetchSignalVideoPlaybackDeciderKey, recipient))
      requestSignals.append(SignalRequestVideoPlayback)

    if (decider.isAvailable(FetchSignalBlockDeciderKey, recipient))
      requestSignals.append(SignalRequestBlock)

    if (decider.isAvailable(FetchSignalMuteDeciderKey, recipient))
      requestSignals.append(SignalRequestMute)

    if (decider.isAvailable(FetchSignalReportDeciderKey, recipient))
      requestSignals.append(SignalRequestReport)

    if (decider.isAvailable(FetchSignalDontlikeDeciderKey, recipient))
      requestSignals.append(SignalRequestDontlike)

    if (decider.isAvailable(FetchSignalSeeFewerDeciderKey, recipient))
      requestSignals.append(SignalRequestSeeFewer)

    BatchSignalRequest(userId, requestSignals, Some(ClientIdentifier.RepresentationScorerHome))
  }
}

object UserSignalServiceRecentEngagementsClient {
  val FetchSignalShareDeciderKey = "representation_scorer_fetch_signal_share"
  val FetchSignalReplyDeciderKey = "representation_scorer_fetch_signal_reply"
  val FetchSignalOriginalTweetDeciderKey = "representation_scorer_fetch_signal_original_tweet"
  val FetchSignalVideoPlaybackDeciderKey = "representation_scorer_fetch_signal_video_playback"
  val FetchSignalBlockDeciderKey = "representation_scorer_fetch_signal_block"
  val FetchSignalMuteDeciderKey = "representation_scorer_fetch_signal_mute"
  val FetchSignalReportDeciderKey = "representation_scorer_fetch_signal_report"
  val FetchSignalDontlikeDeciderKey = "representation_scorer_fetch_signal_dont_like"
  val FetchSignalSeeFewerDeciderKey = "representation_scorer_fetch_signal_see_fewer"

  val EngagementsToScore = 10
  private val engagementsToScoreOpt: Option[Long] = Some(EngagementsToScore)

  val SignalRequestFav: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.TweetFavorite)
  val SignalRequestRetweet: SignalRequest = SignalRequest(engagementsToScoreOpt, SignalType.Retweet)
  val SignalRequestFollow: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.AccountFollowWithDelay)
  // New experimental signals
  val SignalRequestShare: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.TweetShareV1)
  val SignalRequestReply: SignalRequest = SignalRequest(engagementsToScoreOpt, SignalType.Reply)
  val SignalRequestOriginalTweet: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.OriginalTweet)
  val SignalRequestVideoPlayback: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.VideoView90dPlayback50V1)

  // Negative signals
  val SignalRequestBlock: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.AccountBlock)
  val SignalRequestMute: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.AccountMute)
  val SignalRequestReport: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.TweetReport)
  val SignalRequestDontlike: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.TweetDontLike)
  val SignalRequestSeeFewer: SignalRequest =
    SignalRequest(engagementsToScoreOpt, SignalType.TweetSeeFewer)
}
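`buildRequest` always asks for favorites, retweets, and follows, and appends each experimental signal only when its decider is available for the user; `getUserSignals` then keeps the signals newer than the window cutoff and truncates them to `EngagementsToScore` (10). A minimal sketch of both steps as plain functions, assuming string signal names and a `String => Boolean` predicate in place of `SignalType` and `decider.isAvailable`; `RequestSketch` is hypothetical and lists only three of the nine gated signals for brevity.

```scala
object RequestSketch {
  // Signals that are always requested (fav, retweet, follow in the client above).
  val BaseSignals: Seq[String] = Seq("TweetFavorite", "Retweet", "AccountFollowWithDelay")

  // Experimental signals paired with the decider key that gates each one (subset).
  val GatedSignals: Seq[(String, String)] = Seq(
    "representation_scorer_fetch_signal_share" -> "TweetShareV1",
    "representation_scorer_fetch_signal_reply" -> "Reply",
    "representation_scorer_fetch_signal_block" -> "AccountBlock"
  )

  // isEnabled stands in for decider.isAvailable(key, recipient).
  def signalsToRequest(isEnabled: String => Boolean): Seq[String] =
    BaseSignals ++ GatedSignals.collect { case (key, signal) if isEnabled(key) => signal }

  // Like getUserSignals: drop timestamps at or before the cutoff, then keep at most `limit`.
  def recent(timestamps: Seq[Long], cutoffMillis: Long, limit: Int = 10): Seq[Long] =
    timestamps.filter(_ > cutoffMillis).take(limit)
}
```

Keeping the (decider key, signal) pairs in one table is an alternative to the nine `if` statements in the client: adding a signal then means adding one row.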
@ -0,0 +1,57 @@
package com.twitter.representationscorer.twistlyfeatures

import com.github.benmanes.caffeine.cache.Caffeine
import com.twitter.stitch.cache.EvictingCache
import com.google.inject.Provides
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.inject.TwitterModule
import com.twitter.representationscorer.common.RepresentationScorerDecider
import com.twitter.stitch.Stitch
import com.twitter.stitch.cache.ConcurrentMapCache
import com.twitter.stitch.cache.MemoizeQuery
import com.twitter.strato.client.Client
import com.twitter.strato.generated.client.recommendations.user_signal_service.SignalsClientColumn
import java.util.concurrent.ConcurrentMap
import java.util.concurrent.TimeUnit
import javax.inject.Singleton

object UserSignalServiceRecentEngagementsClientModule extends TwitterModule {

  @Singleton
  @Provides
  def provide(
    client: Client,
    decider: RepresentationScorerDecider,
    statsReceiver: StatsReceiver
  ): Long => Stitch[Engagements] = {
    val stratoClient = new SignalsClientColumn(client)

    /*
    This cache holds a user's recent engagements for a short period of time, so that batched requests
    for multiple (userid, tweetid) pairs don't all need to fetch them.

    [1] Caffeine cache keys/values must be objects, so we cannot use the `Long` primitive directly.
    The boxed java.lang.Long works as a key, since it is an object. In most situations the compiler
    can see where auto(un)boxing can occur. However, here we seem to need some wrapper functions
    with explicit types to allow the boxing to happen.
     */
    val mapCache: ConcurrentMap[java.lang.Long, Stitch[Engagements]] =
      Caffeine
        .newBuilder()
        .expireAfterWrite(5, TimeUnit.SECONDS)
        .maximumSize(
          1000 // We estimate 5M unique users in a 5-minute period; with 2k RSX instances, assume that one will see < 1k in a 5s period
        )
        .build[java.lang.Long, Stitch[Engagements]]
        .asMap

    statsReceiver.provideGauge("ussRecentEngagementsClient", "cache_size") { mapCache.size.toFloat }

    val engagementsClient =
      new UserSignalServiceRecentEngagementsClient(stratoClient, decider, statsReceiver)

    val f = (l: java.lang.Long) => engagementsClient.get(l) // See note [1] above
    val cachedCall = MemoizeQuery(f, EvictingCache.lazily(new ConcurrentMapCache(mapCache)))
    (l: Long) => cachedCall(l) // See note [1] above
  }
}
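The module above memoizes each user's recent-engagement fetch for five seconds, so a batch of (userid, tweetid) pairs for the same user triggers only one fetch. The sketch below illustrates that per-key, expire-after-write memoization using only the JDK's `ConcurrentHashMap`. It is not the Stitch or Caffeine API: `TtlMemo`, `load` and the injected `clock` are hypothetical names, and unlike the real cache it neither bounds its size nor actively evicts expired entries (they are simply replaced on the next lookup).

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper, not a Stitch/Caffeine API: memoizes `load` per key and
// recomputes a key's value once the cached copy is at least `ttlMillis` old.
final class TtlMemo[K, V](ttlMillis: Long, clock: () => Long)(load: K => V) {
  // A cached value together with the clock time at which it was written.
  private final case class Entry(value: V, writtenAt: Long)

  private val cache = new ConcurrentHashMap[K, Entry]()

  def apply(key: K): V = {
    val now = clock()
    // ConcurrentHashMap.compute runs atomically per key, so concurrent callers
    // for the same key share one load instead of each triggering their own.
    val entry = cache.compute(
      key,
      (_: K, existing: Entry) =>
        if (existing != null && now - existing.writtenAt < ttlMillis) existing
        else Entry(load(key), now)
    )
    entry.value
  }

  // Number of keys currently held; expired entries are overwritten lazily, never removed.
  def size: Int = cache.size
}
```

Injecting the clock keeps the sketch deterministic in tests; the production module instead relies on Caffeine's `expireAfterWrite(5, TimeUnit.SECONDS)` and `maximumSize(1000)`, and caches lazy `Stitch` values rather than computed results.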
20
representation-scorer/server/src/main/thrift/BUILD
Normal file
@ -0,0 +1,20 @@
create_thrift_libraries(
    base_name = "thrift",
    sources = [
        "com/twitter/representationscorer/service.thrift",
    ],
    platform = "java8",
    tags = [
        "bazel-compatible",
    ],
    dependency_roots = [
        "src/thrift/com/twitter/simclusters_v2:simclusters_v2-thrift",
    ],
    generate_languages = [
        "java",
        "scala",
        "strato",
    ],
    provides_java_name = "representationscorer-service-thrift-java",
    provides_scala_name = "representationscorer-service-thrift-scala",
)
@ -0,0 +1,106 @@
namespace java com.twitter.representationscorer.thriftjava
#@namespace scala com.twitter.representationscorer.thriftscala
#@namespace strato com.twitter.representationscorer

include "com/twitter/simclusters_v2/identifier.thrift"
include "com/twitter/simclusters_v2/online_store.thrift"
include "com/twitter/simclusters_v2/score.thrift"

struct SimClustersRecentEngagementSimilarities {
  // All scores are computed using cosine similarity
  // 1 - 1000 Positive Signals
  1: optional double fav1dLast10Max // max score from last 10 faves in the last 1 day
  2: optional double fav1dLast10Avg // avg score from last 10 faves in the last 1 day
  3: optional double fav7dLast10Max // max score from last 10 faves in the last 7 days
  4: optional double fav7dLast10Avg // avg score from last 10 faves in the last 7 days
  5: optional double retweet1dLast10Max // max score from last 10 retweets in the last 1 day
  6: optional double retweet1dLast10Avg // avg score from last 10 retweets in the last 1 day
  7: optional double retweet7dLast10Max // max score from last 10 retweets in the last 7 days
  8: optional double retweet7dLast10Avg // avg score from last 10 retweets in the last 7 days
  9: optional double follow7dLast10Max // max score from the last 10 follows in the last 7 days
  10: optional double follow7dLast10Avg // avg score from the last 10 follows in the last 7 days
  11: optional double follow30dLast10Max // max score from the last 10 follows in the last 30 days
  12: optional double follow30dLast10Avg // avg score from the last 10 follows in the last 30 days
  13: optional double share1dLast10Max // max score from last 10 shares in the last 1 day
  14: optional double share1dLast10Avg // avg score from last 10 shares in the last 1 day
  15: optional double share7dLast10Max // max score from last 10 shares in the last 7 days
  16: optional double share7dLast10Avg // avg score from last 10 shares in the last 7 days
  17: optional double reply1dLast10Max // max score from last 10 replies in the last 1 day
  18: optional double reply1dLast10Avg // avg score from last 10 replies in the last 1 day
  19: optional double reply7dLast10Max // max score from last 10 replies in the last 7 days
  20: optional double reply7dLast10Avg // avg score from last 10 replies in the last 7 days
  21: optional double originalTweet1dLast10Max // max score from last 10 original tweets in the last 1 day
  22: optional double originalTweet1dLast10Avg // avg score from last 10 original tweets in the last 1 day
  23: optional double originalTweet7dLast10Max // max score from last 10 original tweets in the last 7 days
  24: optional double originalTweet7dLast10Avg // avg score from last 10 original tweets in the last 7 days
  25: optional double videoPlayback1dLast10Max // max score from last 10 video playback50 events in the last 1 day
  26: optional double videoPlayback1dLast10Avg // avg score from last 10 video playback50 events in the last 1 day
  27: optional double videoPlayback7dLast10Max // max score from last 10 video playback50 events in the last 7 days
  28: optional double videoPlayback7dLast10Avg // avg score from last 10 video playback50 events in the last 7 days

  // 1001 - 2000 Implicit Signals

  // 2001 - 3000 Negative Signals
  // Block Series
  2001: optional double block1dLast10Avg
  2002: optional double block1dLast10Max
  2003: optional double block7dLast10Avg
  2004: optional double block7dLast10Max
  2005: optional double block30dLast10Avg
  2006: optional double block30dLast10Max
  // Mute Series
  2101: optional double mute1dLast10Avg
  2102: optional double mute1dLast10Max
  2103: optional double mute7dLast10Avg
  2104: optional double mute7dLast10Max
  2105: optional double mute30dLast10Avg
  2106: optional double mute30dLast10Max
  // Report Series
  2201: optional double report1dLast10Avg
  2202: optional double report1dLast10Max
  2203: optional double report7dLast10Avg
  2204: optional double report7dLast10Max
  2205: optional double report30dLast10Avg
  2206: optional double report30dLast10Max
  // Dontlike
  2301: optional double dontlike1dLast10Avg
  2302: optional double dontlike1dLast10Max
  2303: optional double dontlike7dLast10Avg
  2304: optional double dontlike7dLast10Max
  2305: optional double dontlike30dLast10Avg
  2306: optional double dontlike30dLast10Max
  // SeeFewer
  2401: optional double seeFewer1dLast10Avg
  2402: optional double seeFewer1dLast10Max
  2403: optional double seeFewer7dLast10Avg
  2404: optional double seeFewer7dLast10Max
  2405: optional double seeFewer30dLast10Avg
  2406: optional double seeFewer30dLast10Max
}(persisted='true', hasPersonalData = 'true')
|
/*
 * List score API
 */
struct ListScoreId {
  1: required score.ScoringAlgorithm algorithm
  2: required online_store.ModelVersion modelVersion
  3: required identifier.EmbeddingType targetEmbeddingType
  4: required identifier.InternalId targetId
  5: required identifier.EmbeddingType candidateEmbeddingType
  6: required list<identifier.InternalId> candidateIds
}(hasPersonalData = 'true')

struct ScoreResult {
  // This API does not communicate why a score is missing. For example, it may be unavailable
  // because the referenced entities do not exist (e.g. the embedding was not found) or because
  // timeouts prevented us from calculating it.
  1: optional double score
}

struct ListScoreResponse {
  1: required list<ScoreResult> scores // Guaranteed to be the same number/order as requested
}

struct RecentEngagementSimilaritiesResponse {
  1: required list<SimClustersRecentEngagementSimilarities> results // Guaranteed to be the same number/order as requested
}
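Each `...Last10Max` / `...Last10Avg` field in `SimClustersRecentEngagementSimilarities` is documented as the max or average cosine similarity between a candidate and the user's last 10 engagements of one type within a time window. A minimal sketch of that aggregation follows. It assumes dense `Array[Double]` embeddings (the service itself works with SimClusters embeddings) and that the engagements have already been filtered to the time window; `RecentEngagementSimilaritySketch` and its methods are hypothetical. It also returns one result per candidate in request order, matching the ordering guarantee documented on the response structs.

```scala
// Hypothetical sketch, not the representation-scorer implementation.
object RecentEngagementSimilaritySketch {

  // Cosine similarity of two equal-length dense vectors; 0.0 if either is all zeros.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "embeddings must have the same dimension")
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    var i = 0
    while (i < a.length) {
      dot += a(i) * b(i)
      normA += a(i) * a(i)
      normB += b(i) * b(i)
      i += 1
    }
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (math.sqrt(normA) * math.sqrt(normB))
  }

  final case class MaxAvg(max: Double, avg: Double)

  // `recent` is assumed to be ordered newest first and already restricted to the time
  // window. None means "no engagements", mirroring the `optional double` Thrift fields.
  def lastNMaxAvg(candidate: Array[Double], recent: Seq[Array[Double]], limit: Int = 10): Option[MaxAvg] = {
    val scores = recent.take(limit).map(cosine(candidate, _))
    if (scores.isEmpty) None else Some(MaxAvg(scores.max, scores.sum / scores.size))
  }

  // One result per candidate, in the same order as the candidates were given.
  def scoreAll(candidates: Seq[Array[Double]], recent: Seq[Array[Double]]): Seq[Option[MaxAvg]] =
    candidates.map(lastNMaxAvg(_, recent))
}
```

Returning `Option` rather than a sentinel such as 0.0 keeps "no recent engagements of this type" distinguishable from "engagements exist but are orthogonal to the candidate", which is the same distinction the optional Thrift fields preserve.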
@ -0,0 +1,68 @@
package com.twitter.timelines.prediction.common.aggregates

import com.twitter.ml.api.Feature
import com.twitter.ml.api.FeatureContext
import com.twitter.ml.api.ITransform
import com.twitter.ml.api.constant.SharedFeatures
import java.lang.{Double => JDouble}

import com.twitter.timelines.prediction.common.adapters.AdapterConsumer
import com.twitter.timelines.prediction.common.adapters.EngagementLabelFeaturesDataRecordUtils
import com.twitter.ml.api.DataRecord
import com.twitter.ml.api.RichDataRecord
import com.twitter.timelines.suggests.common.engagement.thriftscala.EngagementType
import com.twitter.timelines.suggests.common.engagement.thriftscala.Engagement
import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures
import com.twitter.timelines.prediction.features.common.CombinedFeatures

/**
 * Transforms UUA data records of BCE events, which contain only a continuous dwell time, into data records
 * that contain the corresponding binary label features.
 * The input UUA data records have USER_ID, SOURCE_TWEET_ID, TIMESTAMP and
 * zero or one of the (TWEET_DETAIL_DWELL_TIME_MS, PROFILE_DWELL_TIME_MS, FULLSCREEN_VIDEO_DWELL_TIME_MS) features.
 * We use whichever dwell-time feature is present to identify the engagement type,
 * and then re-use the function in EngagementTypeConverter to add the binary label to the data record.
 **/

object BCELabelTransformFromUUADataRecord extends ITransform {

  val dwellTimeFeatureToEngagementMap = Map(
    TimelinesSharedFeatures.TWEET_DETAIL_DWELL_TIME_MS -> EngagementType.TweetDetailDwell,
    TimelinesSharedFeatures.PROFILE_DWELL_TIME_MS -> EngagementType.ProfileDwell,
    TimelinesSharedFeatures.FULLSCREEN_VIDEO_DWELL_TIME_MS -> EngagementType.FullscreenVideoDwell
  )

  def dwellFeatureToEngagement(
    rdr: RichDataRecord,
    dwellTimeFeature: Feature[JDouble],
    engagementType: EngagementType
  ): Option[Engagement] = {
    if (rdr.hasFeature(dwellTimeFeature)) {
      Some(
        Engagement(
          engagementType = engagementType,
          timestampMs = rdr.getFeatureValue(SharedFeatures.TIMESTAMP),
          weight = Some(rdr.getFeatureValue(dwellTimeFeature))
        ))
    } else {
      None
    }
  }

  override def transformContext(featureContext: FeatureContext): FeatureContext = {
    featureContext.addFeatures(
      (CombinedFeatures.TweetDetailDwellEngagements ++ CombinedFeatures.ProfileDwellEngagements ++ CombinedFeatures.FullscreenVideoDwellEngagements).toSeq: _*)
  }

  override def transform(record: DataRecord): Unit = {
    val rdr = new RichDataRecord(record)
    val engagements = dwellTimeFeatureToEngagementMap
      .map {
        case (dwellTimeFeature, engagementType) =>
          dwellFeatureToEngagement(rdr, dwellTimeFeature, engagementType)
      }.flatten.toSeq

    // Re-use the BCE (behavior client events) label conversion in EngagementTypeConverter so that labels match BCE label generation for offline training data
    EngagementLabelFeaturesDataRecordUtils.setDwellTimeFeatures(
      rdr,
      Some(engagements),
      AdapterConsumer.Combined)
  }
}
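`BCELabelTransformFromUUADataRecord` turns whichever dwell-time feature is present on a record into an `Engagement` weighted by the dwell time, then delegates the binary labelling to `EngagementLabelFeaturesDataRecordUtils.setDwellTimeFeatures`. The dependency-free sketch below reproduces only that first mapping step, with a plain `Map[String, Double]` standing in for `RichDataRecord`. Every type in it is an illustrative stand-in, not the `com.twitter.ml.api` or Thrift classes, and the label thresholds applied downstream are not modelled.

```scala
// Hypothetical stand-ins for the Thrift EngagementType / Engagement types.
object DwellToEngagementSketch {
  sealed trait EngagementType
  case object TweetDetailDwell extends EngagementType
  case object ProfileDwell extends EngagementType
  case object FullscreenVideoDwell extends EngagementType

  final case class Engagement(engagementType: EngagementType, timestampMs: Long, weight: Option[Double])

  // Same shape as dwellTimeFeatureToEngagementMap, keyed by feature name instead of Feature objects.
  val dwellFeatureToEngagement: Map[String, EngagementType] = Map(
    "TWEET_DETAIL_DWELL_TIME_MS" -> TweetDetailDwell,
    "PROFILE_DWELL_TIME_MS" -> ProfileDwell,
    "FULLSCREEN_VIDEO_DWELL_TIME_MS" -> FullscreenVideoDwell
  )

  // `features` holds the values present on one record; `timestampMs` plays the role of
  // SharedFeatures.TIMESTAMP. Absent dwell features simply produce no engagement.
  def engagements(features: Map[String, Double], timestampMs: Long): Seq[Engagement] =
    dwellFeatureToEngagement.toSeq.flatMap { case (feature, engagementType) =>
      features.get(feature).map(dwellMs => Engagement(engagementType, timestampMs, Some(dwellMs)))
    }
}
```

Because a UUA record carries at most one of the three dwell features, the result has zero or one element; `flatMap` over `Option` plays the same role as the `.map { ... }.flatten` in the transform above.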
@ -0,0 +1,353 @@
create_datasets(
    base_name = "original_author_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/original_author_aggregates/1556496000000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.OriginalAuthor",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "twitter_wide_user_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/twitter_wide_user_aggregates/1556496000000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.TwitterWideUser",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "twitter_wide_user_author_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/twitter_wide_user_author_aggregates/1556323200000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.TwitterWideUserAuthor",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_aggregates/1556150400000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.User",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_author_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_author_aggregates/1556064000000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserAuthor",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)
|
create_datasets(
    base_name = "aggregates_canary",
    fallback_path = "gs://user.timelines.dp.gcp.twttr.net//canaries/processed/aggregates_v2/user_aggregates/1622851200000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.User",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_engager_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_engager_aggregates/1556496000000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserEngager",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_original_author_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_original_author_aggregates/1556496000000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserOriginalAuthor",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "author_topic_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/author_topic_aggregates/1589932800000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.AuthorTopic",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_topic_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_topic_aggregates/1590278400000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserTopic",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)
|
create_datasets(
    base_name = "user_inferred_topic_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_inferred_topic_aggregates/1599696000000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserInferredTopic",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_mention_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_mention_aggregates/1556582400000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserMention",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_request_dow_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_request_dow_aggregates/1556236800000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserRequestDow",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_request_hour_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_request_hour_aggregates/1556150400000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserRequestHour",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_list_aggregates",
    fallback_path = "viewfs://hadoop-proc2-nn.atla.twitter.com/user/timelines/processed/aggregates_v2/user_list_aggregates/1590624000000",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserList",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)

create_datasets(
    base_name = "user_media_understanding_annotation_aggregates",
    key_type = "com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey",
    platform = "java8",
    role = "timelines",
    scala_schema = "com.twitter.timelines.prediction.common.aggregates.TimelinesAggregationKeyValInjections.UserMediaUnderstandingAnnotation",
    segment_type = "snapshot",
    tags = ["bazel-compatible"],
    val_type = "(com.twitter.summingbird.batch.BatchID, com.twitter.ml.api.DataRecord)",
    scala_dependencies = [
        ":injections",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
    ],
)
|
scala_library(
    sources = [
        "BCELabelTransformFromUUADataRecord.scala",
        "FeatureSelectorConfig.scala",
        "RecapUserFeatureAggregation.scala",
        "RectweetUserFeatureAggregation.scala",
        "TimelinesAggregationConfig.scala",
        "TimelinesAggregationConfigDetails.scala",
        "TimelinesAggregationConfigTrait.scala",
        "TimelinesAggregationSources.scala",
    ],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        ":aggregates_canary-scala",
        ":author_topic_aggregates-scala",
        ":original_author_aggregates-scala",
        ":twitter_wide_user_aggregates-scala",
        ":twitter_wide_user_author_aggregates-scala",
        ":user_aggregates-scala",
        ":user_author_aggregates-scala",
        ":user_engager_aggregates-scala",
        ":user_inferred_topic_aggregates-scala",
        ":user_list_aggregates-scala",
        ":user_media_understanding_annotation_aggregates-scala",
        ":user_mention_aggregates-scala",
        ":user_original_author_aggregates-scala",
        ":user_request_dow_aggregates-scala",
        ":user_request_hour_aggregates-scala",
        ":user_topic_aggregates-scala",
        "src/java/com/twitter/ml/api:api-base",
        "src/java/com/twitter/ml/api/constant",
        "src/java/com/twitter/ml/api/matcher",
        "src/scala/com/twitter/common/text/util",
        "src/scala/com/twitter/dal/client/dataset",
        "src/scala/com/twitter/frigate/data_pipeline/features_aggregated/core",
        "src/scala/com/twitter/scalding_internal/multiformat/format",
        "src/scala/com/twitter/timelines/prediction/common/adapters:engagement-converter",
        "src/scala/com/twitter/timelines/prediction/features/client_log_event",
        "src/scala/com/twitter/timelines/prediction/features/common",
        "src/scala/com/twitter/timelines/prediction/features/engagement_features",
        "src/scala/com/twitter/timelines/prediction/features/escherbird",
        "src/scala/com/twitter/timelines/prediction/features/itl",
        "src/scala/com/twitter/timelines/prediction/features/list_features",
        "src/scala/com/twitter/timelines/prediction/features/p_home_latest",
        "src/scala/com/twitter/timelines/prediction/features/real_graph",
        "src/scala/com/twitter/timelines/prediction/features/recap",
        "src/scala/com/twitter/timelines/prediction/features/request_context",
        "src/scala/com/twitter/timelines/prediction/features/simcluster",
        "src/scala/com/twitter/timelines/prediction/features/time_features",
        "src/scala/com/twitter/timelines/prediction/transform/filter",
        "src/thrift/com/twitter/timelines/suggests/common:engagement-scala",
        "timelines/data_processing/ad_hoc/recap/data_record_preparation:recap_data_records_agg_minimal-java",
        "util/util-core:scala",
    ],
)
|
scala_library(
    name = "injections",
    sources = [
        "FeatureSelectorConfig.scala",
        "RecapUserFeatureAggregation.scala",
        "RectweetUserFeatureAggregation.scala",
        "TimelinesAggregationConfigDetails.scala",
        "TimelinesAggregationConfigTrait.scala",
        "TimelinesAggregationKeyValInjections.scala",
        "TimelinesAggregationSources.scala",
    ],
    platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        "src/java/com/twitter/ml/api:api-base",
        "src/java/com/twitter/ml/api/constant",
        "src/java/com/twitter/ml/api/matcher",
        "src/scala/com/twitter/common/text/util",
        "src/scala/com/twitter/dal/client/dataset",
        "src/scala/com/twitter/frigate/data_pipeline/features_aggregated/core",
        "src/scala/com/twitter/scalding_internal/multiformat/format",
        "src/scala/com/twitter/timelines/prediction/features/client_log_event",
        "src/scala/com/twitter/timelines/prediction/features/common",
        "src/scala/com/twitter/timelines/prediction/features/engagement_features",
        "src/scala/com/twitter/timelines/prediction/features/escherbird",
        "src/scala/com/twitter/timelines/prediction/features/itl",
        "src/scala/com/twitter/timelines/prediction/features/list_features",
        "src/scala/com/twitter/timelines/prediction/features/p_home_latest",
        "src/scala/com/twitter/timelines/prediction/features/real_graph",
        "src/scala/com/twitter/timelines/prediction/features/recap",
        "src/scala/com/twitter/timelines/prediction/features/request_context",
        "src/scala/com/twitter/timelines/prediction/features/semantic_core_features",
        "src/scala/com/twitter/timelines/prediction/features/simcluster",
        "src/scala/com/twitter/timelines/prediction/features/time_features",
        "src/scala/com/twitter/timelines/prediction/transform/filter",
        "timelines/data_processing/ad_hoc/recap/data_record_preparation:recap_data_records_agg_minimal-java",
        "util/util-core:scala",
    ],
)
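Every `create_datasets` target above except `user_media_understanding_annotation_aggregates` sets a `fallback_path` whose final segment is a 13-digit number. Assuming that number is an epoch-milliseconds timestamp (the BUILD file does not say so), the hypothetical helper below converts it into a UTC date, which makes it easier to see how old each fallback snapshot is. `FallbackPathTimestamp` is not part of the repository.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

// Hypothetical helper: reads the trailing path segment of a fallback_path as epoch
// milliseconds (an assumption based on its shape) and converts it to a UTC date.
object FallbackPathTimestamp {
  def suffixDate(fallbackPath: String): Option[LocalDate] =
    fallbackPath
      .split('/')
      .lastOption
      .filter(segment => segment.nonEmpty && segment.forall(_.isDigit))
      .map(ms => Instant.ofEpochMilli(ms.toLong).atZone(ZoneOffset.UTC).toLocalDate)
}
```

Under that assumption, the `original_author_aggregates` suffix `1556496000000` corresponds to 2019-04-29 and the `aggregates_canary` suffix `1622851200000` to 2021-06-05.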
@@ -0,0 +1,121 @@
package com.twitter.timelines.prediction.common.aggregates

import com.twitter.ml.api.matcher.FeatureMatcher
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup
import scala.collection.JavaConverters._

object FeatureSelectorConfig {
  val BasePairsToStore = Seq(
    ("twitter_wide_user_aggregate.pair", "*"),
    ("twitter_wide_user_author_aggregate.pair", "*"),
    ("user_aggregate_v5.continuous.pair", "*"),
    ("user_aggregate_v7.pair", "*"),
    ("user_author_aggregate_v2.pair", "recap.earlybird.*"),
    ("user_author_aggregate_v2.pair", "recap.searchfeature.*"),
    ("user_author_aggregate_v2.pair", "recap.tweetfeature.embeds*"),
    ("user_author_aggregate_v2.pair", "recap.tweetfeature.link_count*"),
    ("user_author_aggregate_v2.pair", "engagement_features.in_network.*"),
    ("user_author_aggregate_v2.pair", "recap.tweetfeature.is_reply.*"),
    ("user_author_aggregate_v2.pair", "recap.tweetfeature.is_retweet.*"),
    ("user_author_aggregate_v2.pair", "recap.tweetfeature.num_mentions.*"),
    ("user_author_aggregate_v5.pair", "*"),
    ("user_author_aggregate_tweetsource_v1.pair", "*"),
    ("user_engager_aggregate.pair", "*"),
    ("user_mention_aggregate.pair", "*"),
    ("user_request_context_aggregate.dow.pair", "*"),
    ("user_request_context_aggregate.hour.pair", "*"),
    ("user_aggregate_v6.pair", "*"),
    ("user_original_author_aggregate_v1.pair", "*"),
    ("user_original_author_aggregate_v2.pair", "*"),
    ("original_author_aggregate_v1.pair", "*"),
    ("original_author_aggregate_v2.pair", "*"),
    ("author_topic_aggregate.pair", "*"),
    ("user_list_aggregate.pair", "*"),
    ("user_topic_aggregate.pair", "*"),
    ("user_topic_aggregate_v2.pair", "*"),
    ("user_inferred_topic_aggregate.pair", "*"),
    ("user_inferred_topic_aggregate_v2.pair", "*"),
    ("user_media_annotation_aggregate.pair", "*"),
    ("user_media_annotation_aggregate.pair", "*"),
    ("user_author_good_click_aggregate.pair", "*"),
    ("user_engager_good_click_aggregate.pair", "*")
  )
  val PairsToStore = BasePairsToStore ++ Seq(
    ("user_aggregate_v2.pair", "*"),
    ("user_aggregate_v5.boolean.pair", "*"),
    ("user_aggregate_tweetsource_v1.pair", "*"),
  )


  val LabelsToStore = Seq(
    "any_label",
    "recap.engagement.is_favorited",
    "recap.engagement.is_retweeted",
    "recap.engagement.is_replied",
    "recap.engagement.is_open_linked",
    "recap.engagement.is_profile_clicked",
    "recap.engagement.is_clicked",
    "recap.engagement.is_photo_expanded",
    "recap.engagement.is_video_playback_50",
    "recap.engagement.is_video_quality_viewed",
    "recap.engagement.is_replied_reply_impressed_by_author",
    "recap.engagement.is_replied_reply_favorited_by_author",
    "recap.engagement.is_replied_reply_replied_by_author",
    "recap.engagement.is_report_tweet_clicked",
    "recap.engagement.is_block_clicked",
    "recap.engagement.is_mute_clicked",
    "recap.engagement.is_dont_like",
    "recap.engagement.is_good_clicked_convo_desc_favorited_or_replied",
    "recap.engagement.is_good_clicked_convo_desc_v2",
    "itl.engagement.is_favorited",
    "itl.engagement.is_retweeted",
    "itl.engagement.is_replied",
    "itl.engagement.is_open_linked",
    "itl.engagement.is_profile_clicked",
    "itl.engagement.is_clicked",
    "itl.engagement.is_photo_expanded",
    "itl.engagement.is_video_playback_50"
  )

  val PairGlobsToStore = for {
    (prefix, suffix) <- PairsToStore
    label <- LabelsToStore
  } yield FeatureMatcher.glob(prefix + "." + label + "." + suffix)

  val BaseAggregateV2FeatureSelector = FeatureMatcher
    .none()
    .or(
      FeatureMatcher.glob("meta.user_id"),
      FeatureMatcher.glob("meta.author_id"),
      FeatureMatcher.glob("entities.original_author_id"),
      FeatureMatcher.glob("entities.topic_id"),
      FeatureMatcher
        .glob("entities.inferred_topic_ids" + TypedAggregateGroup.SparseFeatureSuffix),
      FeatureMatcher.glob("timelines.meta.list_id"),
      FeatureMatcher.glob("list.id"),
      FeatureMatcher
        .glob("engagement_features.user_ids.public" + TypedAggregateGroup.SparseFeatureSuffix),
      FeatureMatcher
        .glob("entities.users.mentioned_screen_names" + TypedAggregateGroup.SparseFeatureSuffix),
      FeatureMatcher.glob("user_aggregate_v2.pair.recap.engagement.is_dont_like.*"),
      FeatureMatcher.glob("user_author_aggregate_v2.pair.any_label.recap.tweetfeature.has_*"),
      FeatureMatcher.glob("request_context.country_code"),
      FeatureMatcher.glob("request_context.timestamp_gmt_dow"),
      FeatureMatcher.glob("request_context.timestamp_gmt_hour"),
      FeatureMatcher.glob(
        "semantic_core.media_understanding.high_recall.non_sensitive.entity_ids" + TypedAggregateGroup.SparseFeatureSuffix)
    )

  val AggregatesV2ProdFeatureSelector = BaseAggregateV2FeatureSelector
    .orList(PairGlobsToStore.asJava)

  val ReducedPairGlobsToStore = (for {
    (prefix, suffix) <- BasePairsToStore
    label <- LabelsToStore
  } yield FeatureMatcher.glob(prefix + "." + label + "." + suffix)) ++ Seq(
    FeatureMatcher.glob("user_aggregate_v2.pair.any_label.*"),
    FeatureMatcher.glob("user_aggregate_v2.pair.recap.engagement.is_favorited.*"),
    FeatureMatcher.glob("user_aggregate_v2.pair.recap.engagement.is_photo_expanded.*"),
    FeatureMatcher.glob("user_aggregate_v2.pair.recap.engagement.is_profile_clicked.*")
  )
}
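`PairGlobsToStore` and `ReducedPairGlobsToStore` expand every `(prefix, suffix)` pair against every entry of `LabelsToStore`, producing one glob per combination in the form `prefix.label.suffix`. The sketch below reproduces that expansion on plain strings with a small, hypothetical subset of the pairs and labels, and adds a minimal wildcard matcher so a selection can be checked in isolation. The matcher is an assumption and not `FeatureMatcher.glob`: it treats `*` as any run of characters and everything else literally, which may differ from the framework's exact semantics.

```scala
import java.util.regex.Pattern

object PairGlobSketch {
  // Hypothetical subsets of BasePairsToStore and LabelsToStore, for illustration only.
  val pairs: Seq[(String, String)] = Seq(
    ("user_aggregate_v7.pair", "*"),
    ("user_author_aggregate_v2.pair", "recap.earlybird.*")
  )
  val labels: Seq[String] = Seq("any_label", "recap.engagement.is_favorited")

  // Same shape as PairGlobsToStore: one glob per (pair, label) combination.
  val globs: Seq[String] = for {
    (prefix, suffix) <- pairs
    label <- labels
  } yield prefix + "." + label + "." + suffix

  // Assumed semantics: '*' matches any (possibly empty) run of characters and
  // every other character, including '.', is matched literally.
  def globMatches(glob: String, name: String): Boolean = {
    val regex = glob.split("\\*", -1).map(part => Pattern.quote(part)).mkString(".*")
    name.matches(regex)
  }

  // A feature name is selected if any generated glob matches it.
  def selected(name: String): Boolean = globs.exists(g => globMatches(g, name))

  def main(args: Array[String]): Unit = {
    globs.foreach(println)
    println(selected("user_author_aggregate_v2.pair.any_label.recap.earlybird.x"))
  }
}
```

With the full config the same cross product yields `PairsToStore.size * LabelsToStore.size` globs for the production selector. `ReducedPairGlobsToStore` shrinks this by crossing only `BasePairsToStore` with the labels and keeping four hand-picked `user_aggregate_v2` globs instead of every label combination for the three extra pairs.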
@@ -0,0 +1,6 @@
## Timelines Aggregation Jobs

This directory contains the definitions of the specific aggregate jobs that generate features used by the Heavy Ranker.
The primary files of interest are [`TimelinesAggregationConfigDetails.scala`](TimelinesAggregationConfigDetails.scala), which contains the definitions of the batch aggregate jobs, and [`real_time/TimelinesOnlineAggregationConfigBase.scala`](real_time/TimelinesOnlineAggregationConfigBase.scala), which contains the definitions of the real-time aggregate jobs.

These jobs are built on the [aggregation framework](../../../../../../../../timelines/data_processing/ml_util/aggregation_framework).
@@ -0,0 +1,415 @@
package com.twitter.timelines.prediction.common.aggregates

import com.twitter.ml.api.Feature
import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures
import com.twitter.timelines.prediction.features.engagement_features.EngagementDataRecordFeatures
import com.twitter.timelines.prediction.features.real_graph.RealGraphDataRecordFeatures
import com.twitter.timelines.prediction.features.recap.RecapFeatures
import com.twitter.timelines.prediction.features.time_features.TimeDataRecordFeatures

object RecapUserFeatureAggregation {
  val RecapFeaturesForAggregation: Set[Feature[_]] =
    Set(
      RecapFeatures.HAS_IMAGE,
      RecapFeatures.HAS_VIDEO,
      RecapFeatures.FROM_MUTUAL_FOLLOW,
      RecapFeatures.HAS_CARD,
      RecapFeatures.HAS_NEWS,
      RecapFeatures.REPLY_COUNT,
      RecapFeatures.FAV_COUNT,
      RecapFeatures.RETWEET_COUNT,
      RecapFeatures.BLENDER_SCORE,
      RecapFeatures.CONVERSATIONAL_COUNT,
      RecapFeatures.IS_BUSINESS_SCORE,
      RecapFeatures.CONTAINS_MEDIA,
      RecapFeatures.RETWEET_SEARCHER,
      RecapFeatures.REPLY_SEARCHER,
      RecapFeatures.MENTION_SEARCHER,
      RecapFeatures.REPLY_OTHER,
      RecapFeatures.RETWEET_OTHER,
      RecapFeatures.MATCH_UI_LANG,
      RecapFeatures.MATCH_SEARCHER_MAIN_LANG,
      RecapFeatures.MATCH_SEARCHER_LANGS,
      RecapFeatures.TWEET_COUNT_FROM_USER_IN_SNAPSHOT,
      RecapFeatures.TEXT_SCORE,
      RealGraphDataRecordFeatures.NUM_RETWEETS_EWMA,
      RealGraphDataRecordFeatures.NUM_RETWEETS_NON_ZERO_DAYS,
      RealGraphDataRecordFeatures.NUM_RETWEETS_ELAPSED_DAYS,
      RealGraphDataRecordFeatures.NUM_RETWEETS_DAYS_SINCE_LAST,
      RealGraphDataRecordFeatures.NUM_FAVORITES_EWMA,
      RealGraphDataRecordFeatures.NUM_FAVORITES_NON_ZERO_DAYS,
      RealGraphDataRecordFeatures.NUM_FAVORITES_ELAPSED_DAYS,
      RealGraphDataRecordFeatures.NUM_FAVORITES_DAYS_SINCE_LAST,
      RealGraphDataRecordFeatures.NUM_MENTIONS_EWMA,
      RealGraphDataRecordFeatures.NUM_MENTIONS_NON_ZERO_DAYS,
      RealGraphDataRecordFeatures.NUM_MENTIONS_ELAPSED_DAYS,
      RealGraphDataRecordFeatures.NUM_MENTIONS_DAYS_SINCE_LAST,
      RealGraphDataRecordFeatures.NUM_TWEET_CLICKS_EWMA,
      RealGraphDataRecordFeatures.NUM_TWEET_CLICKS_NON_ZERO_DAYS,
      RealGraphDataRecordFeatures.NUM_TWEET_CLICKS_ELAPSED_DAYS,
      RealGraphDataRecordFeatures.NUM_TWEET_CLICKS_DAYS_SINCE_LAST,
      RealGraphDataRecordFeatures.NUM_PROFILE_VIEWS_EWMA,
      RealGraphDataRecordFeatures.NUM_PROFILE_VIEWS_NON_ZERO_DAYS,
      RealGraphDataRecordFeatures.NUM_PROFILE_VIEWS_ELAPSED_DAYS,
      RealGraphDataRecordFeatures.NUM_PROFILE_VIEWS_DAYS_SINCE_LAST,
      RealGraphDataRecordFeatures.TOTAL_DWELL_TIME_EWMA,
      RealGraphDataRecordFeatures.TOTAL_DWELL_TIME_NON_ZERO_DAYS,
      RealGraphDataRecordFeatures.TOTAL_DWELL_TIME_ELAPSED_DAYS,
      RealGraphDataRecordFeatures.TOTAL_DWELL_TIME_DAYS_SINCE_LAST,
      RealGraphDataRecordFeatures.NUM_INSPECTED_TWEETS_EWMA,
      RealGraphDataRecordFeatures.NUM_INSPECTED_TWEETS_NON_ZERO_DAYS,
      RealGraphDataRecordFeatures.NUM_INSPECTED_TWEETS_ELAPSED_DAYS,
      RealGraphDataRecordFeatures.NUM_INSPECTED_TWEETS_DAYS_SINCE_LAST
    )

  val RecapLabelsForAggregation: Set[Feature.Binary] =
    Set(
      RecapFeatures.IS_FAVORITED,
      RecapFeatures.IS_RETWEETED,
      RecapFeatures.IS_CLICKED,
      RecapFeatures.IS_PROFILE_CLICKED,
      RecapFeatures.IS_OPEN_LINKED
    )

  val DwellDuration: Set[Feature[_]] =
    Set(
      TimelinesSharedFeatures.DWELL_TIME_MS,
    )

  val UserFeaturesV2: Set[Feature[_]] = RecapFeaturesForAggregation ++ Set(
    RecapFeatures.HAS_VINE,
    RecapFeatures.HAS_PERISCOPE,
    RecapFeatures.HAS_PRO_VIDEO,
    RecapFeatures.HAS_VISIBLE_LINK,
    RecapFeatures.BIDIRECTIONAL_FAV_COUNT,
    RecapFeatures.UNIDIRECTIONAL_FAV_COUNT,
    RecapFeatures.BIDIRECTIONAL_REPLY_COUNT,
    RecapFeatures.UNIDIRECTIONAL_REPLY_COUNT,
    RecapFeatures.BIDIRECTIONAL_RETWEET_COUNT,
    RecapFeatures.UNIDIRECTIONAL_RETWEET_COUNT,
    RecapFeatures.EMBEDS_URL_COUNT,
    RecapFeatures.EMBEDS_IMPRESSION_COUNT,
    RecapFeatures.VIDEO_VIEW_COUNT,
    RecapFeatures.IS_RETWEET,
    RecapFeatures.IS_REPLY,
    RecapFeatures.IS_EXTENDED_REPLY,
    RecapFeatures.HAS_LINK,
    RecapFeatures.HAS_TREND,
    RecapFeatures.LINK_LANGUAGE,
    RecapFeatures.NUM_HASHTAGS,
    RecapFeatures.NUM_MENTIONS,
    RecapFeatures.IS_SENSITIVE,
    RecapFeatures.HAS_MULTIPLE_MEDIA,
    RecapFeatures.USER_REP,
    RecapFeatures.FAV_COUNT_V2,
    RecapFeatures.RETWEET_COUNT_V2,
    RecapFeatures.REPLY_COUNT_V2,
    RecapFeatures.LINK_COUNT,
    EngagementDataRecordFeatures.InNetworkFavoritesCount,
    EngagementDataRecordFeatures.InNetworkRetweetsCount,
    EngagementDataRecordFeatures.InNetworkRepliesCount
  )

  val UserAuthorFeaturesV2: Set[Feature[_]] = Set(
    RecapFeatures.HAS_IMAGE,
    RecapFeatures.HAS_VINE,
    RecapFeatures.HAS_PERISCOPE,
    RecapFeatures.HAS_PRO_VIDEO,
    RecapFeatures.HAS_VIDEO,
    RecapFeatures.HAS_CARD,
    RecapFeatures.HAS_NEWS,
    RecapFeatures.HAS_VISIBLE_LINK,
    RecapFeatures.REPLY_COUNT,
    RecapFeatures.FAV_COUNT,
    RecapFeatures.RETWEET_COUNT,
    RecapFeatures.BLENDER_SCORE,
    RecapFeatures.CONVERSATIONAL_COUNT,
    RecapFeatures.IS_BUSINESS_SCORE,
    RecapFeatures.CONTAINS_MEDIA,
    RecapFeatures.RETWEET_SEARCHER,
    RecapFeatures.REPLY_SEARCHER,
    RecapFeatures.MENTION_SEARCHER,
    RecapFeatures.REPLY_OTHER,
    RecapFeatures.RETWEET_OTHER,
    RecapFeatures.MATCH_UI_LANG,
    RecapFeatures.MATCH_SEARCHER_MAIN_LANG,
    RecapFeatures.MATCH_SEARCHER_LANGS,
    RecapFeatures.TWEET_COUNT_FROM_USER_IN_SNAPSHOT,
    RecapFeatures.TEXT_SCORE,
    RecapFeatures.BIDIRECTIONAL_FAV_COUNT,
    RecapFeatures.UNIDIRECTIONAL_FAV_COUNT,
    RecapFeatures.BIDIRECTIONAL_REPLY_COUNT,
    RecapFeatures.UNIDIRECTIONAL_REPLY_COUNT,
    RecapFeatures.BIDIRECTIONAL_RETWEET_COUNT,
    RecapFeatures.UNIDIRECTIONAL_RETWEET_COUNT,
    RecapFeatures.EMBEDS_URL_COUNT,
    RecapFeatures.EMBEDS_IMPRESSION_COUNT,
    RecapFeatures.VIDEO_VIEW_COUNT,
    RecapFeatures.IS_RETWEET,
    RecapFeatures.IS_REPLY,
    RecapFeatures.HAS_LINK,
    RecapFeatures.HAS_TREND,
    RecapFeatures.LINK_LANGUAGE,
    RecapFeatures.NUM_HASHTAGS,
    RecapFeatures.NUM_MENTIONS,
    RecapFeatures.IS_SENSITIVE,
    RecapFeatures.HAS_MULTIPLE_MEDIA,
    RecapFeatures.FAV_COUNT_V2,
    RecapFeatures.RETWEET_COUNT_V2,
    RecapFeatures.REPLY_COUNT_V2,
    RecapFeatures.LINK_COUNT,
    EngagementDataRecordFeatures.InNetworkFavoritesCount,
    EngagementDataRecordFeatures.InNetworkRetweetsCount,
    EngagementDataRecordFeatures.InNetworkRepliesCount
  )

  val UserAuthorFeaturesV2Count: Set[Feature[_]] = Set(
    RecapFeatures.HAS_IMAGE,
    RecapFeatures.HAS_VINE,
    RecapFeatures.HAS_PERISCOPE,
    RecapFeatures.HAS_PRO_VIDEO,
    RecapFeatures.HAS_VIDEO,
    RecapFeatures.HAS_CARD,
    RecapFeatures.HAS_NEWS,
    RecapFeatures.HAS_VISIBLE_LINK,
    RecapFeatures.FAV_COUNT,
    RecapFeatures.CONTAINS_MEDIA,
    RecapFeatures.RETWEET_SEARCHER,
    RecapFeatures.REPLY_SEARCHER,
    RecapFeatures.MENTION_SEARCHER,
    RecapFeatures.REPLY_OTHER,
    RecapFeatures.RETWEET_OTHER,
    RecapFeatures.MATCH_UI_LANG,
    RecapFeatures.MATCH_SEARCHER_MAIN_LANG,
    RecapFeatures.MATCH_SEARCHER_LANGS,
    RecapFeatures.IS_RETWEET,
    RecapFeatures.IS_REPLY,
    RecapFeatures.HAS_LINK,
    RecapFeatures.HAS_TREND,
    RecapFeatures.IS_SENSITIVE,
    RecapFeatures.HAS_MULTIPLE_MEDIA,
    EngagementDataRecordFeatures.InNetworkFavoritesCount
  )

  val UserTopicFeaturesV2Count: Set[Feature[_]] = Set(
    RecapFeatures.HAS_IMAGE,
    RecapFeatures.HAS_VIDEO,
    RecapFeatures.HAS_CARD,
    RecapFeatures.HAS_NEWS,
    RecapFeatures.FAV_COUNT,
    RecapFeatures.CONTAINS_MEDIA,
    RecapFeatures.RETWEET_SEARCHER,
    RecapFeatures.REPLY_SEARCHER,
    RecapFeatures.MENTION_SEARCHER,
    RecapFeatures.REPLY_OTHER,
    RecapFeatures.RETWEET_OTHER,
    RecapFeatures.MATCH_UI_LANG,
    RecapFeatures.MATCH_SEARCHER_MAIN_LANG,
    RecapFeatures.MATCH_SEARCHER_LANGS,
    RecapFeatures.IS_RETWEET,
    RecapFeatures.IS_REPLY,
    RecapFeatures.HAS_LINK,
    RecapFeatures.HAS_TREND,
    RecapFeatures.IS_SENSITIVE,
    EngagementDataRecordFeatures.InNetworkFavoritesCount,
    EngagementDataRecordFeatures.InNetworkRetweetsCount,
    TimelinesSharedFeatures.NUM_CAPS,
    TimelinesSharedFeatures.ASPECT_RATIO_DEN,
    TimelinesSharedFeatures.NUM_NEWLINES,
    TimelinesSharedFeatures.IS_360,
    TimelinesSharedFeatures.IS_MANAGED,
    TimelinesSharedFeatures.IS_MONETIZABLE,
    TimelinesSharedFeatures.HAS_SELECTED_PREVIEW_IMAGE,
    TimelinesSharedFeatures.HAS_TITLE,
    TimelinesSharedFeatures.HAS_DESCRIPTION,
    TimelinesSharedFeatures.HAS_VISIT_SITE_CALL_TO_ACTION,
    TimelinesSharedFeatures.HAS_WATCH_NOW_CALL_TO_ACTION
  )

  val UserFeaturesV5Continuous: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.QUOTE_COUNT,
    TimelinesSharedFeatures.VISIBLE_TOKEN_RATIO,
    TimelinesSharedFeatures.WEIGHTED_FAV_COUNT,
    TimelinesSharedFeatures.WEIGHTED_RETWEET_COUNT,
    TimelinesSharedFeatures.WEIGHTED_REPLY_COUNT,
    TimelinesSharedFeatures.WEIGHTED_QUOTE_COUNT,
    TimelinesSharedFeatures.EMBEDS_IMPRESSION_COUNT_V2,
    TimelinesSharedFeatures.EMBEDS_URL_COUNT_V2,
    TimelinesSharedFeatures.DECAYED_FAVORITE_COUNT,
    TimelinesSharedFeatures.DECAYED_RETWEET_COUNT,
    TimelinesSharedFeatures.DECAYED_REPLY_COUNT,
    TimelinesSharedFeatures.DECAYED_QUOTE_COUNT,
    TimelinesSharedFeatures.FAKE_FAVORITE_COUNT,
    TimelinesSharedFeatures.FAKE_RETWEET_COUNT,
    TimelinesSharedFeatures.FAKE_REPLY_COUNT,
    TimelinesSharedFeatures.FAKE_QUOTE_COUNT,
    TimeDataRecordFeatures.LAST_FAVORITE_SINCE_CREATION_HRS,
    TimeDataRecordFeatures.LAST_RETWEET_SINCE_CREATION_HRS,
    TimeDataRecordFeatures.LAST_REPLY_SINCE_CREATION_HRS,
    TimeDataRecordFeatures.LAST_QUOTE_SINCE_CREATION_HRS,
    TimeDataRecordFeatures.TIME_SINCE_LAST_FAVORITE_HRS,
    TimeDataRecordFeatures.TIME_SINCE_LAST_RETWEET_HRS,
    TimeDataRecordFeatures.TIME_SINCE_LAST_REPLY_HRS,
    TimeDataRecordFeatures.TIME_SINCE_LAST_QUOTE_HRS
  )

  val UserFeaturesV5Boolean: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.LABEL_ABUSIVE_FLAG,
    TimelinesSharedFeatures.LABEL_ABUSIVE_HI_RCL_FLAG,
    TimelinesSharedFeatures.LABEL_DUP_CONTENT_FLAG,
    TimelinesSharedFeatures.LABEL_NSFW_HI_PRC_FLAG,
    TimelinesSharedFeatures.LABEL_NSFW_HI_RCL_FLAG,
    TimelinesSharedFeatures.LABEL_SPAM_FLAG,
    TimelinesSharedFeatures.LABEL_SPAM_HI_RCL_FLAG,
    TimelinesSharedFeatures.PERISCOPE_EXISTS,
    TimelinesSharedFeatures.PERISCOPE_IS_LIVE,
    TimelinesSharedFeatures.PERISCOPE_HAS_BEEN_FEATURED,
    TimelinesSharedFeatures.PERISCOPE_IS_CURRENTLY_FEATURED,
    TimelinesSharedFeatures.PERISCOPE_IS_FROM_QUALITY_SOURCE,
    TimelinesSharedFeatures.HAS_QUOTE
  )

  val UserAuthorFeaturesV5: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.HAS_QUOTE,
    TimelinesSharedFeatures.LABEL_ABUSIVE_FLAG,
    TimelinesSharedFeatures.LABEL_ABUSIVE_HI_RCL_FLAG,
    TimelinesSharedFeatures.LABEL_DUP_CONTENT_FLAG,
    TimelinesSharedFeatures.LABEL_NSFW_HI_PRC_FLAG,
    TimelinesSharedFeatures.LABEL_NSFW_HI_RCL_FLAG,
    TimelinesSharedFeatures.LABEL_SPAM_FLAG,
    TimelinesSharedFeatures.LABEL_SPAM_HI_RCL_FLAG
  )

  val UserTweetSourceFeaturesV1Continuous: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.NUM_CAPS,
    TimelinesSharedFeatures.NUM_WHITESPACES,
    TimelinesSharedFeatures.TWEET_LENGTH,
    TimelinesSharedFeatures.ASPECT_RATIO_DEN,
    TimelinesSharedFeatures.ASPECT_RATIO_NUM,
    TimelinesSharedFeatures.BIT_RATE,
    TimelinesSharedFeatures.HEIGHT_1,
    TimelinesSharedFeatures.HEIGHT_2,
    TimelinesSharedFeatures.HEIGHT_3,
    TimelinesSharedFeatures.HEIGHT_4,
    TimelinesSharedFeatures.VIDEO_DURATION,
    TimelinesSharedFeatures.WIDTH_1,
    TimelinesSharedFeatures.WIDTH_2,
    TimelinesSharedFeatures.WIDTH_3,
    TimelinesSharedFeatures.WIDTH_4,
    TimelinesSharedFeatures.NUM_MEDIA_TAGS
  )

  val UserTweetSourceFeaturesV1Boolean: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.HAS_QUESTION,
    TimelinesSharedFeatures.RESIZE_METHOD_1,
    TimelinesSharedFeatures.RESIZE_METHOD_2,
    TimelinesSharedFeatures.RESIZE_METHOD_3,
    TimelinesSharedFeatures.RESIZE_METHOD_4
  )

  val UserTweetSourceFeaturesV2Continuous: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.NUM_EMOJIS,
    TimelinesSharedFeatures.NUM_EMOTICONS,
    TimelinesSharedFeatures.NUM_NEWLINES,
    TimelinesSharedFeatures.NUM_STICKERS,
    TimelinesSharedFeatures.NUM_FACES,
    TimelinesSharedFeatures.NUM_COLOR_PALLETTE_ITEMS,
    TimelinesSharedFeatures.VIEW_COUNT,
    TimelinesSharedFeatures.TWEET_LENGTH_TYPE
  )

  val UserTweetSourceFeaturesV2Boolean: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.IS_360,
    TimelinesSharedFeatures.IS_MANAGED,
    TimelinesSharedFeatures.IS_MONETIZABLE,
    TimelinesSharedFeatures.IS_EMBEDDABLE,
    TimelinesSharedFeatures.HAS_SELECTED_PREVIEW_IMAGE,
    TimelinesSharedFeatures.HAS_TITLE,
    TimelinesSharedFeatures.HAS_DESCRIPTION,
    TimelinesSharedFeatures.HAS_VISIT_SITE_CALL_TO_ACTION,
    TimelinesSharedFeatures.HAS_WATCH_NOW_CALL_TO_ACTION
  )

  val UserAuthorTweetSourceFeaturesV1: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.HAS_QUESTION,
    TimelinesSharedFeatures.TWEET_LENGTH,
    TimelinesSharedFeatures.VIDEO_DURATION,
    TimelinesSharedFeatures.NUM_MEDIA_TAGS
  )

  val UserAuthorTweetSourceFeaturesV2: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.NUM_CAPS,
    TimelinesSharedFeatures.NUM_WHITESPACES,
    TimelinesSharedFeatures.ASPECT_RATIO_DEN,
    TimelinesSharedFeatures.ASPECT_RATIO_NUM,
    TimelinesSharedFeatures.BIT_RATE,
    TimelinesSharedFeatures.TWEET_LENGTH_TYPE,
    TimelinesSharedFeatures.NUM_EMOJIS,
    TimelinesSharedFeatures.NUM_EMOTICONS,
    TimelinesSharedFeatures.NUM_NEWLINES,
    TimelinesSharedFeatures.NUM_STICKERS,
    TimelinesSharedFeatures.NUM_FACES,
    TimelinesSharedFeatures.IS_360,
    TimelinesSharedFeatures.IS_MANAGED,
    TimelinesSharedFeatures.IS_MONETIZABLE,
    TimelinesSharedFeatures.HAS_SELECTED_PREVIEW_IMAGE,
    TimelinesSharedFeatures.HAS_TITLE,
    TimelinesSharedFeatures.HAS_DESCRIPTION,
    TimelinesSharedFeatures.HAS_VISIT_SITE_CALL_TO_ACTION,
    TimelinesSharedFeatures.HAS_WATCH_NOW_CALL_TO_ACTION
  )

  val UserAuthorTweetSourceFeaturesV2Count: Set[Feature[_]] = Set(
    TimelinesSharedFeatures.NUM_CAPS,
    TimelinesSharedFeatures.ASPECT_RATIO_DEN,
    TimelinesSharedFeatures.NUM_NEWLINES,
    TimelinesSharedFeatures.IS_360,
    TimelinesSharedFeatures.IS_MANAGED,
    TimelinesSharedFeatures.IS_MONETIZABLE,
    TimelinesSharedFeatures.HAS_SELECTED_PREVIEW_IMAGE,
    TimelinesSharedFeatures.HAS_TITLE,
    TimelinesSharedFeatures.HAS_DESCRIPTION,
    TimelinesSharedFeatures.HAS_VISIT_SITE_CALL_TO_ACTION,
    TimelinesSharedFeatures.HAS_WATCH_NOW_CALL_TO_ACTION
  )

  val LabelsV2: Set[Feature.Binary] = RecapLabelsForAggregation ++ Set(
    RecapFeatures.IS_REPLIED,
    RecapFeatures.IS_PHOTO_EXPANDED,
    RecapFeatures.IS_VIDEO_PLAYBACK_50
  )

  val TwitterWideFeatures: Set[Feature[_]] = Set(
    RecapFeatures.IS_REPLY,
    TimelinesSharedFeatures.HAS_QUOTE,
    RecapFeatures.HAS_MENTION,
    RecapFeatures.HAS_HASHTAG,
    RecapFeatures.HAS_LINK,
    RecapFeatures.HAS_CARD,
    RecapFeatures.CONTAINS_MEDIA
  )

  val TwitterWideLabels: Set[Feature.Binary] = Set(
    RecapFeatures.IS_FAVORITED,
    RecapFeatures.IS_RETWEETED,
    RecapFeatures.IS_REPLIED
  )

  val ReciprocalLabels: Set[Feature.Binary] = Set(
    RecapFeatures.IS_REPLIED_REPLY_IMPRESSED_BY_AUTHOR,
    RecapFeatures.IS_REPLIED_REPLY_REPLIED_BY_AUTHOR,
    RecapFeatures.IS_REPLIED_REPLY_FAVORITED_BY_AUTHOR
  )

  val NegativeEngagementLabels: Set[Feature.Binary] = Set(
    RecapFeatures.IS_REPORT_TWEET_CLICKED,
    RecapFeatures.IS_BLOCK_CLICKED,
    RecapFeatures.IS_MUTE_CLICKED,
    RecapFeatures.IS_DONT_LIKE
  )

  val GoodClickLabels: Set[Feature.Binary] = Set(
    RecapFeatures.IS_GOOD_CLICKED_CONVO_DESC_V1,
    RecapFeatures.IS_GOOD_CLICKED_CONVO_DESC_V2,
  )
}
@@ -0,0 +1,52 @@
package com.twitter.timelines.prediction.common.aggregates

import com.twitter.ml.api.Feature
import com.twitter.timelines.prediction.features.engagement_features.EngagementDataRecordFeatures
import com.twitter.timelines.prediction.features.itl.ITLFeatures

object RectweetUserFeatureAggregation {
  val RectweetLabelsForAggregation: Set[Feature.Binary] =
    Set(
      ITLFeatures.IS_FAVORITED,
      ITLFeatures.IS_RETWEETED,
      ITLFeatures.IS_REPLIED,
      ITLFeatures.IS_CLICKED,
      ITLFeatures.IS_PROFILE_CLICKED,
      ITLFeatures.IS_OPEN_LINKED,
      ITLFeatures.IS_PHOTO_EXPANDED,
      ITLFeatures.IS_VIDEO_PLAYBACK_50
    )

  val TweetFeatures: Set[Feature[_]] = Set(
    ITLFeatures.HAS_IMAGE,
    ITLFeatures.HAS_CARD,
    ITLFeatures.HAS_NEWS,
    ITLFeatures.REPLY_COUNT,
    ITLFeatures.FAV_COUNT,
    ITLFeatures.REPLY_COUNT,
    ITLFeatures.RETWEET_COUNT,
    ITLFeatures.MATCHES_UI_LANG,
    ITLFeatures.MATCHES_SEARCHER_MAIN_LANG,
    ITLFeatures.MATCHES_SEARCHER_LANGS,
    ITLFeatures.TEXT_SCORE,
    ITLFeatures.LINK_LANGUAGE,
    ITLFeatures.NUM_HASHTAGS,
    ITLFeatures.NUM_MENTIONS,
    ITLFeatures.IS_SENSITIVE,
    ITLFeatures.HAS_VIDEO,
    ITLFeatures.HAS_LINK,
    ITLFeatures.HAS_VISIBLE_LINK,
    EngagementDataRecordFeatures.InNetworkFavoritesCount
    // nice to have, but currently not hydrated in the RecommendedTweet payload
    //EngagementDataRecordFeatures.InNetworkRetweetsCount,
    //EngagementDataRecordFeatures.InNetworkRepliesCount
  )

  val ReciprocalLabels: Set[Feature.Binary] = Set(
    ITLFeatures.IS_REPLIED_REPLY_IMPRESSED_BY_AUTHOR,
    ITLFeatures.IS_REPLIED_REPLY_REPLIED_BY_AUTHOR,
    ITLFeatures.IS_REPLIED_REPLY_FAVORITED_BY_AUTHOR,
    ITLFeatures.IS_REPLIED_REPLY_RETWEETED_BY_AUTHOR,
    ITLFeatures.IS_REPLIED_REPLY_QUOTED_BY_AUTHOR
  )
}
@ -0,0 +1,80 @@
|
|||||||
|
package com.twitter.timelines.prediction.common.aggregates
|
||||||
|
|
||||||
|
import com.twitter.dal.client.dataset.KeyValDALDataset
|
||||||
|
import com.twitter.ml.api.DataRecord
|
||||||
|
import com.twitter.ml.api.FeatureContext
|
||||||
|
import com.twitter.scalding_internal.multiformat.format.keyval
|
||||||
|
import com.twitter.summingbird.batch.BatchID
|
||||||
|
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.conversion.CombineCountsPolicy
|
||||||
|
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateStore
|
||||||
|
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationKey
|
||||||
|
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.OfflineAggregateDataRecordStore
|
||||||
|
import scala.collection.JavaConverters._
|
||||||
|
|
||||||
|
object TimelinesAggregationConfig extends TimelinesAggregationConfigTrait {
|
||||||
|
override def outputHdfsPath: String = "/user/timelines/processed/aggregates_v2"
|
||||||
|
|
||||||
|
def storeToDatasetMap: Map[String, KeyValDALDataset[
|
||||||
|
keyval.KeyVal[AggregationKey, (BatchID, DataRecord)]
|
||||||
|
]] = Map(
|
||||||
|
AuthorTopicAggregateStore -> AuthorTopicAggregatesScalaDataset,
|
||||||
|
UserTopicAggregateStore -> UserTopicAggregatesScalaDataset,
|
||||||
|
UserInferredTopicAggregateStore -> UserInferredTopicAggregatesScalaDataset,
|
||||||
|
UserAggregateStore -> UserAggregatesScalaDataset,
|
||||||
|
UserAuthorAggregateStore -> UserAuthorAggregatesScalaDataset,
|
||||||
|
UserOriginalAuthorAggregateStore -> UserOriginalAuthorAggregatesScalaDataset,
|
||||||
|
OriginalAuthorAggregateStore -> OriginalAuthorAggregatesScalaDataset,
|
||||||
|
UserEngagerAggregateStore -> UserEngagerAggregatesScalaDataset,
|
||||||
|
UserMentionAggregateStore -> UserMentionAggregatesScalaDataset,
|
||||||
|
TwitterWideUserAggregateStore -> TwitterWideUserAggregatesScalaDataset,
|
||||||
|
TwitterWideUserAuthorAggregateStore -> TwitterWideUserAuthorAggregatesScalaDataset,
|
||||||
|
UserRequestHourAggregateStore -> UserRequestHourAggregatesScalaDataset,
|
||||||
|
UserRequestDowAggregateStore -> UserRequestDowAggregatesScalaDataset,
|
||||||
|
UserListAggregateStore -> UserListAggregatesScalaDataset,
|
||||||
|
UserMediaUnderstandingAnnotationAggregateStore -> UserMediaUnderstandingAnnotationAggregatesScalaDataset,
|
||||||
|
)
|
||||||
|
|
||||||
|
override def mkPhysicalStore(store: AggregateStore): AggregateStore = store match {
|
||||||
|
case s: OfflineAggregateDataRecordStore =>
|
||||||
|
s.toOfflineAggregateDataRecordStoreWithDAL(storeToDatasetMap(s.name))
|
||||||
|
case _ => throw new IllegalArgumentException("Unsupported logical dataset type.")
|
||||||
|
}
|
||||||
|
|
||||||
|
  object CombineCountPolicies {
    val EngagerCountsPolicy: CombineCountsPolicy = mkCountsPolicy("user_engager_aggregate")
    val EngagerGoodClickCountsPolicy: CombineCountsPolicy = mkCountsPolicy(
      "user_engager_good_click_aggregate")
    val RectweetEngagerCountsPolicy: CombineCountsPolicy =
      mkCountsPolicy("rectweet_user_engager_aggregate")
    val MentionCountsPolicy: CombineCountsPolicy = mkCountsPolicy("user_mention_aggregate")
    val RectweetSimclustersTweetCountsPolicy: CombineCountsPolicy =
      mkCountsPolicy("rectweet_user_simcluster_tweet_aggregate")
    val UserInferredTopicCountsPolicy: CombineCountsPolicy =
      mkCountsPolicy("user_inferred_topic_aggregate")
    val UserInferredTopicV2CountsPolicy: CombineCountsPolicy =
      mkCountsPolicy("user_inferred_topic_aggregate_v2")
    val UserMediaUnderstandingAnnotationCountsPolicy: CombineCountsPolicy =
      mkCountsPolicy("user_media_annotation_aggregate")

    private[this] def mkCountsPolicy(prefix: String): CombineCountsPolicy = {
      val features = TimelinesAggregationConfig.aggregatesToCompute
        .filter(_.aggregatePrefix == prefix)
        .flatMap(_.allOutputFeatures)
      CombineCountsPolicy(
        topK = 2,
        aggregateContextToPrecompute = new FeatureContext(features.asJava),
        hardLimit = Some(20)
      )
    }
  }
}

object TimelinesAggregationCanaryConfig extends TimelinesAggregationConfigTrait {
  override def outputHdfsPath: String = "/user/timelines/canaries/processed/aggregates_v2"

  override def mkPhysicalStore(store: AggregateStore): AggregateStore = store match {
    case s: OfflineAggregateDataRecordStore =>
      s.toOfflineAggregateDataRecordStoreWithDAL(dalDataset = AggregatesCanaryScalaDataset)
    case _ => throw new IllegalArgumentException("Unsupported logical dataset type.")
  }
}
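Both `TimelinesAggregationConfig` and `TimelinesAggregationCanaryConfig` resolve a logical `OfflineAggregateDataRecordStore` into a physical, DAL-backed store; they differ only in where the dataset comes from (a per-store lookup in `storeToDatasetMap` versus one shared canary dataset). The sketch below illustrates that pattern with hypothetical stand-in types (`Store`, `LogicalStore`, `PhysicalStore`) and made-up dataset names; it does not use the real aggregation-framework or DAL APIs.

```scala
// Hypothetical stand-ins for the framework's store types; not the real API.
sealed trait Store { def name: String }
final case class LogicalStore(name: String, startDate: String) extends Store
final case class PhysicalStore(name: String, startDate: String, dataset: String) extends Store

object StoreResolution {
  // Hypothetical dataset identifiers keyed by logical store name.
  val storeToDataset: Map[String, String] = Map(
    "user_aggregates" -> "UserAggregatesDataset",
    "user_author_aggregates" -> "UserAuthorAggregatesDataset"
  )

  // Prod-style resolution: look up the dataset by the logical store's name.
  def mkPhysicalStore(store: Store): Store = store match {
    case s: LogicalStore =>
      val dataset = storeToDataset.getOrElse(
        s.name,
        throw new IllegalArgumentException(s"No dataset registered for store ${s.name}"))
      PhysicalStore(s.name, s.startDate, dataset)
    case _ => throw new IllegalArgumentException("Unsupported logical dataset type.")
  }

  // Canary-style resolution: every logical store is routed to one shared dataset.
  def mkCanaryPhysicalStore(store: Store): Store = store match {
    case s: LogicalStore => PhysicalStore(s.name, s.startDate, "CanaryDataset")
    case _ => throw new IllegalArgumentException("Unsupported logical dataset type.")
  }
}
```

One design choice in this sketch: an unregistered store name raises an `IllegalArgumentException` that names the store, whereas calling `storeToDatasetMap(s.name)` directly on a Scala `Map` throws a `NoSuchElementException` for a missing key.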
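`mkCountsPolicy` collects, for a given `aggregatePrefix`, the union of `allOutputFeatures` over every computed group that carries that prefix, then wraps it with fixed `topK = 2` and `hardLimit = Some(20)`. The following sketch reproduces only that filter-and-flatten step over hypothetical simplified types (`GroupSketch`, `CountsPolicySketch`); it does not model what `CombineCountsPolicy` actually does with `topK` or `hardLimit`.

```scala
// Hypothetical stand-in for TypedAggregateGroup: only the two fields mkCountsPolicy reads.
final case class GroupSketch(aggregatePrefix: String, allOutputFeatures: Set[String])

// Hypothetical stand-in for CombineCountsPolicy; the constants mirror the config above.
final case class CountsPolicySketch(
  topK: Int,
  featuresToPrecompute: Set[String],
  hardLimit: Option[Int])

object CountsPolicies {
  def mkCountsPolicy(groups: Set[GroupSketch], prefix: String): CountsPolicySketch = {
    val features = groups
      .filter(_.aggregatePrefix == prefix) // keep only groups with this prefix
      .flatMap(_.allOutputFeatures) // union of their output features
    CountsPolicySketch(topK = 2, featuresToPrecompute = features, hardLimit = Some(20))
  }
}
```

In this sketch, a prefix that matches no computed group silently yields an empty feature set rather than failing.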
@ -0,0 +1,579 @@
package com.twitter.timelines.prediction.common.aggregates

import com.twitter.conversions.DurationOps._
import com.twitter.ml.api.constant.SharedFeatures.AUTHOR_ID
import com.twitter.ml.api.constant.SharedFeatures.USER_ID
import com.twitter.timelines.data_processing.ml_util.aggregation_framework._
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.metrics._
import com.twitter.timelines.data_processing.ml_util.transforms.DownsampleTransform
import com.twitter.timelines.data_processing.ml_util.transforms.RichRemoveAuthorIdZero
import com.twitter.timelines.data_processing.ml_util.transforms.RichRemoveUserIdZero
import com.twitter.timelines.prediction.features.common.TimelinesSharedFeatures
import com.twitter.timelines.prediction.features.engagement_features.EngagementDataRecordFeatures
import com.twitter.timelines.prediction.features.engagement_features.EngagementDataRecordFeatures.RichUnifyPublicEngagersTransform
import com.twitter.timelines.prediction.features.list_features.ListFeatures
import com.twitter.timelines.prediction.features.recap.RecapFeatures
import com.twitter.timelines.prediction.features.request_context.RequestContextFeatures
import com.twitter.timelines.prediction.features.semantic_core_features.SemanticCoreFeatures
import com.twitter.timelines.prediction.transform.filter.FilterInNetworkTransform
import com.twitter.timelines.prediction.transform.filter.FilterImageTweetTransform
import com.twitter.timelines.prediction.transform.filter.FilterVideoTweetTransform
import com.twitter.timelines.prediction.transform.filter.FilterOutImageVideoTweetTransform
import com.twitter.util.Duration

trait TimelinesAggregationConfigDetails extends Serializable {

  import TimelinesAggregationSources._

  def outputHdfsPath: String

  /**
   * Converts the given logical store to a physical store. We do not specify the physical
   * store directly in the [[AggregateGroup]] because doing so would create a cyclic
   * dependency: the physical stores are DalDatasets whose PersonalDataType annotations are
   * derived from the [[AggregateGroup]] itself.
   */
  def mkPhysicalStore(store: AggregateStore): AggregateStore

  def defaultMaxKvSourceFailures: Int = 100

  val timelinesOfflineAggregateSink = new OfflineStoreCommonConfig {
    override def apply(startDate: String) = OfflineAggregateStoreCommonConfig(
      outputHdfsPathPrefix = outputHdfsPath,
      dummyAppId = "timelines_aggregates_v2_ro",
      dummyDatasetPrefix = "timelines_aggregates_v2_ro",
      startDate = startDate
    )
  }

  val UserAggregateStore = "user_aggregates"
  val UserAuthorAggregateStore = "user_author_aggregates"
  val UserOriginalAuthorAggregateStore = "user_original_author_aggregates"
  val OriginalAuthorAggregateStore = "original_author_aggregates"
  val UserEngagerAggregateStore = "user_engager_aggregates"
  val UserMentionAggregateStore = "user_mention_aggregates"
  val TwitterWideUserAggregateStore = "twitter_wide_user_aggregates"
  val TwitterWideUserAuthorAggregateStore = "twitter_wide_user_author_aggregates"
  val UserRequestHourAggregateStore = "user_request_hour_aggregates"
  val UserRequestDowAggregateStore = "user_request_dow_aggregates"
  val UserListAggregateStore = "user_list_aggregates"
  val AuthorTopicAggregateStore = "author_topic_aggregates"
  val UserTopicAggregateStore = "user_topic_aggregates"
  val UserInferredTopicAggregateStore = "user_inferred_topic_aggregates"
  val UserMediaUnderstandingAnnotationAggregateStore =
    "user_media_understanding_annotation_aggregates"
  val AuthorCountryCodeAggregateStore = "author_country_code_aggregates"
  val OriginalAuthorCountryCodeAggregateStore = "original_author_country_code_aggregates"

  /**
   * Step 3: Configure all aggregates to compute.
   * Note that different subsets of aggregates in this list
   * can be launched by different Summingbird job instances.
   * Any given job can be responsible for a set of AggregateGroup
   * configs whose outputStores share the same exact startDate.
   * AggregateGroups that do not share the same inputSource,
   * outputStore or startDate MUST be launched using different
   * Summingbird jobs and passed a different --start-time argument.
   * See science/scalding/mesos/timelines/prod.yaml for an example
   * of how to configure your own job.
   */
  val negativeDownsampleTransform =
    DownsampleTransform(
      negativeSamplingRate = 0.03,
      keepLabels = RecapUserFeatureAggregation.LabelsV2)
  val negativeRecTweetDownsampleTransform = DownsampleTransform(
    negativeSamplingRate = 0.03,
    keepLabels = RectweetUserFeatureAggregation.RectweetLabelsForAggregation
  )

  val userAggregatesV2: AggregateGroup =
    AggregateGroup(
      inputSource = timelinesDailyRecapMinimalSource,
      aggregatePrefix = "user_aggregate_v2",
      preTransforms = Seq(RichRemoveUserIdZero), /* Eliminates reducer skew */
      keys = Set(USER_ID),
      features = RecapUserFeatureAggregation.UserFeaturesV2,
      labels = RecapUserFeatureAggregation.LabelsV2,
      metrics = Set(CountMetric, SumMetric),
      halfLives = Set(50.days),
      outputStore = mkPhysicalStore(
        OfflineAggregateDataRecordStore(
          name = UserAggregateStore,
          startDate = "2016-07-15 00:00",
          commonConfig = timelinesOfflineAggregateSink,
          maxKvSourceFailures = defaultMaxKvSourceFailures
        ))
    )

  val userAuthorAggregatesV2: Set[AggregateGroup] = {

    /**
     * NOTE: We need to remove records from out-of-network authors from the recap input
     * records (which now include out-of-network records as well after merging recap and
     * rectweet models) that are used to compute user-author aggregates. This is necessary
     * to limit the growth rate of user-author aggregates.
     */
    val allFeatureAggregates = Set(
      AggregateGroup(
        inputSource = timelinesDailyRecapMinimalSource,
        aggregatePrefix = "user_author_aggregate_v2",
        preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero),
        keys = Set(USER_ID, AUTHOR_ID),
        features = RecapUserFeatureAggregation.UserAuthorFeaturesV2,
        labels = RecapUserFeatureAggregation.LabelsV2,
        metrics = Set(SumMetric),
        halfLives = Set(50.days),
        outputStore = mkPhysicalStore(
          OfflineAggregateDataRecordStore(
            name = UserAuthorAggregateStore,
            startDate = "2016-07-15 00:00",
            commonConfig = timelinesOfflineAggregateSink,
            maxKvSourceFailures = defaultMaxKvSourceFailures
          ))
      )
    )

    val countAggregates: Set[AggregateGroup] = Set(
      AggregateGroup(
        inputSource = timelinesDailyRecapMinimalSource,
        aggregatePrefix = "user_author_aggregate_v2",
        preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero),
        keys = Set(USER_ID, AUTHOR_ID),
        features = RecapUserFeatureAggregation.UserAuthorFeaturesV2Count,
        labels = RecapUserFeatureAggregation.LabelsV2,
        metrics = Set(CountMetric),
        halfLives = Set(50.days),
        outputStore = mkPhysicalStore(
          OfflineAggregateDataRecordStore(
            name = UserAuthorAggregateStore,
            startDate = "2016-07-15 00:00",
            commonConfig = timelinesOfflineAggregateSink,
            maxKvSourceFailures = defaultMaxKvSourceFailures
          ))
      )
    )

    allFeatureAggregates ++ countAggregates
  }

  val userAggregatesV5Continuous: AggregateGroup =
    AggregateGroup(
      inputSource = timelinesDailyRecapMinimalSource,
      aggregatePrefix = "user_aggregate_v5.continuous",
      preTransforms = Seq(RichRemoveUserIdZero),
      keys = Set(USER_ID),
      features = RecapUserFeatureAggregation.UserFeaturesV5Continuous,
      labels = RecapUserFeatureAggregation.LabelsV2,
      metrics = Set(CountMetric, SumMetric, SumSqMetric),
      halfLives = Set(50.days),
      outputStore = mkPhysicalStore(
        OfflineAggregateDataRecordStore(
          name = UserAggregateStore,
          startDate = "2016-07-15 00:00",
          commonConfig = timelinesOfflineAggregateSink,
          maxKvSourceFailures = defaultMaxKvSourceFailures
        ))
    )

  val userAuthorAggregatesV5: AggregateGroup =
    AggregateGroup(
      inputSource = timelinesDailyRecapMinimalSource,
      aggregatePrefix = "user_author_aggregate_v5",
      preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero),
      keys = Set(USER_ID, AUTHOR_ID),
      features = RecapUserFeatureAggregation.UserAuthorFeaturesV5,
      labels = RecapUserFeatureAggregation.LabelsV2,
      metrics = Set(CountMetric),
      halfLives = Set(50.days),
      outputStore = mkPhysicalStore(
        OfflineAggregateDataRecordStore(
          name = UserAuthorAggregateStore,
          startDate = "2016-07-15 00:00",
          commonConfig = timelinesOfflineAggregateSink,
          maxKvSourceFailures = defaultMaxKvSourceFailures
        ))
    )

  val tweetSourceUserAuthorAggregatesV1: AggregateGroup =
    AggregateGroup(
      inputSource = timelinesDailyRecapMinimalSource,
      aggregatePrefix = "user_author_aggregate_tweetsource_v1",
      preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero),
      keys = Set(USER_ID, AUTHOR_ID),
      features = RecapUserFeatureAggregation.UserAuthorTweetSourceFeaturesV1,
      labels = RecapUserFeatureAggregation.LabelsV2,
      metrics = Set(CountMetric, SumMetric),
      halfLives = Set(50.days),
      outputStore = mkPhysicalStore(
        OfflineAggregateDataRecordStore(
          name = UserAuthorAggregateStore,
          startDate = "2016-07-15 00:00",
          commonConfig = timelinesOfflineAggregateSink,
          maxKvSourceFailures = defaultMaxKvSourceFailures
        ))
    )

  val userEngagerAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_engager_aggregate",
    keys = Set(USER_ID, EngagementDataRecordFeatures.PublicEngagementUserIds),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.LabelsV2,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserEngagerAggregateStore,
        startDate = "2016-09-02 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      )),
    preTransforms = Seq(
      RichRemoveUserIdZero,
      RichUnifyPublicEngagersTransform
    )
  )

  val userMentionAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    preTransforms = Seq(RichRemoveUserIdZero), /* Eliminates reducer skew */
    aggregatePrefix = "user_mention_aggregate",
    keys = Set(USER_ID, RecapFeatures.MENTIONED_SCREEN_NAMES),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.LabelsV2,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserMentionAggregateStore,
        startDate = "2017-03-01 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      )),
    includeAnyLabel = false
  )

  val twitterWideUserAggregates = AggregateGroup(
    inputSource = timelinesDailyTwitterWideSource,
    preTransforms = Seq(RichRemoveUserIdZero), /* Eliminates reducer skew */
    aggregatePrefix = "twitter_wide_user_aggregate",
    keys = Set(USER_ID),
    features = RecapUserFeatureAggregation.TwitterWideFeatures,
    labels = RecapUserFeatureAggregation.TwitterWideLabels,
    metrics = Set(CountMetric, SumMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = TwitterWideUserAggregateStore,
        startDate = "2016-12-28 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val twitterWideUserAuthorAggregates = AggregateGroup(
    inputSource = timelinesDailyTwitterWideSource,
    preTransforms = Seq(RichRemoveUserIdZero), /* Eliminates reducer skew */
    aggregatePrefix = "twitter_wide_user_author_aggregate",
    keys = Set(USER_ID, AUTHOR_ID),
    features = RecapUserFeatureAggregation.TwitterWideFeatures,
    labels = RecapUserFeatureAggregation.TwitterWideLabels,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = TwitterWideUserAuthorAggregateStore,
        startDate = "2016-12-28 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      )),
    includeAnyLabel = false
  )

  /**
   * User-HourOfDay and User-DayOfWeek aggregations, both for recap and rectweet
   */
  val userRequestHourAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_request_context_aggregate.hour",
    preTransforms = Seq(RichRemoveUserIdZero, negativeDownsampleTransform),
    keys = Set(USER_ID, RequestContextFeatures.TIMESTAMP_GMT_HOUR),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.LabelsV2,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserRequestHourAggregateStore,
        startDate = "2017-08-01 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val userRequestDowAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_request_context_aggregate.dow",
    preTransforms = Seq(RichRemoveUserIdZero, negativeDownsampleTransform),
    keys = Set(USER_ID, RequestContextFeatures.TIMESTAMP_GMT_DOW),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.LabelsV2,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserRequestDowAggregateStore,
        startDate = "2017-08-01 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val authorTopicAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "author_topic_aggregate",
    preTransforms = Seq(RichRemoveUserIdZero),
    keys = Set(AUTHOR_ID, TimelinesSharedFeatures.TOPIC_ID),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.LabelsV2,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = AuthorTopicAggregateStore,
        startDate = "2020-05-19 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val userTopicAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_topic_aggregate",
    preTransforms = Seq(RichRemoveUserIdZero),
    keys = Set(USER_ID, TimelinesSharedFeatures.TOPIC_ID),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.LabelsV2,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserTopicAggregateStore,
        startDate = "2020-05-23 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val userTopicAggregatesV2 = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_topic_aggregate_v2",
    preTransforms = Seq(RichRemoveUserIdZero),
    keys = Set(USER_ID, TimelinesSharedFeatures.TOPIC_ID),
    features = RecapUserFeatureAggregation.UserTopicFeaturesV2Count,
    labels = RecapUserFeatureAggregation.LabelsV2,
    includeAnyFeature = false,
    includeAnyLabel = false,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserTopicAggregateStore,
        startDate = "2020-05-23 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val userInferredTopicAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_inferred_topic_aggregate",
    preTransforms = Seq(RichRemoveUserIdZero),
    keys = Set(USER_ID, TimelinesSharedFeatures.INFERRED_TOPIC_IDS),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.LabelsV2,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserInferredTopicAggregateStore,
        startDate = "2020-09-09 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val userInferredTopicAggregatesV2 = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_inferred_topic_aggregate_v2",
    preTransforms = Seq(RichRemoveUserIdZero),
    keys = Set(USER_ID, TimelinesSharedFeatures.INFERRED_TOPIC_IDS),
    features = RecapUserFeatureAggregation.UserTopicFeaturesV2Count,
    labels = RecapUserFeatureAggregation.LabelsV2,
    includeAnyFeature = false,
    includeAnyLabel = false,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserInferredTopicAggregateStore,
        startDate = "2020-09-09 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val userReciprocalEngagementAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_aggregate_v6",
    preTransforms = Seq(RichRemoveUserIdZero),
    keys = Set(USER_ID),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.ReciprocalLabels,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserAggregateStore,
        startDate = "2016-07-15 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      )),
    includeAnyLabel = false
  )

  val userOriginalAuthorReciprocalEngagementAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_original_author_aggregate_v1",
    preTransforms = Seq(RichRemoveUserIdZero, RichRemoveAuthorIdZero),
    keys = Set(USER_ID, TimelinesSharedFeatures.ORIGINAL_AUTHOR_ID),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.ReciprocalLabels,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserOriginalAuthorAggregateStore,
        startDate = "2018-12-26 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      )),
    includeAnyLabel = false
  )

  val originalAuthorReciprocalEngagementAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "original_author_aggregate_v1",
    preTransforms = Seq(RichRemoveUserIdZero, RichRemoveAuthorIdZero),
    keys = Set(TimelinesSharedFeatures.ORIGINAL_AUTHOR_ID),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.ReciprocalLabels,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = OriginalAuthorAggregateStore,
        startDate = "2023-02-25 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      )),
    includeAnyLabel = false
  )

  val originalAuthorNegativeEngagementAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "original_author_aggregate_v2",
    preTransforms = Seq(RichRemoveUserIdZero, RichRemoveAuthorIdZero),
    keys = Set(TimelinesSharedFeatures.ORIGINAL_AUTHOR_ID),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.NegativeEngagementLabels,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = OriginalAuthorAggregateStore,
        startDate = "2023-02-25 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      )),
    includeAnyLabel = false
  )

  val userListAggregates: AggregateGroup =
    AggregateGroup(
      inputSource = timelinesDailyRecapMinimalSource,
      aggregatePrefix = "user_list_aggregate",
      keys = Set(USER_ID, ListFeatures.LIST_ID),
      features = Set.empty,
      labels = RecapUserFeatureAggregation.LabelsV2,
      metrics = Set(CountMetric),
      halfLives = Set(50.days),
      outputStore = mkPhysicalStore(
        OfflineAggregateDataRecordStore(
          name = UserListAggregateStore,
          startDate = "2020-05-28 00:00",
          commonConfig = timelinesOfflineAggregateSink,
          maxKvSourceFailures = defaultMaxKvSourceFailures
        )),
      preTransforms = Seq(RichRemoveUserIdZero)
    )

  val userMediaUnderstandingAnnotationAggregates: AggregateGroup = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_media_annotation_aggregate",
    preTransforms = Seq(RichRemoveUserIdZero),
    keys =
      Set(USER_ID, SemanticCoreFeatures.mediaUnderstandingHighRecallNonSensitiveEntityIdsFeature),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.LabelsV2,
    metrics = Set(CountMetric),
    halfLives = Set(50.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserMediaUnderstandingAnnotationAggregateStore,
        startDate = "2021-03-20 00:00",
        commonConfig = timelinesOfflineAggregateSink
      ))
  )

  val userAuthorGoodClickAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_author_good_click_aggregate",
    preTransforms = Seq(FilterInNetworkTransform, RichRemoveUserIdZero),
    keys = Set(USER_ID, AUTHOR_ID),
    features = RecapUserFeatureAggregation.UserAuthorFeaturesV2,
    labels = RecapUserFeatureAggregation.GoodClickLabels,
    metrics = Set(SumMetric),
    halfLives = Set(14.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserAuthorAggregateStore,
        startDate = "2016-07-15 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      ))
  )

  val userEngagerGoodClickAggregates = AggregateGroup(
    inputSource = timelinesDailyRecapMinimalSource,
    aggregatePrefix = "user_engager_good_click_aggregate",
    keys = Set(USER_ID, EngagementDataRecordFeatures.PublicEngagementUserIds),
    features = Set.empty,
    labels = RecapUserFeatureAggregation.GoodClickLabels,
    metrics = Set(CountMetric),
    halfLives = Set(14.days),
    outputStore = mkPhysicalStore(
      OfflineAggregateDataRecordStore(
        name = UserEngagerAggregateStore,
        startDate = "2016-09-02 00:00",
        commonConfig = timelinesOfflineAggregateSink,
        maxKvSourceFailures = defaultMaxKvSourceFailures
      )),
    preTransforms = Seq(
      RichRemoveUserIdZero,
      RichUnifyPublicEngagersTransform
    )
  )
}
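Every group above sets `halfLives` (50 days for most groups, 14 days for the two good-click groups). A minimal sketch of half-life decay, assuming the standard exponential form in which a value is multiplied by 0.5^(Δt / halfLife), so it loses half its weight every half-life. The names (`HalfLifeDecay`, `Decayed`, `plus`) are hypothetical, and the aggregation framework's own implementation may differ in detail.

```scala
object HalfLifeDecay {
  // Multiplier applied to a value after `elapsedMillis`: 0.5^(elapsed / halfLife).
  def decayFactor(elapsedMillis: Long, halfLifeMillis: Long): Double =
    math.pow(0.5, elapsedMillis.toDouble / halfLifeMillis.toDouble)

  // A decayed value together with the time it was last brought up to date.
  final case class Decayed(value: Double, timestampMillis: Long) {
    // Decay this value forward to time `t` (t must not be earlier than the stored time).
    def at(t: Long, halfLifeMillis: Long): Decayed = {
      require(t >= timestampMillis, "cannot decay a value backwards in time")
      Decayed(value * decayFactor(t - timestampMillis, halfLifeMillis), t)
    }
  }

  // Combine two decayed values: move both to the later timestamp, then add.
  def plus(a: Decayed, b: Decayed, halfLifeMillis: Long): Decayed = {
    val t = math.max(a.timestampMillis, b.timestampMillis)
    Decayed(a.at(t, halfLifeMillis).value + b.at(t, halfLifeMillis).value, t)
  }
}
```

Because decay is multiplicative, merging by first decaying both operands to the later timestamp is commutative and associative (up to floating-point rounding), which is the property a combiner of partial aggregates generally needs.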
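`negativeDownsampleTransform` and `negativeRecTweetDownsampleTransform` pair `negativeSamplingRate = 0.03` with a `keepLabels` set. A minimal sketch of one plausible reading of those two parameters: a record carrying any of the kept labels is always retained, and any other record (a negative) is retained with probability 0.03. This is an assumption about `DownsampleTransform`'s semantics rather than its documented behavior, and `NegativeDownsampling`, `RecordSketch` and the label names used with it are hypothetical.

```scala
import scala.util.Random

object NegativeDownsampling {
  // Hypothetical record view: just the set of label names that fired on it.
  final case class RecordSketch(labels: Set[String])

  // Always keep records carrying a kept label; keep the rest with probability `negativeSamplingRate`.
  def keep(
    record: RecordSketch,
    keepLabels: Set[String],
    negativeSamplingRate: Double,
    rng: Random
  ): Boolean =
    record.labels.exists(keepLabels.contains) || rng.nextDouble() < negativeSamplingRate
}
```

In the config above, only `userRequestHourAggregates` and `userRequestDowAggregates` apply `negativeDownsampleTransform`; `negativeRecTweetDownsampleTransform` is defined but not referenced by any group shown here.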
@ -0,0 +1,50 @@
package com.twitter.timelines.prediction.common.aggregates

import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregationConfig
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.AggregateGroup
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.TypedAggregateGroup

trait TimelinesAggregationConfigTrait
    extends TimelinesAggregationConfigDetails
    with AggregationConfig {
  private val aggregateGroups = Set(
    authorTopicAggregates,
    userTopicAggregates,
    userTopicAggregatesV2,
    userInferredTopicAggregates,
    userInferredTopicAggregatesV2,
    userAggregatesV2,
    userAggregatesV5Continuous,
    userReciprocalEngagementAggregates,
    userAuthorAggregatesV5,
    userOriginalAuthorReciprocalEngagementAggregates,
    originalAuthorReciprocalEngagementAggregates,
    tweetSourceUserAuthorAggregatesV1,
    userEngagerAggregates,
    userMentionAggregates,
    twitterWideUserAggregates,
    twitterWideUserAuthorAggregates,
    userRequestHourAggregates,
    userRequestDowAggregates,
    userListAggregates,
    userMediaUnderstandingAnnotationAggregates,
  ) ++ userAuthorAggregatesV2

  val aggregatesToComputeList: Set[List[TypedAggregateGroup[_]]] =
    aggregateGroups.map(_.buildTypedAggregateGroups())

  override val aggregatesToCompute: Set[TypedAggregateGroup[_]] = aggregatesToComputeList.flatten

  /*
   * Feature selection config to save storage space and Manhattan query bandwidth.
   * Only the most important features found using offline RCE simulations are used
   * when actually training and serving. This selector is used by
   * [[com.twitter.timelines.data_processing.jobs.timeline_ranking_user_features.TimelineRankingAggregatesV2FeaturesProdJob]]
   * but defined here to keep it in sync with the config that computes the aggregates.
   */
  val AggregatesV2FeatureSelector = FeatureSelectorConfig.AggregatesV2ProdFeatureSelector

  def filterAggregatesGroups(storeNames: Set[String]): Set[AggregateGroup] = {
    aggregateGroups.filter(aggregateGroup => storeNames.contains(aggregateGroup.outputStore.name))
  }
}
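`filterAggregatesGroups` selects the configured groups whose `outputStore.name` is in the requested set. Because several groups share one store (for example `user_aggregate_v2`, `user_aggregate_v5.continuous` and `user_aggregate_v6` all write to `user_aggregates`), filtering by store name pulls all of them in together. The sketch below mirrors that filter over hypothetical simplified types (`OutputStoreSketch`, `AggregateGroupSketch`, `GroupSelection`), not the framework's `AggregateGroup`.

```scala
// Hypothetical simplified types: only the fields filterAggregatesGroups reads.
final case class OutputStoreSketch(name: String)
final case class AggregateGroupSketch(aggregatePrefix: String, outputStore: OutputStoreSketch)

object GroupSelection {
  // Keep the groups whose output store name was requested.
  def filterAggregatesGroups(
    groups: Set[AggregateGroupSketch],
    storeNames: Set[String]
  ): Set[AggregateGroupSketch] =
    groups.filter(group => storeNames.contains(group.outputStore.name))
}
```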
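The Step 3 comment in `TimelinesAggregationConfigDetails` requires groups that differ in `inputSource`, `outputStore` or `startDate` to run as separate Summingbird jobs. A minimal sketch of partitioning groups accordingly, keyed on that triple; `GroupConfig`, `JobPartitioning` and the input-source names used with it are hypothetical, and the actual job launch (described in the referenced prod.yaml) is not modeled.

```scala
// Hypothetical simplified view of an AggregateGroup: the fields Step 3 says must match per job.
final case class GroupConfig(prefix: String, inputSource: String, storeName: String, startDate: String)

object JobPartitioning {
  // (inputSource, outputStore name, startDate): groups sharing this triple may share a job.
  type JobKey = (String, String, String)

  // Map each job key to the aggregate prefixes that job would compute, in input order.
  def partitionIntoJobs(groups: Seq[GroupConfig]): Map[JobKey, Seq[String]] =
    groups
      .groupBy(g => (g.inputSource, g.storeName, g.startDate))
      .map { case (key, members) => key -> members.map(_.prefix) }
}
```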
@ -0,0 +1,48 @@
package com.twitter.timelines.prediction.common.aggregates

import com.twitter.ml.api.DataRecord
import com.twitter.scalding_internal.multiformat.format.keyval.KeyValInjection
import com.twitter.summingbird.batch.BatchID
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.{
  AggregateStore,
  AggregationKey,
  OfflineAggregateInjections,
  TypedAggregateGroup
}

object TimelinesAggregationKeyValInjections extends TimelinesAggregationConfigTrait {
|
||||||
|
|
||||||
|
import OfflineAggregateInjections.getInjection
|
||||||
|
|
||||||
|
type KVInjection = KeyValInjection[AggregationKey, (BatchID, DataRecord)]
|
||||||
|
|
||||||
|
val AuthorTopic: KVInjection = getInjection(filter(AuthorTopicAggregateStore))
|
||||||
|
val UserTopic: KVInjection = getInjection(filter(UserTopicAggregateStore))
|
||||||
|
val UserInferredTopic: KVInjection = getInjection(filter(UserInferredTopicAggregateStore))
|
||||||
|
val User: KVInjection = getInjection(filter(UserAggregateStore))
|
||||||
|
val UserAuthor: KVInjection = getInjection(filter(UserAuthorAggregateStore))
|
||||||
|
val UserOriginalAuthor: KVInjection = getInjection(filter(UserOriginalAuthorAggregateStore))
|
||||||
|
val OriginalAuthor: KVInjection = getInjection(filter(OriginalAuthorAggregateStore))
|
||||||
|
val UserEngager: KVInjection = getInjection(filter(UserEngagerAggregateStore))
|
||||||
|
val UserMention: KVInjection = getInjection(filter(UserMentionAggregateStore))
|
||||||
|
val TwitterWideUser: KVInjection = getInjection(filter(TwitterWideUserAggregateStore))
|
||||||
|
val TwitterWideUserAuthor: KVInjection = getInjection(filter(TwitterWideUserAuthorAggregateStore))
|
||||||
|
val UserRequestHour: KVInjection = getInjection(filter(UserRequestHourAggregateStore))
|
||||||
|
val UserRequestDow: KVInjection = getInjection(filter(UserRequestDowAggregateStore))
|
||||||
|
val UserList: KVInjection = getInjection(filter(UserListAggregateStore))
|
||||||
|
val UserMediaUnderstandingAnnotation: KVInjection = getInjection(
|
||||||
|
filter(UserMediaUnderstandingAnnotationAggregateStore))
|
||||||
|
|
||||||
|
private def filter(storeName: String): Set[TypedAggregateGroup[_]] = {
|
||||||
|
val groups = aggregatesToCompute.filter(_.outputStore.name == storeName)
|
||||||
|
require(groups.nonEmpty)
|
||||||
|
groups
|
||||||
|
}
|
||||||
|
|
||||||
|
override def outputHdfsPath: String = "/user/timelines/processed/aggregates_v2"
|
||||||
|
|
||||||
|
// Since this object is not used to execute any online or offline aggregates job, but is meant
|
||||||
|
// to store all PDT enabled KeyValInjections, we do not need to construct a physical store.
|
||||||
|
// We use the identity operation as a default.
|
||||||
|
override def mkPhysicalStore(store: AggregateStore): AggregateStore = store
|
||||||
|
}
|
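`TimelinesAggregationKeyValInjections` builds each injection from the typed groups whose output store matches a single name. Its `require(groups.nonEmpty)` check makes a misspelled or removed store fail loudly instead of yielding an empty injection.

The sketch below isolates that lookup. `SketchStore`, `SketchTypedGroup`, and the store names are hypothetical. The error message is an addition; the original `require` call has none.

```scala
final case class SketchStore(name: String)
final case class SketchTypedGroup(id: String, outputStore: SketchStore)

object InjectionLookupSketch {
  // Hypothetical typed groups; two of them write to the same store.
  val aggregatesToCompute: Set[SketchTypedGroup] = Set(
    SketchTypedGroup("user.engagements", SketchStore("user_aggregates")),
    SketchTypedGroup("user.replies", SketchStore("user_aggregates")),
    SketchTypedGroup("user_author.engagements", SketchStore("user_author_aggregates"))
  )

  // Same shape as the private `filter` helper: select every group that writes to
  // `storeName`, and throw IllegalArgumentException if there are none.
  def groupsFor(storeName: String): Set[SketchTypedGroup] = {
    val groups = aggregatesToCompute.filter(_.outputStore.name == storeName)
    require(groups.nonEmpty, s"No aggregate groups write to store '$storeName'")
    groups
  }
}
```

In the real object, these lookups are assigned to `val`s. As a result, a failing `require` surfaces the first time `TimelinesAggregationKeyValInjections` is referenced, not when a particular injection is used.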
@@ -0,0 +1,45 @@
package com.twitter.timelines.prediction.common.aggregates

import com.twitter.ml.api.constant.SharedFeatures.TIMESTAMP
import com.twitter.timelines.data_processing.ml_util.aggregation_framework.OfflineAggregateSource
import com.twitter.timelines.prediction.features.p_home_latest.HomeLatestUserAggregatesFeatures
import timelines.data_processing.ad_hoc.recap.data_record_preparation.RecapDataRecordsAggMinimalJavaDataset

/**
 * Any update here should be in sync with [[TimelinesFeatureGroups]] and [[AggMinimalDataRecordGeneratorJob]].
 */
object TimelinesAggregationSources {

  /**
   * These are the recap data records after post-processing in [[GenerateRecapAggMinimalDataRecordsJob]].
   */
  val timelinesDailyRecapMinimalSource = OfflineAggregateSource(
    name = "timelines_daily_recap",
    timestampFeature = TIMESTAMP,
    dalDataSet = Some(RecapDataRecordsAggMinimalJavaDataset),
    scaldingSuffixType = Some("dal"),
    withValidation = true
  )
  val timelinesDailyTwitterWideSource = OfflineAggregateSource(
    name = "timelines_daily_twitter_wide",
    timestampFeature = TIMESTAMP,
    scaldingHdfsPath = Some("/user/timelines/processed/suggests/recap/twitter_wide_data_records"),
    scaldingSuffixType = Some("daily"),
    withValidation = true
  )

  val timelinesDailyListTimelineSource = OfflineAggregateSource(
    name = "timelines_daily_list_timeline",
    timestampFeature = TIMESTAMP,
    scaldingHdfsPath = Some("/user/timelines/processed/suggests/recap/all_features/list"),
    scaldingSuffixType = Some("hourly"),
    withValidation = true
  )

  val timelinesDailyHomeLatestSource = OfflineAggregateSource(
    name = "timelines_daily_home_latest",
    timestampFeature = HomeLatestUserAggregatesFeatures.AGGREGATE_TIMESTAMP_MS,
    scaldingHdfsPath = Some("/user/timelines/processed/p_home_latest/user_aggregates"),
    scaldingSuffixType = Some("daily")
  )
}
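Each `OfflineAggregateSource` above reads from one of two kinds of input. `timelines_daily_recap` reads a DAL dataset. The other three read an HDFS path with a `daily` or `hourly` suffix type.

The sketch below models the four sources as plain data and checks that each names exactly one input location. `SourceSpec`, `hasSingleInput`, and `inputLocation` are hypothetical, illustrative assumptions. They are not part of the aggregation framework's API.

```scala
// Hypothetical plain-data model of the four sources; dataset names are strings here.
final case class SourceSpec(
    name: String,
    dalDataSet: Option[String],
    hdfsPath: Option[String],
    suffixType: Option[String]
)

object SourceSketch {
  val sources: Seq[SourceSpec] = Seq(
    SourceSpec("timelines_daily_recap", Some("RecapDataRecordsAggMinimalJavaDataset"), None, Some("dal")),
    SourceSpec("timelines_daily_twitter_wide", None,
      Some("/user/timelines/processed/suggests/recap/twitter_wide_data_records"), Some("daily")),
    SourceSpec("timelines_daily_list_timeline", None,
      Some("/user/timelines/processed/suggests/recap/all_features/list"), Some("hourly")),
    SourceSpec("timelines_daily_home_latest", None,
      Some("/user/timelines/processed/p_home_latest/user_aggregates"), Some("daily"))
  )

  // Hypothetical sanity check: exactly one of the DAL dataset or HDFS path is set (XOR).
  def hasSingleInput(s: SourceSpec): Boolean = s.dalDataSet.isDefined ^ s.hdfsPath.isDefined

  // Describes where the source would be read from, preferring DAL when present.
  def inputLocation(s: SourceSpec): String =
    s.dalDataSet.map(ds => s"dal:$ds").orElse(s.hdfsPath).getOrElse("<none>")
}
```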
@@ -0,0 +1,70 @@
package com.twitter.timelines.prediction.common.aggregates.real_time

import com.twitter.dal.personal_data.thriftjava.PersonalDataType.UserState
import com.twitter.ml.api.Feature.Binary
import com.twitter.ml.api.{DataRecord, Feature, FeatureContext, RichDataRecord}
import com.twitter.ml.featurestore.catalog.entities.core.Author
import com.twitter.ml.featurestore.catalog.features.magicrecs.UserActivity
import com.twitter.ml.featurestore.lib.data.PredictionRecord
import com.twitter.ml.featurestore.lib.feature.{BoundFeature, BoundFeatureSet}
import com.twitter.ml.featurestore.lib.{UserId, Discrete => FSDiscrete}
import com.twitter.timelines.prediction.common.adapters.TimelinesAdapterBase
import java.lang.{Boolean => JBoolean}
import java.util
import scala.collection.JavaConverters._

object AuthorFeaturesAdapter extends TimelinesAdapterBase[PredictionRecord] {
  val UserStateBoundFeature: BoundFeature[UserId, FSDiscrete] = UserActivity.UserState.bind(Author)
  val UserFeaturesSet: BoundFeatureSet = BoundFeatureSet(UserStateBoundFeature)

  /**
   * Boolean features about the author's user state.
   * enum UserState {
   *   NEW = 0,
   *   NEAR_ZERO = 1,
   *   VERY_LIGHT = 2,
   *   LIGHT = 3,
   *   MEDIUM_TWEETER = 4,
   *   MEDIUM_NON_TWEETER = 5,
   *   HEAVY_NON_TWEETER = 6,
   *   HEAVY_TWEETER = 7
   * }(persisted='true')
   */
  val IS_USER_NEW = new Binary("timelines.author.user_state.is_user_new", Set(UserState).asJava)
  val IS_USER_LIGHT = new Binary("timelines.author.user_state.is_user_light", Set(UserState).asJava)
  val IS_USER_MEDIUM_TWEETER =
    new Binary("timelines.author.user_state.is_user_medium_tweeter", Set(UserState).asJava)
  val IS_USER_MEDIUM_NON_TWEETER =
    new Binary("timelines.author.user_state.is_user_medium_non_tweeter", Set(UserState).asJava)
  val IS_USER_HEAVY_NON_TWEETER =
    new Binary("timelines.author.user_state.is_user_heavy_non_tweeter", Set(UserState).asJava)
  val IS_USER_HEAVY_TWEETER =
    new Binary("timelines.author.user_state.is_user_heavy_tweeter", Set(UserState).asJava)
  val userStateToFeatureMap: Map[Long, Binary] = Map(
    0L -> IS_USER_NEW,
    1L -> IS_USER_LIGHT,
    2L -> IS_USER_LIGHT,
    3L -> IS_USER_LIGHT,
    4L -> IS_USER_MEDIUM_TWEETER,
    5L -> IS_USER_MEDIUM_NON_TWEETER,
    6L -> IS_USER_HEAVY_NON_TWEETER,
    7L -> IS_USER_HEAVY_TWEETER
  )

  val UserStateBooleanFeatures: Set[Feature[_]] = userStateToFeatureMap.values.toSet

  private val allFeatures: Seq[Feature[_]] = UserStateBooleanFeatures.toSeq
  override def getFeatureContext: FeatureContext = new FeatureContext(allFeatures: _*)
  override def commonFeatures: Set[Feature[_]] = Set.empty

  override def adaptToDataRecords(record: PredictionRecord): util.List[DataRecord] = {
    val newRecord = new RichDataRecord(new DataRecord)
    record
      .getFeatureValue(UserStateBoundFeature)
      .flatMap { userState => userStateToFeatureMap.get(userState.value) }.foreach {
        booleanFeature => newRecord.setFeatureValue[JBoolean](booleanFeature, true)
      }

    List(newRecord.getRecord).asJava
  }
}
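`adaptToDataRecords` turns the author's discrete `UserState` value into at most one true boolean feature. The three lightest states (`NEAR_ZERO`, `VERY_LIGHT`, `LIGHT`) all share `is_user_light`.

The sketch below reproduces that mapping in plain Scala. It uses a `Map[String, Boolean]` as a hypothetical stand-in for `RichDataRecord`, and an `Option[Long]` in place of the bound feature value.

```scala
object UserStateSketch {
  // Enum ordinals copied from the UserState comment in AuthorFeaturesAdapter.
  val stateToFeature: Map[Long, String] = Map(
    0L -> "timelines.author.user_state.is_user_new",
    1L -> "timelines.author.user_state.is_user_light", // NEAR_ZERO
    2L -> "timelines.author.user_state.is_user_light", // VERY_LIGHT
    3L -> "timelines.author.user_state.is_user_light", // LIGHT
    4L -> "timelines.author.user_state.is_user_medium_tweeter",
    5L -> "timelines.author.user_state.is_user_medium_non_tweeter",
    6L -> "timelines.author.user_state.is_user_heavy_non_tweeter",
    7L -> "timelines.author.user_state.is_user_heavy_tweeter"
  )

  // A plain Map stands in for the DataRecord: at most one feature is set to true,
  // and a missing or unknown state yields an empty record.
  def adapt(userState: Option[Long]): Map[String, Boolean] =
    userState.flatMap(stateToFeature.get).map(name => name -> true).toList.toMap
}
```

This collapsing is why `UserStateBooleanFeatures` contains six features for eight enum values.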
@@ -0,0 +1,199 @@
heron_binary(
    name = "heron-without-jass",
    main = "com.twitter.timelines.prediction.common.aggregates.real_time.TypeSafeRunner",
    oss = True,
    platform = "java8",
    runtime_platform = "java8",
    tags = ["bazel-compatible"],
    dependencies = [
        ":real_time",
        "3rdparty/jvm/org/slf4j:slf4j-jdk14",
    ],
)

jvm_app(
    name = "rta_heron",
    binary = ":heron-without-jass",
    bundles = [
        bundle(
            fileset = ["resources/jaas.conf"],
        ),
    ],
    tags = [
        "bazel-compatible",
        "bazel-only",
    ],
)

scala_library(
    sources = ["*.scala"],
    platform = "java8",
    strict_deps = False,
    tags = ["bazel-compatible"],
    dependencies = [
        ":online-configs",
        "3rdparty/src/jvm/com/twitter/summingbird:storm",
        "src/java/com/twitter/heron/util",
        "src/java/com/twitter/ml/api:api-base",
        "src/java/com/twitter/ml/api/constant",
        "src/scala/com/twitter/frigate/data_pipeline/features_aggregated/core:core-features",
        "src/scala/com/twitter/ml/api/util",
        "src/scala/com/twitter/storehaus_internal/memcache",
        "src/scala/com/twitter/storehaus_internal/util",
        "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits",
        "src/scala/com/twitter/summingbird_internal/runner/store_config",
        "src/scala/com/twitter/summingbird_internal/runner/storm",
        "src/scala/com/twitter/summingbird_internal/sources/storm/remote:ClientEventSourceScrooge2",
        "src/scala/com/twitter/timelines/prediction/adapters/client_log_event",
        "src/scala/com/twitter/timelines/prediction/adapters/client_log_event_mr",
        "src/scala/com/twitter/timelines/prediction/features/client_log_event",
        "src/scala/com/twitter/timelines/prediction/features/common",
        "src/scala/com/twitter/timelines/prediction/features/list_features",
        "src/scala/com/twitter/timelines/prediction/features/recap",
        "src/scala/com/twitter/timelines/prediction/features/user_health",
        "src/thrift/com/twitter/ml/api:data-java",
        "src/thrift/com/twitter/timelines/suggests/common:record-scala",
        "timelinemixer/common/src/main/scala/com/twitter/timelinemixer/clients/served_features_cache",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
        "timelines/data_processing/ml_util/aggregation_framework/heron",
        "timelines/data_processing/ml_util/aggregation_framework/job",
        "timelines/data_processing/ml_util/aggregation_framework/metrics",
        "timelines/data_processing/ml_util/transforms",
        "timelines/src/main/scala/com/twitter/timelines/clients/memcache_common",
        "util/util-core:scala",
    ],
)

scala_library(
    name = "online-configs",
    sources = [
        "AuthorFeaturesAdapter.scala",
        "Event.scala",
        "FeatureStoreUtils.scala",
        "StormAggregateSourceUtils.scala",
        "TimelinesOnlineAggregationConfig.scala",
        "TimelinesOnlineAggregationConfigBase.scala",
        "TimelinesOnlineAggregationSources.scala",
        "TimelinesStormAggregateSource.scala",
        "TweetFeaturesReadableStore.scala",
        "UserFeaturesAdapter.scala",
        "UserFeaturesReadableStore.scala",
    ],
    platform = "java8",
    strict_deps = True,
    tags = ["bazel-compatible"],
    dependencies = [
        ":base-config",
        "3rdparty/src/jvm/com/twitter/scalding:db",
        "3rdparty/src/jvm/com/twitter/storehaus:core",
        "3rdparty/src/jvm/com/twitter/summingbird:core",
        "3rdparty/src/jvm/com/twitter/summingbird:online",
        "3rdparty/src/jvm/com/twitter/summingbird:storm",
        "abuse/detection/src/main/thrift/com/twitter/abuse/detection/mention_interactions:thrift-scala",
        "snowflake/src/main/scala/com/twitter/snowflake/id",
        "snowflake/src/main/thrift:thrift-scala",
        "src/java/com/twitter/ml/api:api-base",
        "src/java/com/twitter/ml/api/constant",
        "src/scala/com/twitter/frigate/data_pipeline/features_aggregated/core:core-features",
        "src/scala/com/twitter/ml/api/util:datarecord",
        "src/scala/com/twitter/ml/featurestore/catalog/datasets/geo:geo-user-location",
        "src/scala/com/twitter/ml/featurestore/catalog/datasets/magicrecs:user-features",
        "src/scala/com/twitter/ml/featurestore/catalog/entities/core",
        "src/scala/com/twitter/ml/featurestore/catalog/features/core:user",
        "src/scala/com/twitter/ml/featurestore/catalog/features/geo",
        "src/scala/com/twitter/ml/featurestore/catalog/features/magicrecs:user-activity",
        "src/scala/com/twitter/ml/featurestore/catalog/features/magicrecs:user-info",
        "src/scala/com/twitter/ml/featurestore/catalog/features/trends:tweet_trends_scores",
        "src/scala/com/twitter/ml/featurestore/lib/data",
        "src/scala/com/twitter/ml/featurestore/lib/dataset/offline",
        "src/scala/com/twitter/ml/featurestore/lib/export/strato:app-names",
        "src/scala/com/twitter/ml/featurestore/lib/feature",
        "src/scala/com/twitter/ml/featurestore/lib/online",
        "src/scala/com/twitter/ml/featurestore/lib/params",
        "src/scala/com/twitter/storehaus_internal/util",
        "src/scala/com/twitter/summingbird_internal/bijection:bijection-implicits",
        "src/scala/com/twitter/summingbird_internal/runner/store_config",
        "src/scala/com/twitter/summingbird_internal/runner/storm",
        "src/scala/com/twitter/summingbird_internal/sources/common",
        "src/scala/com/twitter/summingbird_internal/sources/common/remote:ClientEventSourceScrooge",
        "src/scala/com/twitter/summingbird_internal/sources/storm/remote:ClientEventSourceScrooge2",
        "src/scala/com/twitter/timelines/prediction/adapters/client_log_event",
        "src/scala/com/twitter/timelines/prediction/adapters/client_log_event_mr",
        "src/scala/com/twitter/timelines/prediction/common/adapters:base",
        "src/scala/com/twitter/timelines/prediction/common/adapters:engagement-converter",
        "src/scala/com/twitter/timelines/prediction/common/aggregates",
        "src/scala/com/twitter/timelines/prediction/features/client_log_event",
        "src/scala/com/twitter/timelines/prediction/features/common",
        "src/scala/com/twitter/timelines/prediction/features/list_features",
        "src/scala/com/twitter/timelines/prediction/features/recap",
        "src/scala/com/twitter/timelines/prediction/features/user_health",
        "src/thrift/com/twitter/clientapp/gen:clientapp-scala",
        "src/thrift/com/twitter/dal/personal_data:personal_data-java",
        "src/thrift/com/twitter/ml/api:data-java",
        "src/thrift/com/twitter/timelines/suggests/common:engagement-java",
        "src/thrift/com/twitter/timelines/suggests/common:engagement-scala",
        "src/thrift/com/twitter/timelines/suggests/common:record-scala",
        "src/thrift/com/twitter/timelineservice/injection:thrift-scala",
        "src/thrift/com/twitter/timelineservice/server/suggests/logging:thrift-scala",
        "strato/src/main/scala/com/twitter/strato/client",
        "timelinemixer/common/src/main/scala/com/twitter/timelinemixer/clients/served_features_cache",
        "timelines/data_processing/ad_hoc/suggests/common:raw_training_data_creator",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
        "timelines/data_processing/ml_util/aggregation_framework/heron:configs",
        "timelines/data_processing/ml_util/aggregation_framework/metrics",
        "timelines/data_processing/ml_util/transforms",
        "timelines/data_processing/util:rich-request",
        "tweetsource/common/src/main/thrift:thrift-scala",
        "twitter-server-internal/src/main/scala",
        "unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/config",
        "unified_user_actions/client/src/main/scala/com/twitter/unified_user_actions/client/summingbird",
        "unified_user_actions/thrift/src/main/thrift/com/twitter/unified_user_actions:unified_user_actions-scala",
        "util/util-core:scala",
        "util/util-stats/src/main/scala/com/twitter/finagle/stats",
    ],
)

scala_library(
    name = "base-config",
    sources = [
        "AuthorFeaturesAdapter.scala",
        "TimelinesOnlineAggregationConfigBase.scala",
        "TweetFeaturesAdapter.scala",
        "UserFeaturesAdapter.scala",
    ],
    platform = "java8",
    strict_deps = True,
    tags = ["bazel-compatible"],
    dependencies = [
        "src/java/com/twitter/ml/api:api-base",
        "src/java/com/twitter/ml/api/constant",
        "src/resources/com/twitter/timelines/prediction/common/aggregates/real_time",
        "src/scala/com/twitter/ml/api/util:datarecord",
        "src/scala/com/twitter/ml/featurestore/catalog/datasets/magicrecs:user-features",
        "src/scala/com/twitter/ml/featurestore/catalog/entities/core",
        "src/scala/com/twitter/ml/featurestore/catalog/features/core:user",
        "src/scala/com/twitter/ml/featurestore/catalog/features/geo",
        "src/scala/com/twitter/ml/featurestore/catalog/features/magicrecs:user-activity",
        "src/scala/com/twitter/ml/featurestore/catalog/features/magicrecs:user-info",
        "src/scala/com/twitter/ml/featurestore/catalog/features/trends:tweet_trends_scores",
        "src/scala/com/twitter/ml/featurestore/lib/data",
        "src/scala/com/twitter/ml/featurestore/lib/feature",
        "src/scala/com/twitter/timelines/prediction/common/adapters:base",
        "src/scala/com/twitter/timelines/prediction/common/adapters:engagement-converter",
        "src/scala/com/twitter/timelines/prediction/common/aggregates",
        "src/scala/com/twitter/timelines/prediction/features/client_log_event",
        "src/scala/com/twitter/timelines/prediction/features/common",
        "src/scala/com/twitter/timelines/prediction/features/list_features",
        "src/scala/com/twitter/timelines/prediction/features/recap",
        "src/scala/com/twitter/timelines/prediction/features/user_health",
        "src/thrift/com/twitter/dal/personal_data:personal_data-java",
        "src/thrift/com/twitter/ml/api:feature_context-java",
        "src/thrift/com/twitter/timelines/suggests/common:engagement-scala",
        "timelines/data_processing/ml_util/aggregation_framework:common_types",
        "timelines/data_processing/ml_util/aggregation_framework/heron:base-config",
        "timelines/data_processing/ml_util/aggregation_framework/metrics",
        "timelines/data_processing/ml_util/transforms",
        "util/util-core:scala",
        "util/util-core:util-core-util",
    ],
)
@@ -0,0 +1,11 @@
package com.twitter.timelines.prediction.common.aggregates.real_time

private[real_time] sealed trait Event[T] { def event: T }

private[real_time] case class HomeEvent[T](override val event: T) extends Event[T]

private[real_time] case class ProfileEvent[T](override val event: T) extends Event[T]

private[real_time] case class SearchEvent[T](override val event: T) extends Event[T]

private[real_time] case class UuaEvent[T](override val event: T) extends Event[T]
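`Event` is a sealed trait. Code that consumes `HomeEvent`, `ProfileEvent`, `SearchEvent`, and `UuaEvent` can therefore pattern match over them, and the compiler can warn when a new wrapper is added but not handled.

The sketch below uses renamed hypothetical copies of the wrappers (`SketchEvent` and friends) and an illustrative `surface` tag. How the real job dispatches on these types is not shown in this diff.

```scala
// Renamed copies of the Event wrappers so the sketch stands alone.
sealed trait SketchEvent[T] { def event: T }
final case class SketchHomeEvent[T](override val event: T) extends SketchEvent[T]
final case class SketchProfileEvent[T](override val event: T) extends SketchEvent[T]
final case class SketchSearchEvent[T](override val event: T) extends SketchEvent[T]
final case class SketchUuaEvent[T](override val event: T) extends SketchEvent[T]

object EventSketch {
  // Because SketchEvent is sealed, scalac can warn if one of these cases is missing.
  def surface[T](e: SketchEvent[T]): String = e match {
    case SketchHomeEvent(_)    => "home"
    case SketchProfileEvent(_) => "profile"
    case SketchSearchEvent(_)  => "search"
    case SketchUuaEvent(_)     => "uua"
  }
}
```

Keeping the payload type `T` generic lets the same wrappers tag events from different sources without changing the payload type.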
@@ -0,0 +1,53 @@
package com.twitter.timelines.prediction.common.aggregates.real_time

import com.twitter.finagle.mtls.authentication.ServiceIdentifier
import com.twitter.finagle.stats.StatsReceiver
import com.twitter.ml.featurestore.catalog.datasets.magicrecs.UserFeaturesDataset
import com.twitter.ml.featurestore.catalog.datasets.geo.GeoUserLocationDataset
import com.twitter.ml.featurestore.lib.dataset.DatasetParams
import com.twitter.ml.featurestore.lib.export.strato.FeatureStoreAppNames
import com.twitter.ml.featurestore.lib.online.FeatureStoreClient
import com.twitter.ml.featurestore.lib.params.FeatureStoreParams
import com.twitter.strato.client.{Client, Strato}
import com.twitter.strato.opcontext.Attribution.ManhattanAppId
import com.twitter.util.Duration

private[real_time] object FeatureStoreUtils {
  private def mkStratoClient(serviceIdentifier: ServiceIdentifier): Client =
    Strato.client
      .withMutualTls(serviceIdentifier)
      .withRequestTimeout(Duration.fromMilliseconds(50))
      .build()

  private val featureStoreParams: FeatureStoreParams =
    FeatureStoreParams(
      perDataset = Map(
        UserFeaturesDataset.id ->
          DatasetParams(
            stratoSuffix = Some(FeatureStoreAppNames.Timelines),
            attributions = Seq(ManhattanAppId("athena", "timelines_aggregates_v2_features_by_user"))
          ),
        GeoUserLocationDataset.id ->
          DatasetParams(
            attributions = Seq(ManhattanAppId("starbuck", "timelines_geo_features_by_user"))
          )
      )
    )

  def mkFeatureStoreClient(
    serviceIdentifier: ServiceIdentifier,
    statsReceiver: StatsReceiver
  ): FeatureStoreClient = {
    com.twitter.server.Init() // necessary in order to use WilyNS path

    val stratoClient: Client = mkStratoClient(serviceIdentifier)
    val featureStoreClient: FeatureStoreClient = FeatureStoreClient(
      featureSet =
        UserFeaturesAdapter.UserFeaturesSet ++ AuthorFeaturesAdapter.UserFeaturesSet ++ TweetFeaturesAdapter.TweetFeaturesSet,
      client = stratoClient,
      statsReceiver = statsReceiver,
      featureStoreParams = featureStoreParams
    )
    featureStoreClient
  }
}
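`FeatureStoreUtils` configures the feature store per dataset. The user features dataset gets a Strato suffix and an `athena` Manhattan attribution, while the geo dataset sets only a `starbuck` attribution.

The sketch below shows a hypothetical per-dataset lookup of that kind. The following are all assumptions, not the behavior of `FeatureStoreParams`:

- the string dataset ids;
- the `"timelines"` suffix value;
- the fallback to default params for unlisted datasets.

```scala
// Hypothetical stand-in for DatasetParams: an optional Strato suffix plus
// (Manhattan app, dataset) attribution pairs.
final case class SketchDatasetParams(
    stratoSuffix: Option[String] = None,
    attributions: Seq[(String, String)] = Nil
)

object FeatureStoreParamsSketch {
  // Dataset ids are plain strings here; the real ids come from the catalog objects.
  val perDataset: Map[String, SketchDatasetParams] = Map(
    "magicrecs.user_features" -> SketchDatasetParams(
      stratoSuffix = Some("timelines"),
      attributions = Seq("athena" -> "timelines_aggregates_v2_features_by_user")
    ),
    "geo.user_location" -> SketchDatasetParams(
      attributions = Seq("starbuck" -> "timelines_geo_features_by_user")
    )
  )

  // Assumption: a dataset without an entry falls back to default (empty) params.
  def paramsFor(datasetId: String): SketchDatasetParams =
    perDataset.getOrElse(datasetId, SketchDatasetParams())
}
```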
Some files were not shown because too many files have changed in this diff.