From 6e5c875a69b5dc400302e42a3d0b2cfe509c71b6 Mon Sep 17 00:00:00 2001
From: twitter-team <>
Date: Fri, 14 Apr 2023 13:55:59 -0700
Subject: [PATCH] [opensource] Update README to include all new modules

Since the first batch of open sourcing, we have added the following components:
- User signal service
- Unified user actions
- Topic social proof service

Update the README to include these.
---
 README.md | 48 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index af87e0b51..79a7e6135 100644
--- a/README.md
+++ b/README.md
@@ -1,22 +1,39 @@
 # Twitter's Recommendation Algorithm
 
-Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the
-Home Timeline. For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). The
-diagram below illustrates how major services and jobs interconnect.
+Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for serving feeds of Tweets and other content across all Twitter product surfaces (e.g. For You Timeline, Search, Explore). For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm).
 
-![](docs/system-diagram.png)
+## Architecture
 
-These are the main components of the Recommendation Algorithm included in this repository:
+Product surfaces at Twitter are built on a shared set of data, models, and software frameworks. The shared components included in this repository are listed below:
+
+| Type | Component | Description |
+|------------|------------|------------|
+| Data | [unified-user-actions](unified_user_actions/README.md) | Real-time stream of user actions on Twitter. |
+| | [user-signal-service](user-signal-service/README.md) | Centralized platform to retrieve explicit (e.g. likes, replies) and implicit (e.g. profile visits, tweet clicks) user signals. |
+| Model | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
+| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
+| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
+| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict the likelihood of a Twitter User interacting with another User. |
+| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
+| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
+| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
+| | [topic-social-proof](topic-social-proof/README.md) | Identifies topics related to individual Tweets. |
+| Software framework | [navi](navi/README.md) | High performance, machine learning model serving written in Rust. |
+| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
+| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |
+
+The product surface currently included in this repository is the For You Timeline.
+
+### For You Timeline
+
+The diagram below illustrates how major services and jobs interconnect to construct a For You Timeline.
+
+![](docs/system-diagram.png)
+
+The core components of the For You Timeline included in this repository are listed below:
 
 | Type | Component | Description |
 |------------|------------|------------|
-| Feature | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
-| | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
-| | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
-| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict the likelihood of a Twitter User interacting with another User. |
-| | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
-| | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
-| | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
 | Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
 | | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. |
 | | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos). |
@@ -26,11 +43,10 @@ These are the main components of the Recommendation Algorithm included in this r
 | Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md). |
 | | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. |
 | | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. |
-| Software framework | [navi](navi/README.md) | High performance, machine learning model serving written in Rust. |
-| | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
-| | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |
 
-We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file.
+## Build and test code
+
+We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file. We plan to add a more complete build and test system in the future.
 
 ## Contributing
 