Source code for Twitter's Recommendation Algorithm

Twitter's Recommendation Algorithm

Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for serving feeds of Tweets and other content across all Twitter product surfaces (e.g. For You Timeline, Search, Explore, Notifications). For an introduction to how the algorithm works, please refer to our engineering blog.

Architecture

Product surfaces at Twitter are built on a shared set of data, models, and software frameworks. The shared components included in this repository are listed below:

| Type | Component | Description |
|---|---|---|
| Data | tweetypie | Core Tweet service that handles the reading and writing of Tweet data. |
| | unified-user-actions | Real-time stream of user actions on Twitter. |
| | user-signal-service | Centralized platform to retrieve explicit (e.g. likes, replies) and implicit (e.g. profile visits, tweet clicks) user signals. |
| Model | SimClusters | Community detection and sparse embeddings into those communities. |
| | TwHIN | Dense knowledge graph embeddings for Users and Tweets. |
| | trust-and-safety-models | Models for detecting NSFW or abusive content. |
| | real-graph | Model to predict the likelihood of a Twitter User interacting with another User. |
| | tweepcred | PageRank algorithm for calculating Twitter User reputation. |
| | recos-injector | Streaming event processor for building input streams for GraphJet-based services. |
| | graph-feature-service | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
| | topic-social-proof | Identifies topics related to individual Tweets. |
| | representation-scorer | Computes scores between pairs of entities (Users, Tweets, etc.) using embedding similarity. |
| Software framework | navi | High-performance machine learning model serving, written in Rust. |
| | product-mixer | Software framework for building feeds of content. |
| | timelines-aggregation-framework | Framework for generating aggregate features in batch or real time. |
| | representation-manager | Service to retrieve embeddings (i.e. SimClusters and TwHIN). |
| | twml | Legacy machine learning framework built on TensorFlow v1. |
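
To make the graph-feature-service example concrete, here is a minimal sketch of the feature "how many of User A's following liked Tweets from User B". It is an illustration only, not the service's implementation: the function name `count_following_who_liked` and the in-memory dictionaries `follows`, `likes`, and `authors` are assumptions made for this example.

```python
def count_following_who_liked(follows, likes, authors, user_a, user_b):
    """Count accounts followed by user_a that liked at least one Tweet by user_b.

    follows: user -> set of users they follow (hypothetical layout)
    likes:   user -> set of tweet ids they liked (hypothetical layout)
    authors: tweet id -> authoring user (hypothetical layout)
    """
    # Collect the Tweets authored by user_b.
    tweets_by_b = {tweet for tweet, author in authors.items() if author == user_b}
    # A followed account counts once if its likes overlap with user_b's Tweets.
    return sum(
        1
        for followed in follows.get(user_a, set())
        if likes.get(followed, set()) & tweets_by_b
    )


follows = {"A": {"C", "D", "E"}}
likes = {"C": {"t1"}, "D": {"t2", "t3"}, "E": {"t9"}}
authors = {"t1": "B", "t2": "B", "t3": "X", "t9": "X"}

a_to_b = count_following_who_liked(follows, likes, authors, "A", "B")
b_to_a = count_following_who_liked(follows, likes, authors, "B", "A")
print(a_to_b, b_to_a)
```

The feature is directed: the value for the pair (A, B) is computed from A's follow list and B's Tweets, so swapping the two users generally gives a different number.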

The product surfaces currently included in this repository are the For You Timeline and Recommended Notifications.

For You Timeline

The diagram below illustrates how major services and jobs interconnect to construct a For You Timeline.

The core components of the For You Timeline included in this repository are listed below:

| Type | Component | Description |
|---|---|---|
| Candidate Source | search-index | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
| | cr-mixer | Coordination layer for fetching Out-of-Network Tweet candidates from underlying compute services. |
| | user-tweet-entity-graph (UTEG) | Maintains an in-memory User-to-Tweet interaction graph and finds candidates based on traversals of this graph. Built on the GraphJet framework. Several other GraphJet-based features and candidate sources are located here. |
| | follow-recommendation-service (FRS) | Provides Users with recommendations for accounts to follow, and Tweets from those accounts. |
| Ranking | light-ranker | Light Ranker model used by the search index (Earlybird) to rank Tweets. |
| | heavy-ranker | Neural network for ranking candidate Tweets. One of the main signals used to select timeline Tweets after candidate sourcing. |
| Tweet mixing & filtering | home-mixer | Main service used to construct and serve the Home Timeline. Built on product-mixer. |
| | visibility-filters | Filters Twitter content to support legal compliance, improve product quality, increase user trust, and protect revenue, using hard filtering, visible product treatments, and coarse-grained downranking. |
| | timelineranker | Legacy service which provides relevance-scored Tweets from the Earlybird Search Index and UTEG service. |
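
The stages listed above (candidate sourcing, ranking, then mixing and filtering) can be pictured with a small toy pipeline. This is a simplified sketch under stated assumptions, not how home-mixer is implemented: the `Candidate` fields, the `build_timeline` function, and the precomputed scores are all hypothetical stand-ins for the real services and models.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tweet_id: str
    in_network: bool
    heavy_score: float        # stand-in for a heavy-ranker output
    light_score: float = 0.0  # stand-in for a light-ranker output (in-network only here)
    visible: bool = True      # stand-in for a visibility-filter decision


def build_timeline(candidates, in_network_keep, size):
    # In-network candidates are trimmed by the light score (assumed top-k cut).
    in_network = sorted(
        (c for c in candidates if c.in_network),
        key=lambda c: c.light_score,
        reverse=True,
    )[:in_network_keep]
    # Out-of-network candidates are passed through unchanged in this sketch.
    out_of_network = [c for c in candidates if not c.in_network]
    # All surviving candidates are ordered by the heavy score.
    ranked = sorted(in_network + out_of_network, key=lambda c: c.heavy_score, reverse=True)
    # Filtered candidates are dropped before the timeline is cut to size.
    return [c.tweet_id for c in ranked if c.visible][:size]


candidates = [
    Candidate("t1", in_network=True, heavy_score=0.2, light_score=0.9),
    Candidate("t2", in_network=False, heavy_score=0.7),
    Candidate("t3", in_network=True, heavy_score=0.99, light_score=0.1),
    Candidate("t4", in_network=False, heavy_score=0.95, visible=False),
    Candidate("t5", in_network=True, heavy_score=0.5, light_score=0.6),
]

timeline = build_timeline(candidates, in_network_keep=2, size=3)
print(timeline)  # ['t2', 't5', 't1']
```

Note that `t3` has the highest heavy score but never reaches the heavy-ranking step because the cheaper light score cuts it first. That trade-off between cost and recall is the usual reason for a two-stage ranking design.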

Recommended Notifications

The core components of Recommended Notifications included in this repository are listed below:

| Type | Component | Description |
|---|---|---|
| Service | pushservice | Main recommendation service at Twitter used to surface recommendations to our users via notifications. |
| Ranking | pushservice-light-ranker | Light Ranker model used by pushservice to rank Tweets. Bridges candidate generation and heavy ranking by pre-selecting highly-relevant candidates from the initial huge candidate pool. |
| | pushservice-heavy-ranker | Multi-task learning model to predict the probabilities that the target users will open and engage with the sent notifications. |
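
The two ranking stages described above can be sketched as a pre-selection step followed by a multi-output scoring step. Everything in this example is an assumption made for illustration: the function `rank_notifications`, the callables `light_score` and `heavy_predict`, and in particular the choice to combine the open and engage probabilities by multiplying them, which the source does not specify.

```python
import heapq


def rank_notifications(pool, light_score, heavy_predict, preselect, send):
    # Light stage: keep only the `preselect` best candidates from the full pool.
    shortlist = heapq.nlargest(preselect, pool, key=light_score)
    scored = []
    for candidate in shortlist:
        # Heavy stage: one model call yields two task outputs per candidate.
        p_open, p_engage = heavy_predict(candidate)
        # Assumed combination rule for this sketch: product of the two probabilities.
        scored.append((p_open * p_engage, candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:send]]


# Hypothetical scores standing in for model outputs.
light = {"a": 0.9, "b": 0.1, "c": 0.8, "d": 0.5}
heavy = {"a": (0.5, 0.2), "b": (1.0, 1.0), "c": (0.4, 0.5), "d": (0.9, 0.1)}

chosen = rank_notifications(
    pool=["a", "b", "c", "d"],
    light_score=light.get,
    heavy_predict=heavy.get,
    preselect=3,
    send=2,
)
print(chosen)  # ['c', 'a']
```

Using `heapq.nlargest` for the light stage avoids sorting the whole pool when only a small shortlist is needed, which matters when the initial pool is very large.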

Build and test code

We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file. We plan to add a more complete build and test system in the future.

Contributing

We invite the community to submit GitHub issues and pull requests for suggestions on improving the recommendation algorithm. We are working on tools to manage these suggestions and sync changes to our internal repository. Any security concerns or issues should be routed to our official bug bounty program through HackerOne. We hope to benefit from the collective intelligence and expertise of the global community in helping us identify issues and suggest improvements, ultimately leading to a better Twitter.

Read our blog on the open source initiative here.