the-algorithm-ml/projects/twhin

TwHIN in TorchRec

This project contains code for pretraining dense vector embedding features for Twitter entities. Within Twitter, these embeddings are used for candidate retrieval and as model features in a variety of recommender system models.

We obtain entity embeddings based on a variety of graph data within Twitter such as:

  • "User follows User"
  • "User favorites Tweet"
  • "User clicks Advertisement"

We cannot release the graph data used to train TwHIN embeddings due to privacy restrictions, but heavily subsampled, anonymized, open-sourced graph data can be used instead.

The code expects parquet files with three columns:

  • lhs
  • rel
  • rhs

These refer to the vocab index of the left-hand-side node, the relation type, and the right-hand-side node of each edge in the graph, respectively.
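
To make the expected layout concrete, here is a minimal sketch of turning a few named edges into the three integer columns. The sample edges, the helper names (`build_index`, `encode_edges`, `write_parquet`), and the use of a single node vocabulary shared across entity types are all assumptions made for illustration; the actual pipeline may index each entity type separately.

```python
"""Toy example: encode a few named edges as (lhs, rel, rhs) integer triples."""

# Hypothetical sample edges; the node names and relation labels are made up.
RAW_EDGES = [
    ("user:alice", "follows", "user:bob"),
    ("user:alice", "favorites", "tweet:101"),
    ("user:bob", "follows", "user:alice"),
    ("user:bob", "clicks", "ad:7"),
]


def build_index(items):
    """Assign each distinct item a dense integer id in first-seen order."""
    index = {}
    for item in items:
        index.setdefault(item, len(index))
    return index


def encode_edges(raw_edges):
    """Return (columns, node_vocab, rel_vocab) for the given named edges."""
    # Assumption: one vocabulary shared by all node types, for simplicity.
    node_vocab = build_index(n for lhs, _, rhs in raw_edges for n in (lhs, rhs))
    rel_vocab = build_index(rel for _, rel, _ in raw_edges)
    columns = {
        "lhs": [node_vocab[lhs] for lhs, _, _ in raw_edges],
        "rel": [rel_vocab[rel] for _, rel, _ in raw_edges],
        "rhs": [node_vocab[rhs] for _, _, rhs in raw_edges],
    }
    return columns, node_vocab, rel_vocab


def write_parquet(columns, path):
    """Write the three columns to a parquet file (requires pyarrow)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), path)


columns, node_vocab, rel_vocab = encode_edges(RAW_EDGES)
print(columns)
```

With pyarrow installed, calling `write_parquet(columns, "edges.parquet")` would produce a file with the `lhs`, `rel`, and `rhs` columns described above.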

The location of the data must be specified in the configuration YAML files in projects/twhin/config.

Workflow

  • Build local development images with ./scripts/build_images.sh
  • Run with ./scripts/docker_run.sh
  • Iterate inside the image with ./scripts/idocker.sh
  • Run tests with ./scripts/docker_test.sh