mirror of https://github.com/twitter/the-algorithm-ml.git synced 2024-11-05 08:15:08 +01:00

TwHIN in torchrec

This project contains code for pretraining dense vector embedding features for Twitter entities. Within Twitter, these embeddings are used for candidate retrieval and as model features in a variety of recommender system models.

We obtain entity embeddings based on a variety of graph data within Twitter, such as "User follows User", "User favorites Tweet", and "User clicks Advertisement".

While we cannot release the graph data used to train TwHIN embeddings due to privacy restrictions, heavily subsampled, anonymized, open-sourced graph data can be used instead: https://huggingface.co/datasets/Twitter/TwitterFollowGraph and https://huggingface.co/datasets/Twitter/TwitterFaveGraph

The code expects parquet files with three columns, lhs, rel, and rhs, which hold the vocab index of the left-hand-side node, the relation type, and the vocab index of the right-hand-side node of each edge in the graph, respectively.
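
As an illustration of that three-column layout, here is a minimal Python sketch that turns a few named edges into integer lhs, rel, and rhs columns. The toy edges, the helper names (build_vocab, encode_edges, write_edges_parquet), and the choice of one shared node vocabulary plus a separate relation vocabulary are all assumptions made for this example; the project's own preprocessing may assign indices differently.

```python
# Hypothetical toy edges: (left-hand-side entity, relation name, right-hand-side entity).
RAW_EDGES = [
    ("user_a", "follows", "user_b"),
    ("user_b", "follows", "user_c"),
    ("user_a", "favorites", "tweet_1"),
]


def build_vocab(tokens):
    """Assign each distinct token an integer index in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode_edges(raw_edges):
    """Return three equal-length integer columns named lhs, rel, and rhs.

    Assumption: all nodes share one vocabulary and relations have their own.
    """
    node_vocab = build_vocab(
        node for lhs, _, rhs in raw_edges for node in (lhs, rhs)
    )
    rel_vocab = build_vocab(rel for _, rel, _ in raw_edges)
    return {
        "lhs": [node_vocab[lhs] for lhs, _, _ in raw_edges],
        "rel": [rel_vocab[rel] for _, rel, _ in raw_edges],
        "rhs": [node_vocab[rhs] for _, _, rhs in raw_edges],
    }


def write_edges_parquet(columns, path):
    """Write the encoded columns to a parquet file.

    Needs pandas and a parquet engine such as pyarrow to be installed.
    """
    import pandas as pd

    pd.DataFrame(columns).to_parquet(path, index=False)
```

Calling encode_edges(RAW_EDGES) gives the in-memory columns, and write_edges_parquet(columns, "edges.parquet") would save them to a file (the file name is only a placeholder). The location of that file would then go in the configuration yaml.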

The location of the data must be specified in the configuration yaml files in projects/twhin/configs.

Workflow

Build local development images with ./scripts/build_images.sh
Run with ./scripts/docker_run.sh
Iterate in image with ./scripts/idocker.sh
Run tests with ./scripts/docker_test.sh