Mirrors/the-algorithm-ml

mirror of https://github.com/twitter/the-algorithm-ml.git synced 2025-03-21 05:34:56 +01:00

TwHIN in TorchRec

This project contains code for pretraining dense vector embedding features for Twitter entities. Within Twitter, these embeddings are used for candidate retrieval and as model features in a variety of recommender system models.

We obtain entity embeddings based on a variety of graph data within Twitter, such as:
"User follows User"
"User favorites Tweet"
"User clicks Advertisement"

While we cannot release the graph data used to train TwHIN embeddings due to privacy restrictions, heavily subsampled, anonymized, open-sourced graph data can be used instead:
https://huggingface.co/datasets/Twitter/TwitterFollowGraph
https://huggingface.co/datasets/Twitter/TwitterFaveGraph

The code expects parquet files with three columns (lhs, rel, rhs) that refer, respectively, to the vocab index of the left-hand-side node, the relation type, and the vocab index of the right-hand-side node of each edge in a graph.
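As an illustration of the three-column layout, here is a minimal sketch that turns raw edges into lhs, rel, rhs index columns and writes them with pyarrow. The helper names (build_edge_columns, write_edge_parquet), the example edges, and the choice of a single node vocabulary shared by all entity types are assumptions made for this example; the project's own data pipeline may assign indices differently (for instance, per entity type) and may expect specific integer dtypes.

```python
from typing import Dict, Iterable, List, Tuple

# A raw edge as (left node id, relation name, right node id).
Edge = Tuple[str, str, str]


def build_edge_columns(
    edges: Iterable[Edge],
) -> Tuple[Dict[str, List[int]], Dict[str, int], Dict[str, int]]:
    """Map raw edges to integer lhs/rel/rhs columns.

    Assumption: one vocabulary for all nodes and one for relations,
    with indices assigned in order of first appearance.
    """
    node_vocab: Dict[str, int] = {}
    rel_vocab: Dict[str, int] = {}
    columns: Dict[str, List[int]] = {"lhs": [], "rel": [], "rhs": []}
    for left, relation, right in edges:
        columns["lhs"].append(node_vocab.setdefault(left, len(node_vocab)))
        columns["rel"].append(rel_vocab.setdefault(relation, len(rel_vocab)))
        columns["rhs"].append(node_vocab.setdefault(right, len(node_vocab)))
    return columns, node_vocab, rel_vocab


def write_edge_parquet(columns: Dict[str, List[int]], path: str) -> None:
    """Write the lhs/rel/rhs columns to a parquet file (requires pyarrow)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), path)


# Hypothetical example edges, mirroring the relation types listed above.
EXAMPLE_EDGES: List[Edge] = [
    ("user:1", "follows", "user:2"),
    ("user:1", "favorites", "tweet:9"),
    ("user:2", "follows", "user:1"),
]

columns, node_vocab, rel_vocab = build_edge_columns(EXAMPLE_EDGES)

if __name__ == "__main__":
    write_edge_parquet(columns, "edges.parquet")
```

Each row of the resulting file describes one edge, so the three columns always have the same length.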

The location of the data must be specified in the configuration yaml files in projects/twhin/configs.

Workflow

Build local development images with ./scripts/build_images.sh
Run with ./scripts/docker_run.sh
Iterate in image with ./scripts/idocker.sh
Run tests with ./scripts/docker_test.sh