the-algorithm-ml/projects/home/recap/FEATURES.md

Overview

Below is a description of the major feature groups which are input to the Twitter Heavy Ranking model.

Note that not every request will have every feature available, due to user settings or other constraints, and "For You" ranking may differ depending on these and other variables.

Aggregate Features

Aggregate features make up the bulk of Twitter's feature count. They are generated by maintaining rolling aggregations of feature values within a specific scope and a specific time window. We compute aggregates over the long term (50-day counts) and the short term ("real-time": windows under 3 days, typically 30-minute counts).
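
As a rough illustration of what a rolling aggregation over a time window looks like, the sketch below keeps per-key event timestamps and counts those that fall inside a trailing window. The `RollingCount` class, its method names, and the in-memory storage are assumptions made for illustration; this document does not describe how the production aggregation is implemented.

```python
from collections import defaultdict, deque


class RollingCount:
    """Hypothetical helper: count events per key inside a trailing window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # key -> event timestamps, oldest first (assumes in-order arrival).
        self._events = defaultdict(deque)

    def add(self, key: str, timestamp: float) -> None:
        self._events[key].append(timestamp)

    def count(self, key: str, now: float) -> int:
        events = self._events[key]
        # Evict events that have fallen out of the window.
        while events and events[0] <= now - self.window_seconds:
            events.popleft()
        return len(events)


THIRTY_MINUTES = 30 * 60
FIFTY_DAYS = 50 * 24 * 60 * 60

short_term = RollingCount(THIRTY_MINUTES)
long_term = RollingCount(FIFTY_DAYS)
for ts in (0, 600, 1500, 3000):  # seconds since an arbitrary start
    short_term.add("user_1", ts)
    long_term.add("user_1", ts)

short = short_term.count("user_1", now=3600)  # only the event at t=3000 remains
long = long_term.count("user_1", now=3600)    # all four events are in the window
```

The same events produce different values for the two windows, which is why a short-term and a long-term count can both be useful as separate features.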

Aggregate features are groups of multiple features generated as Cartesian crosses from a template, and have the format

| Feature Group Name | Engagement Scope | Feature To Aggregate | Aggregation Spec |
| --- | --- | --- | --- |

  • The Feature Group Name both names the aggregate feature and encodes the aggregation scope, that is, which entities are aggregated over.
    • For example, "user_aggregate" aggregates over unique user_ids, and "user_author_aggregate" aggregates over all user-author pairs. The group name also determines which fields the feature is joined to when it is used. In the case of "user_author_aggregate", the feature is joined to data corresponding to the specific user and the specific author.
    • The raw feature group names are often verbose and are simplified in the presentation below.
  • Engagement Scope is the subset of tweets within the aggregation scope that will be aggregated over. Typically this is the name of an output engagement, like recap.engagement.is_favorited. In that case, we only aggregate over Tweets that were also Liked.
  • The Feature To Aggregate is the feature we are accumulating over. If this value is any_feature, we aggregate the Tweet count. For example, user_aggregate_v2.pair.recap.engagement.is_favorited.any_feature.50.days.count is the number of Liked records for each user over the last 50 days.
  • The Aggregation Spec is what aggregate to compute - what function and over what time window.

For every Feature Group, we generate one feature for every possible combination of Engagement Scope, Feature To Aggregate, and Aggregation Spec. In particular, every row in the tables below generates one feature for every possible cross between columns.
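
The cross described above can be sketched with a Cartesian product. The function name `expand_feature_group` is hypothetical, and joining the parts with `.` is an assumption based on the dotted feature names that appear in this document.

```python
from itertools import product


def expand_feature_group(group, scopes, features, specs):
    """Return one dotted feature name per (scope, feature, spec) combination."""
    return [
        ".".join(parts)
        for parts in product([group], scopes, features, specs)
    ]


names = expand_feature_group(
    "user_aggregate_v2.pair",
    scopes=["any_label", "recap.engagement.is_favorited"],
    features=["any_feature", "engagement_features.in_network.replies.count"],
    specs=["50.days.count", "50.days.sum"],
)

for name in names:
    print(name)
```

With two values in each column, the row expands to 2 x 2 x 2 = 8 features, which shows how a handful of table rows can account for a large share of the total feature count.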

Example: one such feature is user_aggregate_v2.pair.recap.engagement.is_favorited.engagement_features.in_network.replies.count.50.days.count, which can be parsed into

| Feature Group Name | Engagement Scope | Feature To Aggregate | Aggregation Spec |
| --- | --- | --- | --- |
| user_aggregate_v2.pair | recap.engagement.is_favorited | engagement_features.in_network.replies.count | 50.days.count |

This means that this feature aggregates

  1. (Over every user),
  2. (Over only tweets favorited by the user),
  3. In network replies sent out by this user,
  4. (Counted over the last 50 days)
This feature is then made available as a feature for the particular user.
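
The parse above can be sketched in code. This is a minimal illustration, not the production parser: `parse_aggregate_feature`, `AggregateFeatureName`, and both regular expressions are assumptions inferred only from the names listed in this document, and the caller must supply the feature group name because it cannot be told apart from the rest of the string by pattern alone.

```python
import re
from typing import NamedTuple


class AggregateFeatureName(NamedTuple):
    group: str
    engagement_scope: str
    feature_to_aggregate: str
    aggregation_spec: str


# Assumed patterns, inferred only from names that appear in this document.
# Trailing spec such as "50.days.count" or "50.days.count.sparse_top1".
_SPEC = re.compile(r"\.(\d+\.(?:minutes|days)\.[a-z]+(?:\.sparse_[a-z0-9]+)?)$")
# Leading scope such as "any_label" or "recap.engagement.is_favorited".
_SCOPE = re.compile(r"^(any_label|[a-z_]+\.[a-z_]+\.is_[a-z0-9_]+)\.(.+)$")


def parse_aggregate_feature(name: str, group: str) -> AggregateFeatureName:
    """Split a full feature name into its four template parts."""
    prefix = group + "."
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} is not in group {group!r}")
    rest = name[len(prefix):]
    spec = _SPEC.search(rest)
    if spec is None:
        raise ValueError(f"no aggregation spec found in {name!r}")
    scope = _SCOPE.match(rest[: spec.start()])
    if scope is None:
        raise ValueError(f"no engagement scope found in {name!r}")
    return AggregateFeatureName(group, scope.group(1), scope.group(2), spec.group(1))


parsed = parse_aggregate_feature(
    "user_aggregate_v2.pair.recap.engagement.is_favorited."
    "engagement_features.in_network.replies.count.50.days.count",
    group="user_aggregate_v2.pair",
)
print(parsed)
```

Names that do not fit these assumed patterns (for example a spec that does not start with a number) raise `ValueError` rather than being guessed at.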

The list of our aggregate features is below:

author_aggregate These features aggregate over the author (or original author) of a tweet. Some of the features are short-duration (30 minutes) and some longer (50 days). The features track how many of an author's tweets were engaged with.
author (real_time) timelines.enagagement.is_retweeted_without_quote
timelines.engagement.is_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_followed
timelines.engagement.is_open_linked
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_quoted
timelines.engagement.is_replied
timelines.engagement.is_retweeted
timelines.engagement.is_tweet_share_dm_clicked
timelines.engagement.is_tweet_share_dm_sent
timelines.engagement.is_video_playback_50
timelines.engagement.is_video_quality_viewed
timelines.engagement.is_video_viewed
any_feature
30.minutes.count
original_author (real_time) timelines.enagagement.is_retweeted_without_quote
timelines.engagement.is_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_followed
timelines.engagement.is_open_linked
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_quoted
timelines.engagement.is_replied
timelines.engagement.is_retweeted
timelines.engagement.is_tweet_share_dm_clicked
timelines.engagement.is_tweet_share_dm_sent
timelines.engagement.is_video_playback_50
timelines.engagement.is_video_quality_viewed
timelines.engagement.is_video_viewed
any_feature
30.minutes.count
original_author (real_time) timelines.engagement.is_share_menu_clicked
timelines.engagement.is_shared
any_feature
30.minutes.count
1.days.count
original_author recap.engagement.is_replied_reply_favorited_by_author
recap.engagement.is_replied_reply_impressed_by_author
recap.engagement.is_replied_reply_replied_by_author
any_feature
50.days.count
author-topic_aggregate These features aggregate over a specific tweet author and a specific topic. We only accumulate long (50 day) counts.
author-topic any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
50.days.count
list_aggregate These features aggregate short term and long term engagement between a user and a list.
user_list any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
50.days.count
list (real_time) timelines.engagement.is_block_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_mute_clicked
timelines.engagement.is_replied
timelines.engagement.is_report_tweet_clicked
timelines.engagement.is_retweeted
any_feature
30.minutes.count
user_aggregate These features aggregate short term and long term engagement from a specific user.
user_v2 any_label
recap.engagement.is_favorited
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
any_feature
engagement_features.in_network.favorites.count
engagement_features.in_network.replies.count
engagement_features.in_network.retweets.count
realgraph.num_favorites.days_since_last
realgraph.num_favorites.elapsed_days
realgraph.num_favorites.ewma
realgraph.num_favorites.non_zero_days
realgraph.num_inspected_tweets.days_since_last
realgraph.num_inspected_tweets.elapsed_days
realgraph.num_inspected_tweets.ewma
realgraph.num_inspected_tweets.non_zero_days
realgraph.num_mentions.days_since_last
realgraph.num_mentions.elapsed_days
realgraph.num_mentions.ewma
realgraph.num_mentions.non_zero_days
realgraph.num_profile_views.days_since_last
realgraph.num_profile_views.elapsed_days
realgraph.num_profile_views.ewma
realgraph.num_profile_views.non_zero_days
realgraph.num_retweets.days_since_last
realgraph.num_retweets.elapsed_days
realgraph.num_retweets.ewma
realgraph.num_retweets.non_zero_days
realgraph.num_tweet_clicks.days_since_last
realgraph.num_tweet_clicks.elapsed_days
realgraph.num_tweet_clicks.ewma
realgraph.num_tweet_clicks.non_zero_days
realgraph.total_dwell_time.days_since_last
realgraph.total_dwell_time.elapsed_days
realgraph.total_dwell_time.ewma
realgraph.total_dwell_time.non_zero_days
recap.earlybird.fav_count_v2
recap.earlybird.reply_count_v2
recap.earlybird.retweet_count_v2
recap.searchfeature.blender_score
recap.searchfeature.fav_count
recap.searchfeature.reply_count
recap.searchfeature.retweet_count
recap.searchfeature.text_score
recap.tweetfeature.bidirectional_fav_count
recap.tweetfeature.bidirectional_reply_count
recap.tweetfeature.bidirectional_retweet_count
recap.tweetfeature.contains_media
recap.tweetfeature.conversational_count
recap.tweetfeature.embeds_impression_count
recap.tweetfeature.embeds_url_count
recap.tweetfeature.from_mutual_follow
recap.tweetfeature.has_card
recap.tweetfeature.has_image
recap.tweetfeature.has_link
recap.tweetfeature.has_multiple_media
recap.tweetfeature.has_news
recap.tweetfeature.has_periscope
recap.tweetfeature.has_pro_video
recap.tweetfeature.has_trend
recap.tweetfeature.has_video
recap.tweetfeature.has_vine
recap.tweetfeature.has_visible_link
recap.tweetfeature.is_business_score
recap.tweetfeature.is_extended_reply
recap.tweetfeature.is_reply
recap.tweetfeature.is_retweet
recap.tweetfeature.is_sensitive
recap.tweetfeature.link_count
recap.tweetfeature.link_language
recap.tweetfeature.match_searcher_langs
recap.tweetfeature.match_searcher_main_lang
recap.tweetfeature.match_ui_lang
recap.tweetfeature.mention_searcher
recap.tweetfeature.num_hashtags
recap.tweetfeature.num_mentions
recap.tweetfeature.reply_other
recap.tweetfeature.reply_searcher
recap.tweetfeature.retweet_other
recap.tweetfeature.retweet_searcher
recap.tweetfeature.tweet_count_from_user_in_snapshot
recap.tweetfeature.unidirectiona_fav_count
recap.tweetfeature.unidirectional_reply_count
recap.tweetfeature.unidirectional_retweet_count
recap.tweetfeature.user_rep
recap.tweetfeature.video_view_count
50.days.count
50.days.sum
user_v5 any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
time_features.earlybird.last_favorite_since_creation_hrs
time_features.earlybird.last_quote_since_creation_hrs
time_features.earlybird.last_reply_since_creation_hrs
time_features.earlybird.last_retweet_since_creation_hrs
time_features.earlybird.time_since_last_favorite
time_features.earlybird.time_since_last_quote
time_features.earlybird.time_since_last_reply
time_features.earlybird.time_since_last_retweet
timelines.earlybird.decayed_favorite_count
timelines.earlybird.decayed_quote_count
timelines.earlybird.decayed_reply_count
timelines.earlybird.decayed_retweet_count
timelines.earlybird.embeds_impression_count_v2
timelines.earlybird.embeds_url_count_v2
timelines.earlybird.fake_favorite_count
timelines.earlybird.fake_quote_count
timelines.earlybird.fake_reply_count
timelines.earlybird.fake_retweet_count
timelines.earlybird.quote_count
timelines.earlybird.visible_token_ratio
timelines.earlybird.weighted_fav_count
timelines.earlybird.weighted_quote_count
timelines.earlybird.weighted_reply_count
timelines.earlybird.weighted_retweet_count
50.days.count
50.days.sum
50.days.sumsq
user_v6 recap.engagement.is_replied_reply_favorited_by_author
recap.engagement.is_replied_reply_impressed_by_author
recap.engagement.is_replied_reply_replied_by_author
any_feature
50.days.count
user (twitter_wide) any_label
recap.engagement.is_favorited
recap.engagement.is_replied
recap.engagement.is_retweeted
any_feature
recap.tweetfeature.contains_media
recap.tweetfeature.has_card
recap.tweetfeature.has_hashtag
recap.tweetfeature.has_link
recap.tweetfeature.has_mention
recap.tweetfeature.is_reply
timelines.earlybird.has_quote
50.days.count
user (real_time) timelines.enagagement.is_retweeted_without_quote
timelines.engagement.is_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_followed
timelines.engagement.is_open_linked
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_quoted
timelines.engagement.is_replied
timelines.engagement.is_retweeted
timelines.engagement.is_tweet_share_dm_clicked
timelines.engagement.is_tweet_share_dm_sent
timelines.engagement.is_video_playback_50
timelines.engagement.is_video_quality_viewed
timelines.engagement.is_video_viewed
any_feature
client_log_event.tweet.has_consumer_video
client_log_event.tweet.photo_count
30.minutes.count
user (48h_real_time_v5) timelines.enagagement.is_retweeted_without_quote
timelines.engagement.is_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_followed
timelines.engagement.is_open_linked
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_quoted
timelines.engagement.is_replied
timelines.engagement.is_retweeted
timelines.engagement.is_tweet_share_dm_clicked
timelines.engagement.is_tweet_share_dm_sent
timelines.engagement.is_video_playback_50
timelines.engagement.is_video_quality_viewed
timelines.engagement.is_video_viewed
any_feature
client_log_event.tweet.has_consumer_video
client_log_event.tweet.photo_count
2.days.count
user (72h_real_time_v6) timelines.engagement.is_block_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_mute_clicked
timelines.engagement.is_report_tweet_clicked
timelines.author.user_state.is_user_heavy_non_tweeter
timelines.author.user_state.is_user_heavy_tweeter
timelines.author.user_state.is_user_light
timelines.author.user_state.is_user_medium_non_tweeter
timelines.author.user_state.is_user_medium_tweeter
timelines.author.user_state.is_user_new
3.days.count
user (profile_real_time_v6) profile.engagement.is_clicked
profile.engagement.is_dwelled
profile.engagement.is_favorited
profile.engagement.is_replied
profile.engagement.is_retweeted
any_feature
client_log_event.tweet.has_consumer_video
client_log_event.tweet.photo_count
30.minutes.count
user (real_time) timelines.engagement.is_share_menu_clicked
timelines.engagement.is_shared
any_feature
client_log_event.tweet.has_consumer_video
client_log_event.tweet.photo_count
1.days.count
30.minutes.count
user (real_time) timelines.engagement.is_fullscreen_video_dwelled
timelines.engagement.is_fullscreen_video_dwelled_10_sec
timelines.engagement.is_fullscreen_video_dwelled_20_sec
timelines.engagement.is_fullscreen_video_dwelled_30_sec
timelines.engagement.is_fullscreen_video_dwelled_5_sec
timelines.engagement.is_profile_dwelled
timelines.engagement.is_profile_dwelled_10_sec
timelines.engagement.is_profile_dwelled_20_sec
timelines.engagement.is_profile_dwelled_30_sec
timelines.engagement.is_tweet_detail_dwelled
timelines.engagement.is_tweet_detail_dwelled_15_sec
timelines.engagement.is_tweet_detail_dwelled_25_sec
timelines.engagement.is_tweet_detail_dwelled_30_sec
timelines.engagement.is_tweet_detail_dwelled_8_sec
any_feature
1.days.count
30.minutes.count
user_author_aggregate These features aggregate over user-author pairs.
user_author_v2 any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
engagement_features.in_network.favorites.count
engagement_features.in_network.replies.count
engagement_features.in_network.retweets.count
recap.earlybird.fav_count_v2
recap.earlybird.reply_count_v2
recap.earlybird.retweet_count_v2
recap.searchfeature.blender_score
recap.searchfeature.fav_count
recap.searchfeature.reply_count
recap.searchfeature.retweet_count
recap.searchfeature.text_score
recap.tweetfeature.embeds_impression_count
recap.tweetfeature.embeds_url_count
recap.tweetfeature.has_card
recap.tweetfeature.has_image
recap.tweetfeature.has_link
recap.tweetfeature.has_multiple_media
recap.tweetfeature.has_news
recap.tweetfeature.has_periscope
recap.tweetfeature.has_pro_video
recap.tweetfeature.has_trend
recap.tweetfeature.has_video
recap.tweetfeature.has_vine
recap.tweetfeature.has_visible_link
recap.tweetfeature.is_reply
recap.tweetfeature.is_retweet
recap.tweetfeature.num_mentions
50.days.count
50.days.sum
user_author_v5 any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
timelines.earlybird.has_quote
timelines.earlybird.label_abusive_flag
timelines.earlybird.label_abusive_hi_rcl_flag
timelines.earlybird.label_dup_content_flag
timelines.earlybird.label_nsfw_hi_prc_flag
timelines.earlybird.label_nsfw_hi_rcl_flag
timelines.earlybird.label_spam_flag
timelines.earlybird.label_spam_hi_rcl_flag
50.days.count
user_author (tweetsource_v1 - These features are sourced from a different underlying dataset) any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
tweetsource.tweet.media.num_tags
tweetsource.tweet.media.video_duration
tweetsource.tweet.text.has_question
tweetsource.tweet.text.length
50.days.count
50.days.sum
user_author (twitter_wide - These features are sourced from a different underlying dataset) recap.engagement.is_favorited
recap.engagement.is_replied
recap.engagement.is_retweeted
any_feature
recap.tweetfeature.contains_media
recap.tweetfeature.has_card
recap.tweetfeature.has_hashtag
recap.tweetfeature.has_link
recap.tweetfeature.has_mention
recap.tweetfeature.is_reply
timelines.earlybird.has_quote
50.days.count
user_original_author (real_time) timelines.engagement.is_shared
any_feature
1.days.count
30.minutes.count
user_original_author recap.engagement.is_replied_reply_favorited_by_author
recap.engagement.is_replied_reply_impressed_by_author
recap.engagement.is_replied_reply_replied_by_author
any_feature
50.days.count
user_author (real_time, shared) timelines.engagement.is_clicked
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_negative_feedback_union
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_replied
timelines.engagement.is_retweeted
timelines.engagement.is_share_menu_clicked
timelines.engagement.is_video_playback_50
any_feature 1.days.count
30.minutes.count
user_engager_aggregate These features aggregate counts of user interaction with other engagers of tweets that the user interacts with.

For example, the user_engager.recap.engagement.is_favorited.any_feature.50.days.count.sparse_top1 feature can be parsed as follows:

For every Tweet that a user Likes, we count the engagement events of every other user who has engaged with that Tweet, accumulating a running count over 50 days. Engagement is defined as a Like or a reply. This yields a list of engagement counts, one per other user who engaged with the Tweets that the user Liked, and we take the top count as the feature value.
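
A minimal sketch of that computation follows. The function `engager_counts`, the `engagers_by_tweet` mapping, and the sample IDs are hypothetical, and the input is assumed to be already restricted to the 50-day window.

```python
from collections import Counter


def engager_counts(user_id, liked_tweet_ids, engagers_by_tweet):
    """Count, per other user, engagements on tweets that `user_id` Liked.

    `engagers_by_tweet` maps a tweet ID to the users who Liked or replied
    to it (one entry per engagement event).
    """
    counts = Counter()
    for tweet_id in liked_tweet_ids:
        for engager_id in engagers_by_tweet.get(tweet_id, []):
            if engager_id != user_id:  # only *other* engagers are counted
                counts[engager_id] += 1
    return counts


engagers_by_tweet = {
    "t1": ["alice", "bob", "carol"],
    "t2": ["alice", "bob"],
    "t3": ["alice", "dave"],
}
counts = engager_counts("alice", ["t1", "t2", "t3"], engagers_by_tweet)

# sparse_top1: the largest per-engager count (0 if there are no engagers).
sparse_top1 = max(counts.values(), default=0)
```

Here `bob` co-engaged with two of the three tweets that `alice` Liked, so the top count is 2.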


user_engager
any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
50.days.count.sparse_mean
50.days.count.sparse_nonzero
50.days.count.sparse_sum
50.days.count.sparse_top1
50.days.count.sparse_top2
user_inferred_topic_aggregate These features aggregate short term and long term engagement between a user and tweets from our internally predicted inferred topic (whether or not the tweet is actually tagged to that topic).
user_inferred_topic_v1 any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
50.days.count.sparse_mean
50.days.count.sparse_nonzero
50.days.count.sparse_sum
50.days.count.sparse_top1
50.days.count.sparse_top2
user_inferred_topic_v2 recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
engagement_features.in_network.favorites.count
engagement_features.in_network.retweets.count
recap.searchfeature.fav_count
recap.tweetfeature.contains_media
recap.tweetfeature.has_card
recap.tweetfeature.has_image
recap.tweetfeature.has_link
recap.tweetfeature.has_news
recap.tweetfeature.has_trend
recap.tweetfeature.has_video
recap.tweetfeature.is_reply
recap.tweetfeature.is_retweet
recap.tweetfeature.is_sensitive
recap.tweetfeature.match_searcher_langs
recap.tweetfeature.match_searcher_main_lang
recap.tweetfeature.match_ui_lang
recap.tweetfeature.mention_searcher
recap.tweetfeature.reply_other
recap.tweetfeature.reply_searcher
recap.tweetfeature.retweet_other
recap.tweetfeature.retweet_searcher
tweetsource.tweet.media.aspect_ratio_den
tweetsource.tweet.text.num_caps
tweetsource.tweet.text.num_newlines
tweetsource.v2.tweet.media.has_description
tweetsource.v2.tweet.media.has_selected_preview_image
tweetsource.v2.tweet.media.has_title
tweetsource.v2.tweet.media.has_visit_site_call_to_action
tweetsource.v2.tweet.media.has_watch_now_call_to_action
tweetsource.v2.tweet.media.is_360
tweetsource.v2.tweet.media.is_managed
tweetsource.v2.tweet.media.is_monetizable
50.days.count.sparse_mean
50.days.count.sparse_nonzero
50.days.count.sparse_sum
50.days.count.sparse_top1
50.days.count.sparse_top2
user_media_annotation_aggregate These features aggregate how often a user interacts with different types of media (photo, video, etc)
user_media_annotation (keyed by user and media type) any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature 50.days.count.sparse_mean
50.days.count.sparse_nonzero
50.days.count.sparse_sum
50.days.count.sparse_top1
50.days.count.sparse_top2
user_mention_aggregate These features aggregate counts of user interactions with Tweets that mention other users.

Let the original user who viewed a Tweet be user1, and let user2, user3, ..., user_n be users mentioned in a tweet. This feature group aggregates the interactions between user1 and other Tweets that mention user2, user3,..., user_n.

Here sparse_sum means we sum the aggregate values over all mentioned users, sparse_top1 means we take the max of the aggregate values for the mentioned users, sparse_top2 means we take the second-highest of the aggregate values for the mentioned users, and so on.
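
The reductions over mentioned users can be sketched as follows. The function name `sparse_reductions` is hypothetical. This document does not define `sparse_mean` or `sparse_nonzero`, so the arithmetic mean and the count of non-zero values used here are assumptions, as is returning 0.0 when too few values are present.

```python
def sparse_reductions(values):
    """Reduce per-entity aggregate values to fixed-size features."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    return {
        "sparse_sum": total,
        "sparse_top1": ranked[0] if len(ranked) > 0 else 0.0,
        "sparse_top2": ranked[1] if len(ranked) > 1 else 0.0,
        # Assumed definitions, not stated in the source:
        "sparse_mean": total / len(ranked) if ranked else 0.0,
        "sparse_nonzero": sum(1 for v in ranked if v != 0),
    }


# Hypothetical aggregate values for the users mentioned in one tweet.
per_mention_counts = {"user2": 4.0, "user3": 1.0, "user4": 0.0}
result = sparse_reductions(list(per_mention_counts.values()))
```

Reducing a variable-length list to a few summary values gives the model a fixed number of inputs regardless of how many users a Tweet mentions.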


user_mention
any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50 any_feature.50.days.count
any_feature
50.days.count.sparse_mean
50.days.count.sparse_nonzero
50.days.count.sparse_sum
50.days.count.sparse_top1
50.days.count.sparse_top2
user_request_context_aggregate These features aggregate engagements over the request context, which is either the same day of week (dow) or hour of day (hour), to account for temporal effects.
dow
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
50.days.count
hour
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
50.days.count
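
As an illustration of the request-context keys used by user_request_context_aggregate, the sketch below derives a day-of-week key and an hour-of-day key from a request timestamp and counts engagements under each. The function `request_context_keys`, the pairing of each key with a user ID, and the use of UTC are all assumptions; this document does not say which time zone or key layout is used.

```python
from collections import Counter
from datetime import datetime, timezone


def request_context_keys(user_id, request_time):
    """Return hypothetical aggregation keys for the dow and hour contexts."""
    return {
        "dow": (user_id, request_time.weekday()),  # Monday == 0
        "hour": (user_id, request_time.hour),
    }


# Hypothetical engagement timestamps for one user.
engagement_times = [
    datetime(2023, 3, 31, 14, 5, tzinfo=timezone.utc),  # a Friday
    datetime(2023, 3, 31, 14, 40, tzinfo=timezone.utc),
    datetime(2023, 4, 1, 9, 0, tzinfo=timezone.utc),    # a Saturday
]

dow_counts = Counter()
hour_counts = Counter()
for ts in engagement_times:
    keys = request_context_keys("user_1", ts)
    dow_counts[keys["dow"]] += 1
    hour_counts[keys["hour"]] += 1
```

At request time, only the counts whose key matches the current day of week or hour would be looked up, which is how these features can capture temporal effects.
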
user_topic_aggregate These features aggregate long term feature values between a user and tweets from a particular topic.
user_topic_v1 any_label
recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
any_feature
50.days.count
user_topic_v2 recap.engagement.is_clicked
recap.engagement.is_favorited
recap.engagement.is_open_linked
recap.engagement.is_photo_expanded
recap.engagement.is_profile_clicked
recap.engagement.is_replied
recap.engagement.is_retweeted
recap.engagement.is_video_playback_50
engagement_features.in_network.favorites.count
engagement_features.in_network.retweets.count
recap.searchfeature.fav_count
recap.tweetfeature.contains_media
recap.tweetfeature.has_card
recap.tweetfeature.has_image
recap.tweetfeature.has_link
recap.tweetfeature.has_news
recap.tweetfeature.has_trend
recap.tweetfeature.has_video
recap.tweetfeature.is_reply
recap.tweetfeature.is_retweet
recap.tweetfeature.is_sensitive
recap.tweetfeature.match_searcher_langs
recap.tweetfeature.match_searcher_main_lang
recap.tweetfeature.match_ui_lang
recap.tweetfeature.mention_searcher
recap.tweetfeature.reply_other
recap.tweetfeature.reply_searcher
recap.tweetfeature.retweet_other
recap.tweetfeature.retweet_searcher
tweetsource.tweet.media.aspect_ratio_den
tweetsource.tweet.text.num_caps
tweetsource.tweet.text.num_newlines
tweetsource.v2.tweet.media.has_description
tweetsource.v2.tweet.media.has_selected_preview_image
tweetsource.v2.tweet.media.has_title
tweetsource.v2.tweet.media.has_visit_site_call_to_action
tweetsource.v2.tweet.media.has_watch_now_call_to_action
tweetsource.v2.tweet.media.is_360
tweetsource.v2.tweet.media.is_managed
tweetsource.v2.tweet.media.is_monetizable
50.days.count
topic_aggregate These features aggregate values for tweets that come from a particular topic.
topic (real_time) timelines.enagagement.is_retweeted_without_quote
timelines.engagement.is_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_followed
timelines.engagement.is_not_interested_in_topic
timelines.engagement.is_open_linked
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_quoted
timelines.engagement.is_replied
timelines.engagement.is_retweeted
timelines.engagement.is_tweet_share_dm_clicked
timelines.engagement.is_tweet_share_dm_sent
timelines.engagement.is_video_playback_50
timelines.engagement.is_video_quality_viewed
timelines.engagement.is_video_viewed
any_feature
30.minutes.count
topic (24_hour_real_time) timelines.enagagement.is_retweeted_without_quote
timelines.engagement.is_block_clicked
timelines.engagement.is_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_followed
timelines.engagement.is_mute_clicked
timelines.engagement.is_not_about_topic
timelines.engagement.is_not_interested_in_topic
timelines.engagement.is_not_recent
timelines.engagement.is_not_relevant
timelines.engagement.is_open_linked
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_quoted
timelines.engagement.is_replied
timelines.engagement.is_report_tweet_clicked
timelines.engagement.is_retweeted
timelines.engagement.is_see_fewer
timelines.engagement.is_tweet_share_dm_clicked
timelines.engagement.is_tweet_share_dm_sent
timelines.engagement.is_unfollow_topic
timelines.engagement.is_video_playback_50
timelines.engagement.is_video_quality_viewed
timelines.engagement.is_video_viewed
any_feature 1.days.count
topic-country_code (real_time) timelines.engagement.is_block_clicked
timelines.engagement.is_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_impressed
timelines.engagement.is_mute_clicked
timelines.engagement.is_not_about_topic
timelines.engagement.is_not_interested_in_topic
timelines.engagement.is_not_recent
timelines.engagement.is_not_relevant
timelines.engagement.is_open_linked
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_replied
timelines.engagement.is_report_tweet_clicked
timelines.engagement.is_retweeted
timelines.engagement.is_see_fewer
timelines.engagement.is_share_menu_clicked
timelines.engagement.is_shared
timelines.engagement.is_unfollow_topic
timelines.engagement.is_video_playback_50
timelines.engagement.is_video_quality_viewed
any_feature 3.days.count
30.minutes.count
topic-share (real_time) timelines.engagement.is_share_menu_clicked
timelines.engagement.is_shared
any_feature 1.days.count
30.minutes.count
tweet_aggregate These features aggregate values corresponding to a tweet.
tweet (real_time) timelines.enagagement.is_retweeted_without_quote
timelines.engagement.is_clicked
timelines.engagement.is_dont_like
timelines.engagement.is_dwelled
timelines.engagement.is_favorited
timelines.engagement.is_followed
timelines.engagement.is_open_linked
timelines.engagement.is_photo_expanded
timelines.engagement.is_profile_clicked
timelines.engagement.is_quoted
timelines.engagement.is_replied
timelines.engagement.is_retweeted
timelines.engagement.is_tweet_share_dm_clicked
timelines.engagement.is_tweet_share_dm_sent
timelines.engagement.is_video_playback_50
timelines.engagement.is_video_quality_viewed
timelines.engagement.is_video_viewed
any_feature 30.minutes.count
Duration.Top.count
tweet_v2 (real_time) timelines.engagement.is_block_clicked
timelines.engagement.is_mute_clicked
timelines.engagement.is_report_tweet_clicked
any_feature
30.minutes.count
Duration.Top.count
tweet (real_time dwell) timelines.engagement.is_fullscreen_video_dwelled
timelines.engagement.is_fullscreen_video_dwelled_10_sec
timelines.engagement.is_fullscreen_video_dwelled_20_sec
timelines.engagement.is_fullscreen_video_dwelled_30_sec
timelines.engagement.is_fullscreen_video_dwelled_5_sec
timelines.engagement.is_profile_dwelled
timelines.engagement.is_profile_dwelled_10_sec
timelines.engagement.is_profile_dwelled_20_sec
timelines.engagement.is_profile_dwelled_30_sec
timelines.engagement.is_tweet_detail_dwelled
timelines.engagement.is_tweet_detail_dwelled_15_sec
timelines.engagement.is_tweet_detail_dwelled_25_sec
timelines.engagement.is_tweet_detail_dwelled_30_sec
timelines.engagement.is_tweet_detail_dwelled_8_sec
any_feature 1.days.count
30.minutes.count
tweet (real_time shared) timelines.engagement.is_share_menu_clicked
timelines.engagement.is_shared
any_feature 1.days.count
30.minutes.count

Non Aggregate Features

We have a number of standalone features capturing information about the user, the tweet, the author, and the tweet context.

two_hop
This feature group contains features about "two-hop" interactions between a user and the tweet author. For example, if user 1 favorites a tweet by user 2, and user 2 favorites a tweet by user 3, there will be a positive value for the "favorite.favorited_by" two-hop feature between user 1 and user 3.
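
The example above can be sketched as a search for intermediate users. The function `two_hop_count` and the edge-set representation are hypothetical, and using the number of intermediaries as the feature value is an assumption: the source only states that the value is positive when such a path exists.

```python
def two_hop_count(first_hop, second_hop, user, target):
    """Count intermediate users that link `user` to `target`.

    Each hop is a set of (source, destination) edges, for example
    ("user1", "user2") meaning user1 favorited a tweet by user2.
    """
    reached_from_user = {dst for src, dst in first_hop if src == user}
    leading_to_target = {src for src, dst in second_hop if dst == target}
    return len(reached_from_user & leading_to_target)


# Hypothetical favorite edges: (who favorited, whose tweet).
favorites = {
    ("user1", "user2"),
    ("user2", "user3"),
    ("user1", "user4"),
    ("user4", "user3"),
    ("user5", "user3"),
}

# Both hops are "favorite" edges, as in the favorite.favorited_by example.
value = two_hop_count(favorites, favorites, "user1", "user3")
```

Passing different edge sets for the two hops (for example follows for the first hop and retweets for the second) would give the other crosses listed in the table.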

The feature group consists of all possible crosses of the below features.

two_hop favorite
following
mutual_follow
favorited_by
followed_by
mentioned_by
retweeted_by
normalized
two_hop favorited_by
favorited_by
mentioned_by
retweeted_by
right_degree
realgraph
This feature group contains features about interactions between the user and the Tweet author.

The feature group consists of all possible crosses of the below features.

realgraph dst_id
src_id
realgraph num_address_book_email
num_address_book_in_both
num_address_book_mutual_edge_email
num_address_book_mutual_edge_in_both
num_address_book_mutual_edge_phone
num_address_book_phone
num_blocks
num_direct_messages
num_favorites
num_follow
num_inspected_tweets
num_link_clicks
num_mentions
num_mutes
num_mutual_follow
num_photo_tags
num_profile_views
num_report_as_abuses
num_report_as_spams
num_retweets
num_sms_follow
num_tweet_clicks
total_dwell_time
weight
days_since_last
days_since_last.sparse_avg
days_since_last.sparse_max
days_since_last.sparse_sum
elapsed_days
elapsed_days.sparse_avg
elapsed_days.sparse_max
elapsed_days.sparse_sum
ewma
ewma.sparse_avg
ewma.sparse_max
ewma.sparse_sum
is_missing
m2ForVariance.sparse_avg
m2ForVariance.sparse_max
m2ForVariance.sparse_sum
mean
mean.sparse_avg
mean.sparse_max
mean.sparse_sum
non_zero_days
non_zero_days.sparse_avg
non_zero_days.sparse_max
non_zero_days.sparse_sum
sparse_avg
sparse_max
sparse_sum
variance
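
The statistic names in the table above (`mean`, `m2ForVariance`, `variance`, `ewma`, `non_zero_days`, `elapsed_days`, `days_since_last`) are consistent with running statistics kept over daily interaction counts. The sketch below shows one plausible way to maintain them, using Welford's online algorithm for the mean and variance and a fixed smoothing factor for the EWMA. The `EdgeStats` class, the `alpha` value, and the exact update rules are assumptions for illustration and are not taken from the RealGraph implementation.

```python
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Running statistics for one interaction type on one user-author edge."""

    alpha: float = 0.1            # assumed EWMA smoothing factor
    elapsed_days: int = 0         # days observed so far
    non_zero_days: int = 0        # days with at least one interaction
    days_since_last: int = 0      # days since the last non-zero day
    mean: float = 0.0
    m2_for_variance: float = 0.0  # sum of squared deviations (Welford's M2)
    ewma: float = 0.0

    def update(self, daily_count: float) -> None:
        """Fold one day's interaction count into the running statistics."""
        self.elapsed_days += 1
        if daily_count > 0:
            self.non_zero_days += 1
            self.days_since_last = 0
        else:
            self.days_since_last += 1

        # Welford's online update for the mean and M2.
        delta = daily_count - self.mean
        self.mean += delta / self.elapsed_days
        self.m2_for_variance += delta * (daily_count - self.mean)

        # Exponentially weighted moving average of the daily counts.
        self.ewma = self.alpha * daily_count + (1 - self.alpha) * self.ewma

    @property
    def variance(self) -> float:
        """Sample variance derived from M2; zero until two days are seen."""
        if self.elapsed_days < 2:
            return 0.0
        return self.m2_for_variance / (self.elapsed_days - 1)


stats = EdgeStats()
for count in [2, 0, 4, 0]:  # e.g. daily num_favorites on one edge
    stats.update(count)
print(stats.mean, stats.non_zero_days, stats.days_since_last)
```

Keeping `m2ForVariance` alongside `mean` lets the variance be updated one day at a time without storing the full history, which may be why it appears as a feature in its own right.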
authors.realgraph
This feature group contains features about interactions between the user and various other users, including:
  1. the Tweet author
  2. any users mentioned in the Tweet
  3. in-network engagers with the Tweet
  4. upstream authors if the Tweet was part of a reply chain
Note that all the above users are included in the interaction set, not just the Tweet author.

The feature group consists of all possible crosses of the below features.


authors.realgraph weight sparse_avg
sparse_max
sparse_sum
authors.realgraph num_address_book_email
num_address_book_in_both
num_address_book_mutual_edge_email
num_address_book_mutual_edge_in_both
num_address_book_phone
num_blocks
num_direct_messages
num_favorites
num_follow
num_inspected_tweets
num_link_clicks
num_mentions
num_mutes
num_mutual_follow
num_photo_tags
num_profile_views
num_report_as_abuses
num_report_as_spams
num_retweets
num_sms_follow
num_tweet_clicks
total_dwell_time
days_since_last
elapsed_days
ewma
m2ForVariance
mean
non_zero_days
sparse_avg
sparse_max
sparse_sum
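
Because `authors.realgraph` covers several related users per Tweet, each per-edge value has to be reduced to a single number. The `sparse_avg`, `sparse_max`, and `sparse_sum` suffixes suggest an average, maximum, and sum taken over only those users for which a value exists. The sketch below illustrates that reading; the `sparse_reduce` helper, its input shape, and the handling of missing users are assumptions.

```python
from typing import Dict, Iterable, Optional


def sparse_reduce(
    values_by_user: Dict[int, float],
    related_users: Iterable[int],
) -> Optional[Dict[str, float]]:
    """Reduce per-user edge values to sum, max, and average.

    values_by_user: edge value (e.g. a realgraph weight) keyed by the other
        user's id; users with no edge are simply absent.
    related_users: the Tweet author, mentioned users, in-network engagers,
        and upstream authors for the candidate Tweet.
    Returns None when no related user has a value.
    """
    present = [values_by_user[u] for u in related_users if u in values_by_user]
    if not present:
        return None
    total = sum(present)
    return {
        "sparse_sum": total,
        "sparse_max": max(present),
        "sparse_avg": total / len(present),  # averaged over present users only
    }


weights = {10: 0.5, 11: 0.25}  # user 12 has no edge with the viewer
print(sparse_reduce(weights, [10, 11, 12]))
# {'sparse_sum': 0.75, 'sparse_max': 0.5, 'sparse_avg': 0.375}
```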
recap.tweetfeature, recap.searchfeature, etc
This feature group contains features about the tweet, whether from the tweets service or the search service ("Earlybird"). It also contains features related to the user's device type.
recap.earlybird.fav_count_v2
recap.earlybird.reply_count_v2
recap.earlybird.retweet_count_v2
recap.searchfeature.blender_score
recap.searchfeature.fav_count
recap.searchfeature.reply_count
recap.searchfeature.retweet_count
recap.searchfeature.text_score
recap.source.type
recap.tweetfeature.bidirectional_fav_count
recap.tweetfeature.bidirectional_reply_count
recap.tweetfeature.bidirectional_retweet_count
recap.tweetfeature.contains_media
recap.tweetfeature.conversational_count
recap.tweetfeature.embeds_impression_count
recap.tweetfeature.embeds_url_count
recap.tweetfeature.from_inactive_user
recap.tweetfeature.from_mutual_follow
recap.tweetfeature.from_verified_account
recap.tweetfeature.has_card
recap.tweetfeature.has_consumer_video
recap.tweetfeature.has_hashtag
recap.tweetfeature.has_image
recap.tweetfeature.has_link
recap.tweetfeature.has_mention
recap.tweetfeature.has_multiple_hashtag_or_trend
recap.tweetfeature.has_multiple_media
recap.tweetfeature.has_native_image
recap.tweetfeature.has_native_video
recap.tweetfeature.has_news
recap.tweetfeature.has_periscope
recap.tweetfeature.has_pro_video
recap.tweetfeature.has_trend
recap.tweetfeature.has_video
recap.tweetfeature.has_vine
recap.tweetfeature.has_visible_link
recap.tweetfeature.is_author_bot
recap.tweetfeature.is_author_new
recap.tweetfeature.is_author_profile_egg
recap.tweetfeature.is_author_spam
recap.tweetfeature.is_business_score
recap.tweetfeature.is_extended_reply
recap.tweetfeature.is_offensive
recap.tweetfeature.is_reply
recap.tweetfeature.is_retweet
recap.tweetfeature.is_sensitive
recap.tweetfeature.language
recap.tweetfeature.link_count
recap.tweetfeature.link_language
recap.tweetfeature.match_searcher_langs
recap.tweetfeature.match_searcher_main_lang
recap.tweetfeature.match_ui_lang
recap.tweetfeature.mention_searcher
recap.tweetfeature.num_hashtags
recap.tweetfeature.num_mentions
recap.tweetfeature.prev_user_tweet_enagagement
recap.tweetfeature.reply_other
recap.tweetfeature.reply_searcher
recap.tweetfeature.retweet_other
recap.tweetfeature.retweet_searcher
recap.tweetfeature.signature
recap.tweetfeature.tweet_count_from_user_in_snapshot
recap.tweetfeature.unidirectiona_fav_count
recap.tweetfeature.unidirectional_reply_count
recap.tweetfeature.unidirectional_retweet_count
recap.tweetfeature.user_rep
recap.tweetfeature.video_view_count
recap.user_agent.client_name
recap.user_agent.client_source
recap.user_agent.client_version
recap.user_agent.client_version_code
recap.user_agent.device
recap.user_agent.manufacturer
recap.user_agent.network_connection
recap.user_agent.sdk_version
recap.v2.tweetfeature.is_retweet_directed_at_user_in_first_degree
recap.v2.tweetfeature.is_retweet_of_reply
recap.v2.tweetfeature.is_retweeter_bot
recap.v2.tweetfeature.is_retweeter_new
recap.v2.tweetfeature.is_retweeter_nsfw
recap.v2.tweetfeature.is_retweeter_profile_egg
recap.v2.tweetfeature.is_retweeter_spam
recap.v2.tweetfeature.retweet_of_mutual_follow
recap.v2.tweetfeature.source_author_rep
recap.v3.tweetfeature.probably_from_follow
tweetsource
This feature group contains features about the tweet media as well as conversation-related features about the tweet.

tweetsource.tweet.media.aspect_ratio_den
tweetsource.tweet.media.aspect_ratio_num
tweetsource.tweet.media.bit_rate
tweetsource.tweet.media.height_1
tweetsource.tweet.media.height_2
tweetsource.tweet.media.height_3
tweetsource.tweet.media.height_4
tweetsource.tweet.media.num_tags
tweetsource.tweet.media.resize_method_1
tweetsource.tweet.media.resize_method_2
tweetsource.tweet.media.resize_method_3
tweetsource.tweet.media.resize_method_4
tweetsource.tweet.media.video_duration
tweetsource.tweet.media.width_1
tweetsource.tweet.media.width_2
tweetsource.tweet.media.width_3
tweetsource.tweet.media.width_4
tweetsource.tweet.text.has_question
tweetsource.tweet.text.length
tweetsource.tweet.text.length_type
tweetsource.tweet.text.num_caps
tweetsource.tweet.text.num_newlines
tweetsource.tweet.text.num_whitespaces
tweetsource.v2.tweet.media.color_1_blue
tweetsource.v2.tweet.media.color_1_green
tweetsource.v2.tweet.media.color_1_percentage
tweetsource.v2.tweet.media.color_1_red
tweetsource.v2.tweet.media.face_areas
tweetsource.v2.tweet.media.has_app_install_call_to_action
tweetsource.v2.tweet.media.has_description
tweetsource.v2.tweet.media.has_selected_preview_image
tweetsource.v2.tweet.media.has_title
tweetsource.v2.tweet.media.has_visit_site_call_to_action
tweetsource.v2.tweet.media.has_watch_now_call_to_action
tweetsource.v2.tweet.media.is_360
tweetsource.v2.tweet.media.is_embeddable
tweetsource.v2.tweet.media.is_managed
tweetsource.v2.tweet.media.is_monetizable
tweetsource.v2.tweet.media.num_color_pallette_items
tweetsource.v2.tweet.media.num_faces
tweetsource.v2.tweet.media.num_stickers
tweetsource.v2.tweet.media.view_count
in_reply_to_tweet
If the tweet is a reply, this feature group contains the features of the tweet being replied to.
in_reply_to_tweet.recap.earlybird.fav_count_v2
in_reply_to_tweet.recap.earlybird.reply_count_v2
in_reply_to_tweet.recap.earlybird.retweet_count_v2
in_reply_to_tweet.recap.searchfeature.fav_count
in_reply_to_tweet.recap.searchfeature.reply_count
in_reply_to_tweet.recap.searchfeature.retweet_count
in_reply_to_tweet.recap.searchfeature.text_score
in_reply_to_tweet.recap.tweetfeature.bidirectional_fav_count
in_reply_to_tweet.recap.tweetfeature.bidirectional_reply_count
in_reply_to_tweet.recap.tweetfeature.bidirectional_retweet_count
in_reply_to_tweet.recap.tweetfeature.conversational_count
in_reply_to_tweet.recap.tweetfeature.from_mutual_follow
in_reply_to_tweet.recap.tweetfeature.from_verified_account
in_reply_to_tweet.recap.tweetfeature.has_hashtag
in_reply_to_tweet.recap.tweetfeature.has_image
in_reply_to_tweet.recap.tweetfeature.has_mention
in_reply_to_tweet.recap.tweetfeature.has_news
in_reply_to_tweet.recap.tweetfeature.has_video
in_reply_to_tweet.recap.tweetfeature.has_visible_link
in_reply_to_tweet.recap.tweetfeature.is_author_bot
in_reply_to_tweet.recap.tweetfeature.is_author_new
in_reply_to_tweet.recap.tweetfeature.is_author_nsfw
in_reply_to_tweet.recap.tweetfeature.is_author_spam
in_reply_to_tweet.recap.tweetfeature.is_offensive
in_reply_to_tweet.recap.tweetfeature.is_reply
in_reply_to_tweet.recap.tweetfeature.is_sensitive
in_reply_to_tweet.recap.tweetfeature.num_mentions
in_reply_to_tweet.recap.tweetfeature.prev_user_tweet_enagagement
in_reply_to_tweet.recap.tweetfeature.unidirectiona_fav_count
in_reply_to_tweet.recap.tweetfeature.unidirectional_reply_count
in_reply_to_tweet.recap.tweetfeature.unidirectional_retweet_count
in_reply_to_tweet.recap.tweetfeature.user_rep
in_reply_to_tweet.timelines.earlybird.decayed_favorite_count
in_reply_to_tweet.timelines.earlybird.decayed_quote_count
in_reply_to_tweet.timelines.earlybird.decayed_reply_count
in_reply_to_tweet.timelines.earlybird.decayed_retweet_count
in_reply_to_tweet.timelines.earlybird.has_quote
in_reply_to_tweet.timelines.earlybird.quote_count
in_reply_to_tweet.timelines.earlybird.weighted_fav_count
in_reply_to_tweet.timelines.earlybird.weighted_quote_count
in_reply_to_tweet.timelines.earlybird.weighted_reply_count
in_reply_to_tweet.timelines.earlybird.weighted_retweet_count
in_reply_to_tweet.timelines.earlybird_score
in_reply_to_tweet.tweetsource.tweet.media.aspect_ratio_den
in_reply_to_tweet.tweetsource.tweet.media.aspect_ratio_num
in_reply_to_tweet.tweetsource.tweet.media.height_1
in_reply_to_tweet.tweetsource.tweet.media.height_2
in_reply_to_tweet.tweetsource.tweet.media.video_duration
in_reply_to_tweet.tweetsource.tweet.text.has_question
in_reply_to_tweet.tweetsource.tweet.text.length
in_reply_to_tweet.tweetsource.tweet.text.num_caps
timelines.earlybird
This feature group passes on features used by the search and light ranking service ("Earlybird") to the Heavy Ranker.
timelines.earlybird.decayed_favorite_count
timelines.earlybird.decayed_quote_count
timelines.earlybird.decayed_reply_count
timelines.earlybird.decayed_retweet_count
timelines.earlybird.embeds_impression_count_v2
timelines.earlybird.embeds_url_count_v2
timelines.earlybird.fake_favorite_count
timelines.earlybird.fake_quote_count
timelines.earlybird.fake_reply_count
timelines.earlybird.fake_retweet_count
timelines.earlybird.has_quote
timelines.earlybird.is_composer_source_camera
timelines.earlybird.label_abusive_flag
timelines.earlybird.label_abusive_hi_rcl_flag
timelines.earlybird.label_dup_content_flag
timelines.earlybird.label_nsfw_hi_prc_flag
timelines.earlybird.label_nsfw_hi_rcl_flag
timelines.earlybird.label_spam_flag
timelines.earlybird.label_spam_hi_rcl_flag
timelines.earlybird.periscope_exists
timelines.earlybird.periscope_has_been_featured
timelines.earlybird.periscope_is_currently_featured
timelines.earlybird.periscope_is_from_quality_source
timelines.earlybird.periscope_is_live
timelines.earlybird.preported_tweet_score
timelines.earlybird.quote_count
timelines.earlybird.visible_token_ratio
timelines.earlybird.weighted_fav_count
timelines.earlybird.weighted_quote_count
timelines.earlybird.weighted_reply_count
timelines.earlybird.weighted_retweet_count
realtime_interaction_graph
User-author interaction features. Similar to RealGraph but updated more rapidly.
realtime_interaction_graph.click.count
realtime_interaction_graph.click.days_since_last
realtime_interaction_graph.fav.count
realtime_interaction_graph.fav.days_since_last
realtime_interaction_graph.mention.count
realtime_interaction_graph.mention.days_since_last
realtime_interaction_graph.profile_view.count
realtime_interaction_graph.profile_view.days_since_last
realtime_interaction_graph.retweet.count
realtime_interaction_graph.retweet.days_since_last
realtime_interaction_graph.soft_follow.count
realtime_interaction_graph.soft_follow.days_since_last
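
Each interaction type above contributes a `count` and a `days_since_last` value. The sketch below derives both from a list of timestamped user-author events. The event format and the `interaction_features` helper are hypothetical, and whether the production counts are raw or decayed is not stated here, so raw counts are an assumption.

```python
from datetime import datetime, timezone

INTERACTION_TYPES = (
    "click", "fav", "mention", "profile_view", "retweet", "soft_follow",
)


def interaction_features(events, now):
    """Build count and days_since_last features for one user-author pair.

    events: iterable of (interaction_type, timestamp) tuples.
    now: reference time used to measure recency.
    """
    events = list(events)
    features = {}
    for kind in INTERACTION_TYPES:
        times = [ts for k, ts in events if k == kind]
        prefix = f"realtime_interaction_graph.{kind}"
        features[f"{prefix}.count"] = float(len(times))
        if times:
            # Recency of the latest interaction, in fractional days.
            elapsed = now - max(times)
            features[f"{prefix}.days_since_last"] = elapsed.total_seconds() / 86400.0
        # With no interactions, days_since_last is left out (treated as missing).
    return features


now = datetime(2023, 1, 10, tzinfo=timezone.utc)
events = [
    ("fav", datetime(2023, 1, 8, tzinfo=timezone.utc)),
    ("fav", datetime(2023, 1, 9, tzinfo=timezone.utc)),
]
feats = interaction_features(events, now)
print(feats["realtime_interaction_graph.fav.count"])            # 2.0
print(feats["realtime_interaction_graph.fav.days_since_last"])  # 1.0
```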
user_tweet.recommendations
Similarity of a tweet to the tweets a user has recently engaged with.
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.fav_1d_last_10_avg
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.fav_1d_last_10_max
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.fav_7d_last_10_avg
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.fav_7d_last_10_max
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.follow_30d_last_10_avg
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.follow_30d_last_10_max
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.follow_7d_last_10_avg
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.follow_7d_last_10_max
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.retweet_1d_last_10_avg
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.retweet_1d_last_10_max
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.retweet_7d_last_10_avg
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.retweet_7d_last_10_max
user-tweet.recommendations.sim_clusters_scores.user_interested_in_tweet_embedding_dot_product_20m_145k_2020
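
The `_last_10_avg` and `_last_10_max` suffixes suggest that a candidate tweet is compared against up to ten recently engaged tweets and the similarities are then averaged or maximized. The sketch below follows that reading with sparse cluster-score vectors and cosine similarity. The vector format, the use of cosine similarity, and the `recent_engagement_similarity` helper are all assumptions; filtering by the 1-day, 7-day, or 30-day window is left to the caller.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {cluster_id: score} vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recent_engagement_similarity(candidate, engaged, last_n=10):
    """Average and maximum similarity to the last_n engaged tweets.

    candidate: sparse vector for the tweet being ranked.
    engaged: sparse vectors of engaged tweets, oldest first, already
        restricted to the desired time window.
    Returns None when there is nothing to compare against.
    """
    recent = engaged[-last_n:]
    if not recent:
        return None
    sims = [cosine(candidate, e) for e in recent]
    return {"avg": sum(sims) / len(sims), "max": max(sims)}


candidate = {1: 1.0}
engaged = [{1: 1.0}, {2: 1.0}]  # one identical tweet, one unrelated tweet
print(recent_engagement_similarity(candidate, engaged))
# {'avg': 0.5, 'max': 1.0}
```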
other
Here we list individual features not covered by any feature group.
author_health.num_connect
author_health.num_connect_days
author_health.num_followers
engagement_features.in_network.favorites.count
engagement_features.in_network.replies.count
engagement_features.in_network.retweets.count
request_context.display_dpi
request_context.display_height
request_context.display_width
request_context.is_get_initial
request_context.is_get_middle
request_context.is_get_newer
request_context.is_get_older
request_context.is_session_start
time_features.earlybird.last_favorite_since_creation_hrs
time_features.earlybird.last_quote_since_creation_hrs
time_features.earlybird.last_reply_since_creation_hrs
time_features.earlybird.last_retweet_since_creation_hrs
time_features.earlybird.time_since_last_favorite
time_features.earlybird.time_since_last_quote
time_features.earlybird.time_since_last_reply
time_features.earlybird.time_since_last_retweet
time_features.is_tweet_recycled
time_features.non_polling_requests_since_tweet_creation
time_features.time_between_non_polling_requests_avg
time_features.time_since_last_non_polling_request
time_features.time_since_source_tweet_creation
time_features.time_since_tweet_creation
time_features.time_since_viewer_account_creation_secs
time_features.tweet_age_ratio

Embeddings Features

Twhin is a large graph embedding trained on Twitter data. We use three 200-dimensional embeddings sourced from the Twhin algorithm.

Twhin Follow Embeddings
We have two embeddings trained on the user-user follow graph, one representing who is likely to follow a user and the other representing who a user is likely to follow. Each embedding is 200-dimensional.
Twhin Engagement Embeddings
We have one embedding trained on the user-tweet engagement graph, representing users based on the Tweets they are likely to engage with. This embedding is 200-dimensional.
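
To make the shapes concrete, the sketch below builds three placeholder 200-dimensional vectors, concatenates them into a 600-value input, and computes a dot product between two of them. The random vectors stand in for real embeddings, and the helper names are hypothetical. The dot product is a generic similarity measure used here for illustration, not a description of how the embeddings are scored or consumed by the model.

```python
import random

DIM = 200  # each embedding is 200-dimensional


def random_embedding(rng):
    """Placeholder vector standing in for a trained embedding."""
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def concat_embedding_features(likely_followers, likely_follows, engagement):
    """Concatenate the three embeddings into one flat list of 3 * DIM values."""
    for emb in (likely_followers, likely_follows, engagement):
        if len(emb) != DIM:
            raise ValueError(f"expected {DIM} dimensions, got {len(emb)}")
    return likely_followers + likely_follows + engagement


rng = random.Random(0)
likely_followers = random_embedding(rng)  # who is likely to follow a user
likely_follows = random_embedding(rng)    # who a user is likely to follow
engagement = random_embedding(rng)        # Tweets a user is likely to engage with

features = concat_embedding_features(likely_followers, likely_follows, engagement)
print(len(features))  # 600
print(dot(likely_follows, likely_followers))  # generic similarity, illustration only
```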