# Overview
Below is a description of the major feature groups which are input to the Twitter Heavy Ranking model.
Note that not every request will have every feature available, due to user settings or other constraints, so "For You" ranking may differ from request to request depending on these variables.
## Aggregate Features
Twitter's aggregate features comprise the bulk of Twitter's feature count. They are generated by maintaining rolling aggregations of feature values within a specific scope and a specific time window. We compute aggregates over the long term (50-day counts) and the short term ("real-time": windows of up to 3 days, typically 30 minutes).
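
As a rough illustration of a rolling count over a time window, here is a minimal Python sketch. The `RollingCount` class and its method names are hypothetical and not taken from the Twitter codebase, and the production system may compute its aggregates quite differently; the sketch only shows the idea of counting events that fall inside a trailing window.

```python
from collections import deque


class RollingCount:
    """Count events for one key that fall inside a trailing time window.

    Hypothetical helper for illustration only.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times in seconds, oldest first

    def add(self, timestamp: float) -> None:
        # Assumption: events arrive in non-decreasing time order.
        self._timestamps.append(timestamp)

    def count(self, now: float) -> int:
        # Drop events that have aged out of the window, then count the rest.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


# Hypothetical usage: a 30-minute count of Likes received by one author.
likes_30m = RollingCount(window_seconds=30 * 60)
for event_time in (0, 600, 1500):
    likes_30m.add(event_time)

count_at_1700 = likes_30m.count(now=1700)  # all three events are in the window
count_at_2000 = likes_30m.count(now=2000)  # the event at t=0 has aged out
```

A 50-day aggregate would use the same structure with a larger `window_seconds`, although a real system would more likely store bucketed or pre-aggregated counts than individual timestamps.
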
<details>
<summary><b>Show Details</b></summary>
Aggregate features are groups of multiple features generated as Cartesian crosses from a template. Each feature has the format:
<table>
<tr>
<td><b>Feature Group Name</b></td>
<td><b>Engagement Scope</b></td>
<td><b>Feature To Aggregate</b></td>
<td><b>Aggregation Spec</b></td>
</tr>
</table>
<ul>
<li> The <b>Feature Group Name</b> is the name of the aggregate feature and also encodes the aggregation scope, that is, which entities are aggregated over.
<ul>
<li> For example, <code>"user_aggregate"</code> aggregates over unique user_ids, and <code>"user_author_aggregate"</code> aggregates over all user-author pairs. It also determines what fields the feature is joined to when being used. In the case of <code>"user_author_aggregate"</code>, the feature is joined to data corresponding to the specific user and the specific author.
<li> The raw feature group names are often verbose and are simplified in the below presentation.
</ul>
<li> <b>Engagement Scope</b> is the subset of tweets within the aggregation scope that will be aggregated over. Typically this is the name of an output engagement, like <code>recap.engagement.is_favorited</code>. In that case, we only aggregate over Tweets which are also Liked.
<li> The <b>Feature To Aggregate</b> is the feature we are accumulating over. If this value is <code>any_feature</code>, that means we aggregate the Tweet count. For example <code>user_aggregate_v2.pair.recap.engagement.is_favorited.any_feature.50.days.count</code> will be the number of Liked records for every user over the last 50 days.
<li> The <b>Aggregation Spec</b> is what aggregate to compute - what function and over what time window.
</ul>
For every Feature Group, we generate one feature for every possible combination of Engagement Scope, Feature To Aggregate, and Aggregation Spec. In particular, every row in the tables below generates one feature for every possible cross between its columns.
<b>Example</b>:
One such feature is <code>user_aggregate_v2.pair.recap.engagement.is_favorited.engagement_features.in_network.replies.count.50.days.count</code>, which can be parsed into
<table>
<tr>
<td><b>Feature Group Name</b></td>
<td><b>Engagement Scope</b></td>
<td><b>Feature To Aggregate</b></td>
<td><b>Aggregation Spec</b></td>
</tr>
<tr>
<td><code>user_aggregate_v2.pair</code></td>
<td><code>recap.engagement.is_favorited</code></td>
<td><code>engagement_features.in_network.replies.count</code></td>
<td><code>50.days.count</code></td>
</tr>
</table>
This means that this feature aggregates
<ol>
<li> (Over every user),
<li> (Over only tweets favorited by the user),
<li> In network replies sent out by this user,
<li> (Counted over the last 50 days)
</ol>
This feature is then made available as a feature for the particular user.
</details>
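
The Cartesian cross described in the details above can be sketched in a few lines of Python. The function `expand_feature_names` is a hypothetical helper, and joining the four parts with `.` simply mirrors the example name shown above; since the raw feature group names are simplified in this document, real feature names may not match this pattern exactly.

```python
from itertools import product


def expand_feature_names(group, scopes, features, specs):
    """Return one feature name per (scope, feature, spec) combination.

    Hypothetical helper: the '.'-joined layout follows the worked example
    in this document and is an assumption about the naming scheme.
    """
    return [
        ".".join((group, scope, feature, spec))
        for scope, feature, spec in product(scopes, features, specs)
    ]


names = expand_feature_names(
    group="user_aggregate_v2.pair",
    scopes=[
        "recap.engagement.is_favorited",
        "recap.engagement.is_profile_clicked",
    ],
    features=[
        "any_feature",
        "engagement_features.in_network.replies.count",
    ],
    specs=["50.days.count", "50.days.sum"],
)

# 2 scopes x 2 features x 2 specs gives 8 feature names.
for name in names:
    print(name)
```

Each table row below can be read the same way: the second, third, and fourth columns play the roles of `scopes`, `features`, and `specs`.
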
The list of our aggregate features is below:
<details>
<summary><b><code>author_aggregate</code></b></summary>
These features aggregate over the author (or original author) of a tweet. Some of the features are short-duration (30 minutes) and some longer (50 days). The features track how many of an author's tweets were engaged with.
<br>
<table>
<tr>
<td>
<code>
author (real_time)
</code>
</td>
<td>
<code>
timelines.enagagement.is_retweeted_without_quote <br>
timelines.engagement.is_clicked <br>
timelines.engagement.is_dont_like <br>
timelines.engagement.is_dwelled <br>
timelines.engagement.is_favorited <br>
timelines.engagement.is_followed <br>
timelines.engagement.is_open_linked <br>
timelines.engagement.is_photo_expanded <br>
timelines.engagement.is_profile_clicked <br>
timelines.engagement.is_quoted <br>
timelines.engagement.is_replied <br>
timelines.engagement.is_retweeted <br>
timelines.engagement.is_tweet_share_dm_clicked <br>
timelines.engagement.is_tweet_share_dm_sent <br>
timelines.engagement.is_video_playback_50 <br>
timelines.engagement.is_video_quality_viewed <br>
timelines.engagement.is_video_viewed <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
30.minutes.count
</code>
</td>
</tr>
<tr>
<td>
<code>
original_author (real_time)
</code>
</td>
<td>
<code>
timelines.enagagement.is_retweeted_without_quote <br>
timelines.engagement.is_clicked <br>
timelines.engagement.is_dont_like <br>
timelines.engagement.is_dwelled <br>
timelines.engagement.is_favorited <br>
timelines.engagement.is_followed <br>
timelines.engagement.is_open_linked <br>
timelines.engagement.is_photo_expanded <br>
timelines.engagement.is_profile_clicked <br>
timelines.engagement.is_quoted <br>
timelines.engagement.is_replied <br>
timelines.engagement.is_retweeted <br>
timelines.engagement.is_tweet_share_dm_clicked <br>
timelines.engagement.is_tweet_share_dm_sent <br>
timelines.engagement.is_video_playback_50 <br>
timelines.engagement.is_video_quality_viewed <br>
timelines.engagement.is_video_viewed <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
30.minutes.count
</code>
</td>
</tr>
<tr>
<td>
<code>
original_author (real_time)
</code>
</td>
<td>
<code>
timelines.engagement.is_share_menu_clicked <br>
timelines.engagement.is_shared <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
30.minutes.count <br>
1.days.count <br>
</code>
</td>
</tr>
<tr>
<td>
<code>
original_author
</code>
</td>
<td>
<code>
recap.engagement.is_replied_reply_favorited_by_author <br>
recap.engagement.is_replied_reply_impressed_by_author <br>
recap.engagement.is_replied_reply_replied_by_author <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>author-topic_aggregate</code></b></summary>
These features aggregate over a specific tweet author and a specific topic. We only accumulate long (50 day) counts.
<br>
<table>
<tr>
<td>
<code>
author-topic
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>list_aggregate</code></b></summary>
These features aggregate short term and long term engagement between a user and a list.
<br>
<table>
<tr>
<td>
<code>
user_list
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
<tr>
<td>
<code>
list (real_time)
</code>
</td>
<td>
<code>
timelines.engagement.is_block_clicked <br>
timelines.engagement.is_dont_like <br>
timelines.engagement.is_dwelled <br>
timelines.engagement.is_favorited <br>
timelines.engagement.is_mute_clicked <br>
timelines.engagement.is_replied <br>
timelines.engagement.is_report_tweet_clicked <br>
timelines.engagement.is_retweeted <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
30.minutes.count
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>user_aggregate</code></b></summary>
These features aggregate short term and long term engagement from a specific user.
<br>
<table>
<tr>
<td>
<code>
user_v2
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_favorited <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
</code>
</td>
<td>
<code>
any_feature <br>
engagement_features.in_network.favorites.count <br>
engagement_features.in_network.replies.count <br>
engagement_features.in_network.retweets.count <br>
realgraph.num_favorites.days_since_last <br>
realgraph.num_favorites.elapsed_days <br>
realgraph.num_favorites.ewma <br>
realgraph.num_favorites.non_zero_days <br>
realgraph.num_inspected_tweets.days_since_last <br>
realgraph.num_inspected_tweets.elapsed_days <br>
realgraph.num_inspected_tweets.ewma <br>
realgraph.num_inspected_tweets.non_zero_days <br>
realgraph.num_mentions.days_since_last <br>
realgraph.num_mentions.elapsed_days <br>
realgraph.num_mentions.ewma <br>
realgraph.num_mentions.non_zero_days <br>
realgraph.num_profile_views.days_since_last <br>
realgraph.num_profile_views.elapsed_days <br>
realgraph.num_profile_views.ewma <br>
realgraph.num_profile_views.non_zero_days <br>
realgraph.num_retweets.days_since_last <br>
realgraph.num_retweets.elapsed_days <br>
realgraph.num_retweets.ewma <br>
realgraph.num_retweets.non_zero_days <br>
realgraph.num_tweet_clicks.days_since_last <br>
realgraph.num_tweet_clicks.elapsed_days <br>
realgraph.num_tweet_clicks.ewma <br>
realgraph.num_tweet_clicks.non_zero_days <br>
realgraph.total_dwell_time.days_since_last <br>
realgraph.total_dwell_time.elapsed_days <br>
realgraph.total_dwell_time.ewma <br>
realgraph.total_dwell_time.non_zero_days <br>
recap.earlybird.fav_count_v2 <br>
recap.earlybird.reply_count_v2 <br>
recap.earlybird.retweet_count_v2 <br>
recap.searchfeature.blender_score <br>
recap.searchfeature.fav_count <br>
recap.searchfeature.reply_count <br>
recap.searchfeature.retweet_count <br>
recap.searchfeature.text_score <br>
recap.tweetfeature.bidirectional_fav_count <br>
recap.tweetfeature.bidirectional_reply_count <br>
recap.tweetfeature.bidirectional_retweet_count <br>
recap.tweetfeature.contains_media <br>
recap.tweetfeature.conversational_count <br>
recap.tweetfeature.embeds_impression_count <br>
recap.tweetfeature.embeds_url_count <br>
recap.tweetfeature.from_mutual_follow <br>
recap.tweetfeature.has_card <br>
recap.tweetfeature.has_image <br>
recap.tweetfeature.has_link <br>
recap.tweetfeature.has_multiple_media <br>
recap.tweetfeature.has_news <br>
recap.tweetfeature.has_periscope <br>
recap.tweetfeature.has_pro_video <br>
recap.tweetfeature.has_trend <br>
recap.tweetfeature.has_video <br>
recap.tweetfeature.has_vine <br>
recap.tweetfeature.has_visible_link <br>
recap.tweetfeature.is_business_score <br>
recap.tweetfeature.is_extended_reply <br>
recap.tweetfeature.is_reply <br>
recap.tweetfeature.is_retweet <br>
recap.tweetfeature.is_sensitive <br>
recap.tweetfeature.link_count <br>
recap.tweetfeature.link_language <br>
recap.tweetfeature.match_searcher_langs <br>
recap.tweetfeature.match_searcher_main_lang <br>
recap.tweetfeature.match_ui_lang <br>
recap.tweetfeature.mention_searcher <br>
recap.tweetfeature.num_hashtags <br>
recap.tweetfeature.num_mentions <br>
recap.tweetfeature.reply_other <br>
recap.tweetfeature.reply_searcher <br>
recap.tweetfeature.retweet_other <br>
recap.tweetfeature.retweet_searcher <br>
recap.tweetfeature.tweet_count_from_user_in_snapshot <br>
recap.tweetfeature.unidirectiona_fav_count <br>
recap.tweetfeature.unidirectional_reply_count <br>
recap.tweetfeature.unidirectional_retweet_count <br>
recap.tweetfeature.user_rep <br>
recap.tweetfeature.video_view_count <br>
</code>
</td>
<td>
<code>
50.days.count<br>
50.days.sum<br>
</code>
</td>
</tr>
<tr>
<td>
<code>
user_v5
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_clicked<br>
recap.engagement.is_favorited<br>
recap.engagement.is_open_linked<br>
recap.engagement.is_photo_expanded<br>
recap.engagement.is_profile_clicked<br>
recap.engagement.is_replied<br>
recap.engagement.is_retweeted<br>
recap.engagement.is_video_playback_50<br>
</code>
</td>
<td>
<code>
any_feature <br>
time_features.earlybird.last_favorite_since_creation_hrs<br>
time_features.earlybird.last_quote_since_creation_hrs<br>
time_features.earlybird.last_reply_since_creation_hrs<br>
time_features.earlybird.last_retweet_since_creation_hrs<br>
time_features.earlybird.time_since_last_favorite<br>
time_features.earlybird.time_since_last_quote<br>
time_features.earlybird.time_since_last_reply<br>
time_features.earlybird.time_since_last_retweet<br>
timelines.earlybird.decayed_favorite_count<br>
timelines.earlybird.decayed_quote_count<br>
timelines.earlybird.decayed_reply_count<br>
timelines.earlybird.decayed_retweet_count<br>
timelines.earlybird.embeds_impression_count_v2<br>
timelines.earlybird.embeds_url_count_v2<br>
timelines.earlybird.fake_favorite_count<br>
timelines.earlybird.fake_quote_count<br>
timelines.earlybird.fake_reply_count<br>
timelines.earlybird.fake_retweet_count<br>
timelines.earlybird.quote_count<br>
timelines.earlybird.visible_token_ratio<br>
timelines.earlybird.weighted_fav_count<br>
timelines.earlybird.weighted_quote_count<br>
timelines.earlybird.weighted_reply_count<br>
timelines.earlybird.weighted_retweet_count<br>
</code>
</td>
<td>
<code>
50.days.count<br>
50.days.sum<br>
50.days.sumsq<br>
</code>
</td>
</tr>
<tr>
<td>
<code>
user_v6
</code>
</td>
<td>
<code>
recap.engagement.is_replied_reply_favorited_by_author<br>
recap.engagement.is_replied_reply_impressed_by_author<br>
recap.engagement.is_replied_reply_replied_by_author<br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user (twitter_wide)
</code>
</td>
<td>
<code>
any_label<br>
recap.engagement.is_favorited<br>
recap.engagement.is_replied<br>
recap.engagement.is_retweeted<br>
</code>
</td>
<td>
<code>
any_feature <br>
recap.tweetfeature.contains_media<br>
recap.tweetfeature.has_card<br>
recap.tweetfeature.has_hashtag<br>
recap.tweetfeature.has_link<br>
recap.tweetfeature.has_mention<br>
recap.tweetfeature.is_reply<br>
timelines.earlybird.has_quote<br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user (real_time)
</code>
</td>
<td>
<code>
timelines.enagagement.is_retweeted_without_quote<br>
timelines.engagement.is_clicked<br>
timelines.engagement.is_dont_like<br>
timelines.engagement.is_dwelled<br>
timelines.engagement.is_favorited<br>
timelines.engagement.is_followed<br>
timelines.engagement.is_open_linked<br>
timelines.engagement.is_photo_expanded<br>
timelines.engagement.is_profile_clicked<br>
timelines.engagement.is_quoted<br>
timelines.engagement.is_replied<br>
timelines.engagement.is_retweeted<br>
timelines.engagement.is_tweet_share_dm_clicked<br>
timelines.engagement.is_tweet_share_dm_sent<br>
timelines.engagement.is_video_playback_50<br>
timelines.engagement.is_video_quality_viewed<br>
timelines.engagement.is_video_viewed<br>
</code>
</td>
<td>
<code>
any_feature <br>
client_log_event.tweet.has_consumer_video<br>
client_log_event.tweet.photo_count<br>
</code>
</td>
<td>
<code>
30.minutes.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user (48h_real_time_v5)
</code>
</td>
<td>
<code>
timelines.enagagement.is_retweeted_without_quote<br>
timelines.engagement.is_clicked<br>
timelines.engagement.is_dont_like<br>
timelines.engagement.is_dwelled<br>
timelines.engagement.is_favorited<br>
timelines.engagement.is_followed<br>
timelines.engagement.is_open_linked<br>
timelines.engagement.is_photo_expanded<br>
timelines.engagement.is_profile_clicked<br>
timelines.engagement.is_quoted<br>
timelines.engagement.is_replied<br>
timelines.engagement.is_retweeted<br>
timelines.engagement.is_tweet_share_dm_clicked<br>
timelines.engagement.is_tweet_share_dm_sent<br>
timelines.engagement.is_video_playback_50<br>
timelines.engagement.is_video_quality_viewed<br>
timelines.engagement.is_video_viewed<br>
</code>
</td>
<td>
<code>
any_feature <br>
client_log_event.tweet.has_consumer_video<br>
client_log_event.tweet.photo_count<br>
</code>
</td>
<td>
<code>
2.days.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user (72h_real_time_v6)
</code>
</td>
<td>
<code>
timelines.engagement.is_block_clicked<br>
timelines.engagement.is_dont_like<br>
timelines.engagement.is_mute_clicked<br>
timelines.engagement.is_report_tweet_clicked<br>
</code>
</td>
<td>
<code>
timelines.author.user_state.is_user_heavy_non_tweeter<br>
timelines.author.user_state.is_user_heavy_tweeter<br>
timelines.author.user_state.is_user_light<br>
timelines.author.user_state.is_user_medium_non_tweeter<br>
timelines.author.user_state.is_user_medium_tweeter<br>
timelines.author.user_state.is_user_new<br>
</code>
</td>
<td>
<code>
3.days.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user (profile_real_time_v6)
</code>
</td>
<td>
<code>
profile.engagement.is_clicked<br>
profile.engagement.is_dwelled<br>
profile.engagement.is_favorited<br>
profile.engagement.is_replied<br>
profile.engagement.is_retweeted<br>
</code>
</td>
<td>
<code>
any_feature <br>
client_log_event.tweet.has_consumer_video<br>
client_log_event.tweet.photo_count<br>
</code>
</td>
<td>
<code>
30.minutes.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user (real_time)
</code>
</td>
<td>
<code>
timelines.engagement.is_share_menu_clicked<br>
timelines.engagement.is_shared <br>
</code>
</td>
<td>
<code>
any_feature <br>
client_log_event.tweet.has_consumer_video<br>
client_log_event.tweet.photo_count<br>
</code>
</td>
<td>
<code>
1.days.count<br>
30.minutes.count<br>
</code>
</td>
</tr>
<tr>
<td>
<code>
user (real_time)
</code>
</td>
<td>
<code>
timelines.engagement.is_fullscreen_video_dwelled<br>
timelines.engagement.is_fullscreen_video_dwelled_10_sec<br>
timelines.engagement.is_fullscreen_video_dwelled_20_sec<br>
timelines.engagement.is_fullscreen_video_dwelled_30_sec<br>
timelines.engagement.is_fullscreen_video_dwelled_5_sec<br>
timelines.engagement.is_profile_dwelled<br>
timelines.engagement.is_profile_dwelled_10_sec<br>
timelines.engagement.is_profile_dwelled_20_sec<br>
timelines.engagement.is_profile_dwelled_30_sec<br>
timelines.engagement.is_tweet_detail_dwelled<br>
timelines.engagement.is_tweet_detail_dwelled_15_sec<br>
timelines.engagement.is_tweet_detail_dwelled_25_sec<br>
timelines.engagement.is_tweet_detail_dwelled_30_sec<br>
timelines.engagement.is_tweet_detail_dwelled_8_sec<br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
1.days.count<br>
30.minutes.count<br>
</code>
</td>
</tr>
</table>
</details>
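
The <code>user_v5</code> row stores <code>count</code>, <code>sum</code>, and <code>sumsq</code> for the same window. This document does not say how the model consumes them, but one common reason to keep all three is that they are sufficient to recover a mean and a standard deviation. The Python sketch below shows that arithmetic; the function name `mean_and_std` is hypothetical.

```python
import math


def mean_and_std(count: float, total: float, total_sq: float):
    """Recover mean and population standard deviation from three aggregates.

    count    -> e.g. a 50.days.count value
    total    -> e.g. the matching 50.days.sum value
    total_sq -> e.g. the matching 50.days.sumsq value
    Assumption: this is only one possible use of the triple.
    """
    if count <= 0:
        return 0.0, 0.0
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clamp at zero to absorb rounding error.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Values 2, 4, 6, 8 give count=4, sum=20, sumsq=120.
mean, std = mean_and_std(4, 20.0, 120.0)
```
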
<details>
<summary><b><code>user_author_aggregate</code></b></summary>
These features aggregate over user-author pairs.
<br>
<table>
<tr>
<td>
<code>
user_author_v2
</code>
</td>
<td>
<code>
any_label<br>
recap.engagement.is_clicked<br>
recap.engagement.is_favorited<br>
recap.engagement.is_open_linked<br>
recap.engagement.is_photo_expanded<br>
recap.engagement.is_profile_clicked<br>
recap.engagement.is_replied<br>
recap.engagement.is_retweeted<br>
recap.engagement.is_video_playback_50<br>
</code>
</td>
<td>
<code>
engagement_features.in_network.favorites.count<br>
engagement_features.in_network.replies.count<br>
engagement_features.in_network.retweets.count<br>
recap.earlybird.fav_count_v2<br>
recap.earlybird.reply_count_v2<br>
recap.earlybird.retweet_count_v2<br>
recap.searchfeature.blender_score<br>
recap.searchfeature.fav_count<br>
recap.searchfeature.reply_count<br>
recap.searchfeature.retweet_count<br>
recap.searchfeature.text_score<br>
recap.tweetfeature.embeds_impression_count<br>
recap.tweetfeature.embeds_url_count<br>
recap.tweetfeature.has_card<br>
recap.tweetfeature.has_image<br>
recap.tweetfeature.has_link<br>
recap.tweetfeature.has_multiple_media<br>
recap.tweetfeature.has_news<br>
recap.tweetfeature.has_periscope<br>
recap.tweetfeature.has_pro_video<br>
recap.tweetfeature.has_trend<br>
recap.tweetfeature.has_video<br>
recap.tweetfeature.has_vine<br>
recap.tweetfeature.has_visible_link<br>
recap.tweetfeature.is_reply<br>
recap.tweetfeature.is_retweet<br>
recap.tweetfeature.num_mentions<br>
</code>
</td>
<td>
<code>
50.days.count<br>
50.days.sum<br>
</code>
</td>
</tr>
<tr>
<td>
<code>
user_author_v5
</code>
</td>
<td>
<code>
any_label<br>
recap.engagement.is_clicked<br>
recap.engagement.is_favorited<br>
recap.engagement.is_open_linked<br>
recap.engagement.is_photo_expanded<br>
recap.engagement.is_profile_clicked<br>
recap.engagement.is_replied<br>
recap.engagement.is_retweeted<br>
recap.engagement.is_video_playback_50<br>
</code>
</td>
<td>
<code>
any_feature<br>
timelines.earlybird.has_quote<br>
timelines.earlybird.label_abusive_flag<br>
timelines.earlybird.label_abusive_hi_rcl_flag<br>
timelines.earlybird.label_dup_content_flag<br>
timelines.earlybird.label_nsfw_hi_prc_flag<br>
timelines.earlybird.label_nsfw_hi_rcl_flag<br>
timelines.earlybird.label_spam_flag<br>
timelines.earlybird.label_spam_hi_rcl_flag<br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user_author (tweetsource_v1 - <br>
These features are sourced from a different underlying dataset)
</code>
</td>
<td>
<code>
any_label<br>
recap.engagement.is_clicked<br>
recap.engagement.is_favorited<br>
recap.engagement.is_open_linked<br>
recap.engagement.is_photo_expanded<br>
recap.engagement.is_profile_clicked<br>
recap.engagement.is_replied<br>
recap.engagement.is_retweeted<br>
recap.engagement.is_video_playback_50<br>
</code>
</td>
<td>
<code>
any_feature<br>
tweetsource.tweet.media.num_tags<br>
tweetsource.tweet.media.video_duration<br>
tweetsource.tweet.text.has_question<br>
tweetsource.tweet.text.length<br>
</code>
</td>
<td>
<code>
50.days.count<br>
50.days.sum<br>
</code>
</td>
</tr>
<tr>
<td>
<code>
user_author (twitter_wide - <br>
These features are sourced from a different underlying dataset)
</code>
</td>
<td>
<code>
recap.engagement.is_favorited<br>
recap.engagement.is_replied<br>
recap.engagement.is_retweeted<br>
</code>
</td>
<td>
<code>
any_feature <br>
recap.tweetfeature.contains_media<br>
recap.tweetfeature.has_card<br>
recap.tweetfeature.has_hashtag<br>
recap.tweetfeature.has_link<br>
recap.tweetfeature.has_mention<br>
recap.tweetfeature.is_reply<br>
timelines.earlybird.has_quote<br>
</code>
</td>
<td>
<code>
50.days.count<br>
</code>
</td>
</tr>
<tr>
<td>
<code>
user_original_author (real_time)
</code>
</td>
<td>
<code>
timelines.engagement.is_shared<br>
</code>
</td>
<td>
<code>
any_feature<br>
</code>
</td>
<td>
<code>
1.days.count<br>
30.minutes.count<br>
</code>
</td>
</tr>
<tr>
<td>
<code>
user_original_author
</code>
</td>
<td>
<code>
recap.engagement.is_replied_reply_favorited_by_author<br>
recap.engagement.is_replied_reply_impressed_by_author<br>
recap.engagement.is_replied_reply_replied_by_author<br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user_author (real_time, shared)
</code>
</td>
<td>
<code>
timelines.engagement.is_clicked<br>
timelines.engagement.is_dwelled<br>
timelines.engagement.is_favorited<br>
timelines.engagement.is_negative_feedback_union<br>
timelines.engagement.is_photo_expanded<br>
timelines.engagement.is_profile_clicked<br>
timelines.engagement.is_replied<br>
timelines.engagement.is_retweeted<br>
timelines.engagement.is_share_menu_clicked<br>
timelines.engagement.is_video_playback_50
</code>
</td>
<td>
<code>
any_feature
</code>
</td>
<td>
<code>
1.days.count<br>
30.minutes.count
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>user_engager_aggregate</code></b></summary>
These features aggregate counts of user interaction with other engagers of tweets that the user interacts with.
For example, the <code>user_engager.recap.engagement.is_favorited.any_feature.50.days.count.sparse_top1</code> feature can be parsed as follows:
For all Tweets that a user Likes, we keep a running 50-day count of engagement events for every other user who has engaged with those Tweets, where engagement is defined as a Like or a reply. This gives a list of engagement counts, one per other user who has engaged with the Tweets that the user has Liked, and we take the top count as the feature value.
<br>
<table>
<tr>
<td>
<code>
user_engager <br>
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count.sparse_mean <br>
50.days.count.sparse_nonzero <br>
50.days.count.sparse_sum <br>
50.days.count.sparse_top1 <br>
50.days.count.sparse_top2 <br>
</code>
</td>
</tr>
</table>
</details>
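
The <code>sparse_top1</code> example above can be sketched as follows. This is a simplified reading of the description, not the production logic: the function `engager_counts` and the input mapping `engagers_by_tweet` are hypothetical, and the 50-day window is left out for brevity.

```python
from collections import Counter


def engager_counts(user, liked_tweet_ids, engagers_by_tweet):
    """Count, per other user, how often they engaged with Tweets `user` Liked.

    engagers_by_tweet maps a tweet id to the users who Liked or replied to it
    (hypothetical input shape).
    """
    counts = Counter()
    for tweet_id in liked_tweet_ids:
        for engager in engagers_by_tweet.get(tweet_id, []):
            if engager != user:  # only *other* engagers are counted
                counts[engager] += 1
    return counts


engagers_by_tweet = {
    "t1": ["alice", "bob", "carol"],
    "t2": ["alice", "bob"],
    "t3": ["alice", "dave"],
}
counts = engager_counts("alice", ["t1", "t2", "t3"], engagers_by_tweet)

# sparse_top1: the largest per-engager count (0 if there are no engagers).
sparse_top1 = max(counts.values(), default=0)
```
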
<details>
<summary><b><code>user_inferred_topic_aggregate</code></b></summary>
These features aggregate short term and long term engagement between a user and tweets from our internally predicted inferred topic (whether or not the tweet is actually tagged to that topic).
<br>
<table>
<tr>
<td>
<code>
user_inferred_topic_v1
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count.sparse_mean <br>
50.days.count.sparse_nonzero <br>
50.days.count.sparse_sum <br>
50.days.count.sparse_top1 <br>
50.days.count.sparse_top2 <br>
</code>
</td>
</tr>
<tr>
<td>
<code>
user_inferred_topic_v2
</code>
</td>
<td>
<code>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
engagement_features.in_network.favorites.count <br>
engagement_features.in_network.retweets.count <br>
recap.searchfeature.fav_count <br>
recap.tweetfeature.contains_media <br>
recap.tweetfeature.has_card <br>
recap.tweetfeature.has_image <br>
recap.tweetfeature.has_link <br>
recap.tweetfeature.has_news <br>
recap.tweetfeature.has_trend <br>
recap.tweetfeature.has_video <br>
recap.tweetfeature.is_reply <br>
recap.tweetfeature.is_retweet <br>
recap.tweetfeature.is_sensitive <br>
recap.tweetfeature.match_searcher_langs <br>
recap.tweetfeature.match_searcher_main_lang <br>
recap.tweetfeature.match_ui_lang <br>
recap.tweetfeature.mention_searcher <br>
recap.tweetfeature.reply_other <br>
recap.tweetfeature.reply_searcher <br>
recap.tweetfeature.retweet_other <br>
recap.tweetfeature.retweet_searcher <br>
tweetsource.tweet.media.aspect_ratio_den <br>
tweetsource.tweet.text.num_caps <br>
tweetsource.tweet.text.num_newlines <br>
tweetsource.v2.tweet.media.has_description <br>
tweetsource.v2.tweet.media.has_selected_preview_image <br>
tweetsource.v2.tweet.media.has_title <br>
tweetsource.v2.tweet.media.has_visit_site_call_to_action <br>
tweetsource.v2.tweet.media.has_watch_now_call_to_action <br>
tweetsource.v2.tweet.media.is_360 <br>
tweetsource.v2.tweet.media.is_managed <br>
tweetsource.v2.tweet.media.is_monetizable <br>
</code>
</td>
<td>
<code>
50.days.count.sparse_mean <br>
50.days.count.sparse_nonzero <br>
50.days.count.sparse_sum <br>
50.days.count.sparse_top1 <br>
50.days.count.sparse_top2 <br>
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>user_media_annotation_aggregate</code></b></summary>
These features aggregate how often a user interacts with different types of media (photo, video, etc.).
<br>
<table>
<tr>
<td>
<code>
user_media_annotation
(keyed by user and media type)
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
any_feature
</code>
</td>
<td>
<code>
50.days.count.sparse_mean <br>
50.days.count.sparse_nonzero <br>
50.days.count.sparse_sum <br>
50.days.count.sparse_top1 <br>
50.days.count.sparse_top2 <br>
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>user_mention_aggregate</code></b></summary>
These features aggregate counts of user interactions with Tweets that mention other users.
Let the original user who viewed a Tweet be <code>user1</code>, and let <code>user2, user3, ..., user_n</code> be users mentioned in a tweet. This feature group aggregates the interactions between <code>user1</code> and other Tweets that mention <code>user2, user3,..., user_n</code>.
Here <code>sparse_sum</code> means we sum the aggregate values over all mentioned users, <code>sparse_top1</code> means we take the max of the aggregate values for the mentioned users, <code>sparse_top2</code> means we take the second-highest of the aggregate values for the mentioned users, and so on.
<br>
<table>
<tr>
<td>
<code>
user_mention <br>
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count.sparse_mean <br>
50.days.count.sparse_nonzero <br>
50.days.count.sparse_sum <br>
50.days.count.sparse_top1 <br>
50.days.count.sparse_top2 <br>
</code>
</td>
</tr>
</table>
</details>
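
The sparse reductions described for this group can be sketched in Python as below. The definitions of `sparse_sum`, `sparse_top1`, and `sparse_top2` follow the text above; the definitions used here for `sparse_mean` and `sparse_nonzero` are assumptions inferred from their names, and the fallback of `0.0` for missing entries is also an assumption.

```python
def sparse_reductions(values):
    """Reduce per-entity aggregate values (e.g. one per mentioned user)."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    return {
        "sparse_sum": total,
        "sparse_top1": ordered[0] if len(ordered) > 0 else 0.0,
        "sparse_top2": ordered[1] if len(ordered) > 1 else 0.0,
        # Assumption: number of entities with a non-zero aggregate value.
        "sparse_nonzero": sum(1 for v in ordered if v != 0),
        # Assumption: plain mean over all entities, including zeros.
        "sparse_mean": total / len(ordered) if ordered else 0.0,
    }


# Hypothetical aggregate values for four mentioned users.
result = sparse_reductions([3, 0, 7, 1])
```
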
<details>
<summary><b><code>user_request_context_aggregate</code></b></summary>
These features aggregate engagements over the request context, which is either the same day of week (dow) or hour of day (hour), to account for temporal effects.
<br>
<table>
<tr>
<td>
<code>
dow <br>
</code>
</td>
<td>
<code>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count <br>
</code>
</td>
</tr>
<tr>
<td>
<code>
hour <br>
</code>
</td>
<td>
<code>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count <br>
</code>
</td>
</tr>
</table>
</details>
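
One way to picture the request-context scope is as an aggregation key built from the user and the request time. The Python sketch below is illustrative only: the function `request_context_keys` is hypothetical, and this document does not state whether the day and hour are taken in UTC or in the user's local time, so UTC is an assumption here.

```python
from datetime import datetime, timezone


def request_context_keys(user_id: int, request_time: datetime):
    """Build hypothetical aggregation keys for the dow and hour scopes."""
    return {
        "dow": (user_id, request_time.weekday()),  # Monday == 0 ... Sunday == 6
        "hour": (user_id, request_time.hour),      # 0 ... 23
    }


# A request made on Friday 2023-03-31 at 17:05 UTC.
keys = request_context_keys(
    42, datetime(2023, 3, 31, 17, 5, tzinfo=timezone.utc)
)
```

Engagement counts stored under the `dow` key would then only be compared with past requests made on the same weekday, and likewise for `hour`.
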
<details>
<summary><b><code>user_topic_aggregate</code></b></summary>
These features aggregate long term feature values between a user and tweets from a particular topic.
<br>
<table>
<tr>
<td>
<code>
user_topic_v1
</code>
</td>
<td>
<code>
any_label <br>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
<tr>
<td>
<code>
user_topic_v2
</code>
</td>
<td>
<code>
recap.engagement.is_clicked <br>
recap.engagement.is_favorited <br>
recap.engagement.is_open_linked <br>
recap.engagement.is_photo_expanded <br>
recap.engagement.is_profile_clicked <br>
recap.engagement.is_replied <br>
recap.engagement.is_retweeted <br>
recap.engagement.is_video_playback_50 <br>
</code>
</td>
<td>
<code>
engagement_features.in_network.favorites.count <br>
engagement_features.in_network.retweets.count <br>
recap.searchfeature.fav_count <br>
recap.tweetfeature.contains_media <br>
recap.tweetfeature.has_card <br>
recap.tweetfeature.has_image <br>
recap.tweetfeature.has_link <br>
recap.tweetfeature.has_news <br>
recap.tweetfeature.has_trend <br>
recap.tweetfeature.has_video <br>
recap.tweetfeature.is_reply <br>
recap.tweetfeature.is_retweet <br>
recap.tweetfeature.is_sensitive <br>
recap.tweetfeature.match_searcher_langs <br>
recap.tweetfeature.match_searcher_main_lang <br>
recap.tweetfeature.match_ui_lang <br>
recap.tweetfeature.mention_searcher <br>
recap.tweetfeature.reply_other <br>
recap.tweetfeature.reply_searcher <br>
recap.tweetfeature.retweet_other <br>
recap.tweetfeature.retweet_searcher <br>
tweetsource.tweet.media.aspect_ratio_den <br>
tweetsource.tweet.text.num_caps <br>
tweetsource.tweet.text.num_newlines <br>
tweetsource.v2.tweet.media.has_description <br>
tweetsource.v2.tweet.media.has_selected_preview_image <br>
tweetsource.v2.tweet.media.has_title <br>
tweetsource.v2.tweet.media.has_visit_site_call_to_action <br>
tweetsource.v2.tweet.media.has_watch_now_call_to_action <br>
tweetsource.v2.tweet.media.is_360 <br>
tweetsource.v2.tweet.media.is_managed <br>
tweetsource.v2.tweet.media.is_monetizable <br>
</code>
</td>
<td>
<code>
50.days.count
</code>
</td>
</tr>
</table>
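
Each table row is a template that expands into the Cartesian cross of its columns, as described in the overview. The sketch below expands the <code>user_topic_v1</code> row into individual combinations. The tuple representation and the <code>expand_row</code> helper are illustrative assumptions; the actual serialized feature names are not reproduced here.

```python
from itertools import product

# One table row, copied from the user_topic_v1 entry above.
GROUP = "user_topic_v1"
ENGAGEMENT_SCOPES = [
    "any_label",
    "recap.engagement.is_clicked",
    "recap.engagement.is_favorited",
    "recap.engagement.is_open_linked",
    "recap.engagement.is_photo_expanded",
    "recap.engagement.is_profile_clicked",
    "recap.engagement.is_replied",
    "recap.engagement.is_retweeted",
    "recap.engagement.is_video_playback_50",
]
FEATURES_TO_AGGREGATE = ["any_feature"]
AGGREGATION_SPECS = ["50.days.count"]

def expand_row(group, scopes, features, specs):
    """Return every (group, scope, feature, spec) combination in a row."""
    return [(group, *combo) for combo in product(scopes, features, specs)]

crosses = expand_row(GROUP, ENGAGEMENT_SCOPES, FEATURES_TO_AGGREGATE, AGGREGATION_SPECS)
print(len(crosses))  # 9 scopes x 1 feature x 1 spec = 9
print(crosses[0])    # ('user_topic_v1', 'any_label', 'any_feature', '50.days.count')
```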
</details>
<details>
<summary><b><code>topic_aggregate</code></b></summary>
These features aggregate values for tweets that come from a particular topic.
<br>
<table>
<tr>
<td>
<code>
topic (real_time)
</code>
</td>
<td>
<code>
timelines.enagagement.is_retweeted_without_quote <br>
timelines.engagement.is_clicked <br>
timelines.engagement.is_dont_like <br>
timelines.engagement.is_dwelled <br>
timelines.engagement.is_favorited <br>
timelines.engagement.is_followed <br>
timelines.engagement.is_not_interested_in_topic <br>
timelines.engagement.is_open_linked <br>
timelines.engagement.is_photo_expanded <br>
timelines.engagement.is_profile_clicked <br>
timelines.engagement.is_quoted <br>
timelines.engagement.is_replied <br>
timelines.engagement.is_retweeted <br>
timelines.engagement.is_tweet_share_dm_clicked <br>
timelines.engagement.is_tweet_share_dm_sent <br>
timelines.engagement.is_video_playback_50 <br>
timelines.engagement.is_video_quality_viewed <br>
timelines.engagement.is_video_viewed <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
30.minutes.count
</code>
</td>
</tr>
<tr>
<td>
<code>
topic (24_hour_real_time)
</code>
</td>
<td>
<code>timelines.enagagement.is_retweeted_without_quote<br>
timelines.engagement.is_block_clicked<br>
timelines.engagement.is_clicked<br>
timelines.engagement.is_dont_like<br>
timelines.engagement.is_dwelled<br>
timelines.engagement.is_favorited<br>
timelines.engagement.is_followed<br>
timelines.engagement.is_mute_clicked<br>
timelines.engagement.is_not_about_topic<br>
timelines.engagement.is_not_interested_in_topic<br>
timelines.engagement.is_not_recent<br>
timelines.engagement.is_not_relevant<br>
timelines.engagement.is_open_linked<br>
timelines.engagement.is_photo_expanded<br>
timelines.engagement.is_profile_clicked<br>
timelines.engagement.is_quoted<br>
timelines.engagement.is_replied<br>
timelines.engagement.is_report_tweet_clicked<br>
timelines.engagement.is_retweeted<br>
timelines.engagement.is_see_fewer<br>
timelines.engagement.is_tweet_share_dm_clicked<br>
timelines.engagement.is_tweet_share_dm_sent<br>
timelines.engagement.is_unfollow_topic<br>
timelines.engagement.is_video_playback_50<br>
timelines.engagement.is_video_quality_viewed<br>
timelines.engagement.is_video_viewed
</code></td>
<td><code>any_feature</code></td>
<td><code>1.days.count</code></td>
</tr>
<tr>
<td>
<code>
topic-country_code (real_time)
</code>
</td>
<td>
<code>
timelines.engagement.is_block_clicked<br>
timelines.engagement.is_clicked<br>
timelines.engagement.is_dont_like<br>
timelines.engagement.is_dwelled<br>
timelines.engagement.is_favorited<br>
timelines.engagement.is_impressed<br>
timelines.engagement.is_mute_clicked<br>
timelines.engagement.is_not_about_topic<br>
timelines.engagement.is_not_interested_in_topic<br>
timelines.engagement.is_not_recent<br>
timelines.engagement.is_not_relevant<br>
timelines.engagement.is_open_linked<br>
timelines.engagement.is_photo_expanded<br>
timelines.engagement.is_profile_clicked<br>
timelines.engagement.is_replied<br>
timelines.engagement.is_report_tweet_clicked<br>
timelines.engagement.is_retweeted<br>
timelines.engagement.is_see_fewer<br>
timelines.engagement.is_share_menu_clicked<br>
timelines.engagement.is_shared<br>
timelines.engagement.is_unfollow_topic<br>
timelines.engagement.is_video_playback_50<br>
timelines.engagement.is_video_quality_viewed
</code>
</td>
<td><code>any_feature</code></td>
<td><code>3.days.count<br>30.minutes.count</code></td>
</tr>
<tr>
<td>
<code>
topic-share (real_time)
</code>
</td>
<td>
<code>
timelines.engagement.is_share_menu_clicked<br>
timelines.engagement.is_shared
</code>
</td>
<td><code>any_feature</code></td>
<td><code>1.days.count<br>30.minutes.count</code></td>
</tr>
</table>
</details>
<details>
<summary><b><code>tweet_aggregate</code></b></summary>
These features aggregate values corresponding to a tweet.
<br>
<table>
<tr>
<td><code>tweet (real_time)</code></td>
<td><code>
timelines.enagagement.is_retweeted_without_quote<br>
timelines.engagement.is_clicked<br>
timelines.engagement.is_dont_like<br>
timelines.engagement.is_dwelled<br>
timelines.engagement.is_favorited<br>
timelines.engagement.is_followed<br>
timelines.engagement.is_open_linked<br>
timelines.engagement.is_photo_expanded<br>
timelines.engagement.is_profile_clicked<br>
timelines.engagement.is_quoted<br>
timelines.engagement.is_replied<br>
timelines.engagement.is_retweeted<br>
timelines.engagement.is_tweet_share_dm_clicked<br>
timelines.engagement.is_tweet_share_dm_sent<br>
timelines.engagement.is_video_playback_50<br>
timelines.engagement.is_video_quality_viewed<br>
timelines.engagement.is_video_viewed
</code>
</td>
<td><code>any_feature</code></td>
<td>
<code>
30.minutes.count<br>
Duration.Top.count
</code>
</td>
</tr>
<tr>
<td><code>tweet_v2 (real_time)</code></td>
<td>
<code>
timelines.engagement.is_block_clicked <br>
timelines.engagement.is_mute_clicked <br>
timelines.engagement.is_report_tweet_clicked <br>
</code>
</td>
<td>
<code>
any_feature <br>
</code>
</td>
<td>
<code>
30.minutes.count <br>
Duration.Top.count <br>
</code>
</td>
</tr>
<tr>
<td><code>tweet (real_time dwell) </code></td>
<td><code>timelines.engagement.is_fullscreen_video_dwelled<br>
timelines.engagement.is_fullscreen_video_dwelled_10_sec<br>
timelines.engagement.is_fullscreen_video_dwelled_20_sec<br>
timelines.engagement.is_fullscreen_video_dwelled_30_sec<br>
timelines.engagement.is_fullscreen_video_dwelled_5_sec<br>
timelines.engagement.is_profile_dwelled<br>
timelines.engagement.is_profile_dwelled_10_sec<br>
timelines.engagement.is_profile_dwelled_20_sec<br>
timelines.engagement.is_profile_dwelled_30_sec<br>
timelines.engagement.is_tweet_detail_dwelled<br>
timelines.engagement.is_tweet_detail_dwelled_15_sec<br>
timelines.engagement.is_tweet_detail_dwelled_25_sec<br>
timelines.engagement.is_tweet_detail_dwelled_30_sec<br>
timelines.engagement.is_tweet_detail_dwelled_8_sec</code></td>
<td>
<code>any_feature
</code>
</td>
<td><code>1.days.count<br>30.minutes.count</code></td>
</tr>
<tr>
<td><code>tweet (real_time shared) </code></td>
<td>
<code>
timelines.engagement.is_share_menu_clicked<br>
timelines.engagement.is_shared
</code>
</td>
<td><code>any_feature</code></td>
<td><code>1.days.count<br>30.minutes.count</code></td>
</tr>
</table>
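
To make the short windows such as <code>30.minutes.count</code> concrete, here is a minimal sketch of a trailing-window counter. It assumes events arrive in time order and uses a hard cutoff at the window boundary; the <code>WindowedCounter</code> class is hypothetical, and the real-time pipeline may well use a different mechanism (for example, decayed counts).

```python
from collections import deque
from datetime import datetime, timedelta

class WindowedCounter:
    """Count events that fall inside a trailing time window."""

    def __init__(self, window: timedelta):
        self.window = window
        self.events = deque()  # timestamps, oldest first

    def add(self, ts: datetime) -> None:
        """Record one engagement event (assumed to arrive in time order)."""
        self.events.append(ts)

    def count(self, now: datetime) -> int:
        """Drop events older than the window, then count what is left."""
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events)

start = datetime(2023, 3, 1, 12, 0)
favorites_30m = WindowedCounter(timedelta(minutes=30))
for minutes in (0, 10, 25, 40):
    favorites_30m.add(start + timedelta(minutes=minutes))

# At 12:45 only the 12:25 and 12:40 events are within the last 30 minutes.
print(favorites_30m.count(start + timedelta(minutes=45)))  # 2
```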
</details>
## Non Aggregate Features
We have a number of standalone features capturing information about the user, the tweet, the author, and the tweet context.
<details>
<summary><b><code>two_hop</code></b></summary>
<br>
This feature group contains features about "two-hop" interactions between a user and the tweet author. For example, if user 1 favorites a tweet by user 2, and user 2 favorites a tweet by user 3, there will be a positive value for the "favorite.favorited_by" two-hop feature between user 1 and user 3.
The feature group consists of all possible crosses of the below features.
<table>
<tr>
<td>
<code>
two_hop
</code>
</td>
<td>
<code>
favorite <br>
following <br>
mutual_follow <br>
</code>
</td>
<td>
<code>
favorited_by <br>
followed_by <br>
mentioned_by <br>
retweeted_by <br>
</code>
</td>
<td>
<code>
normalized
</code>
</td>
</tr>
<tr>
<td>
<code>
two_hop
</code>
</td>
<td>
<code>
</code>
</td>
<td>
<code>
favorited_by <br>
favorited_by <br>
mentioned_by <br>
retweeted_by
</code>
</td>
<td>
<code>
right_degree
</code>
</td>
</tr>
</table>
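
The example in the description can be sketched as a path count over two edge lists. The edge format and the <code>two_hop_counts</code> helper are assumptions for illustration only, and the sketch does not attempt to reproduce the <code>normalized</code> or <code>right_degree</code> variants listed in the table.

```python
from collections import defaultdict

# Hypothetical edge list: (actor, author) means `actor` favorited a tweet by `author`.
FAVORITES = [
    ("user1", "user2"),
    ("user2", "user3"),
    ("user4", "user3"),
]

def two_hop_counts(first_hop, second_hop):
    """Count paths source -> middle -> target using one edge from each list."""
    second_by_actor = defaultdict(list)
    for actor, target in second_hop:
        second_by_actor[actor].append(target)

    counts = defaultdict(int)
    for source, middle in first_hop:
        for target in second_by_actor.get(middle, []):
            if target != source:  # ignore paths that loop back to the source
                counts[(source, target)] += 1
    return dict(counts)

# user1 favorited user2, and user2 favorited user3, so (user1, user3) gets a count.
print(two_hop_counts(FAVORITES, FAVORITES))  # {('user1', 'user3'): 1}
```

Passing different edge lists for the two hops (for example, follows for the first hop and retweets for the second) would give the other crosses in the table.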
</details>
<details>
<summary><b><code>realgraph</code></b></summary>
<br>
This feature group contains features about interactions between the user and the Tweet author.
The feature group consists of all possible crosses of the below features.
<table>
<tr>
<td>
<code>
realgraph
</code>
</td>
<td>
<code>
dst_id <br>
src_id <br>
</code>
</td>
<td>
<code>
</code>
</td>
</tr>
<tr>
<td>
<code>
realgraph
</code>
</td>
<td>
<code>
num_address_book_email <br>
num_address_book_in_both <br>
num_address_book_mutual_edge_email <br>
num_address_book_mutual_edge_in_both <br>
num_address_book_mutual_edge_phone <br>
num_address_book_phone<br>
num_blocks<br>
num_direct_messages<br>
num_favorites<br>
num_follow<br>
num_inspected_tweets<br>
num_link_clicks<br>
num_mentions<br>
num_mutes<br>
num_mutual_follow<br>
num_photo_tags<br>
num_profile_views<br>
num_report_as_abuses<br>
num_report_as_spams<br>
num_retweets<br>
num_sms_follow<br>
num_tweet_clicks<br>
total_dwell_time<br>
weight
</code>
</td>
<td>
<code>
days_since_last <br>
days_since_last.sparse_avg <br>
days_since_last.sparse_max <br>
days_since_last.sparse_sum <br>
elapsed_days <br>
elapsed_days.sparse_avg <br>
elapsed_days.sparse_max<br>
elapsed_days.sparse_sum<br>
ewma<br>
ewma.sparse_avg<br>
ewma.sparse_max<br>
ewma.sparse_sum<br>
is_missing<br>
m2ForVariance.sparse_avg<br>
m2ForVariance.sparse_max<br>
m2ForVariance.sparse_sum<br>
mean<br>
mean.sparse_avg<br>
mean.sparse_max<br>
mean.sparse_sum<br>
non_zero_days<br>
non_zero_days.sparse_avg<br>
non_zero_days.sparse_max<br>
non_zero_days.sparse_sum<br>
sparse_avg<br>
sparse_max<br>
sparse_sum<br>
variance
</code>
</td>
</tr>
</table>
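
The statistic names above (<code>mean</code>, <code>variance</code>, <code>m2ForVariance</code>, <code>ewma</code>, <code>non_zero_days</code>, <code>days_since_last</code>, <code>elapsed_days</code>) read like running summaries of a daily interaction count. The sketch below shows one common way to maintain such summaries incrementally, using Welford's update for the mean and the sum of squared deviations. This is an assumption about what the names mean, not a description of how RealGraph actually computes them; the <code>RunningStats</code> class and the <code>alpha</code> smoothing factor are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RunningStats:
    """Running summary of a daily interaction count (illustrative only)."""
    elapsed_days: int = 0      # days observed so far
    non_zero_days: int = 0     # days with at least one interaction
    days_since_last: int = 0   # days since the last non-zero day
    mean: float = 0.0
    m2: float = 0.0            # sum of squared deviations from the mean
    ewma: float = 0.0          # exponentially weighted moving average

    def update(self, count: float, alpha: float = 0.1) -> None:
        """Fold one more day's count into the summary (Welford's update)."""
        self.elapsed_days += 1
        if count > 0:
            self.non_zero_days += 1
            self.days_since_last = 0
        else:
            self.days_since_last += 1
        delta = count - self.mean
        self.mean += delta / self.elapsed_days
        self.m2 += delta * (count - self.mean)
        self.ewma = alpha * count + (1 - alpha) * self.ewma

    @property
    def variance(self) -> float:
        """Population variance derived from the running M2 term."""
        return self.m2 / self.elapsed_days if self.elapsed_days else 0.0

stats = RunningStats()
for daily_favorites in [2, 0, 4, 0]:
    stats.update(daily_favorites)
print(stats.mean, stats.variance, stats.non_zero_days, stats.days_since_last)
```

Keeping the M2 term rather than the variance itself allows the summary to be updated one day at a time without storing the full history.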
</details>
<details>
<summary><b><code>authors.realgraph</code></b></summary>
This feature group contains features about interactions between the user and various other users, including:
<ol>
<li> the Tweet author
<li> any users mentioned in the Tweet
<li> in-network engagers with the Tweet
<li> upstream authors if the Tweet was part of a reply chain
</ol>
Note that all the above users are included in the interaction set, not just the Tweet author.
The feature group consists of all possible crosses of the below features.
<br>
<table>
<tr>
<td>
<code>
authors.realgraph
</code>
</td>
<td>
<code>
weight
</code>
</td>
<td>
<code>
</code>
</td>
<td>
<code>
sparse_avg <br>
sparse_max <br>
sparse_sum <br>
</code>
</td>
</tr>
<tr>
<td>
<code>
authors.realgraph
</code>
</td>
<td>
<code>
num_address_book_email <br>
num_address_book_in_both <br>
num_address_book_mutual_edge_email <br>
num_address_book_mutual_edge_in_both <br>
num_address_book_phone <br>
num_blocks <br>
num_direct_messages <br>
num_favorites <br>
num_follow <br>
num_inspected_tweets <br>
num_link_clicks <br>
num_mentions <br>
num_mutes <br>
num_mutual_follow <br>
num_photo_tags <br>
num_profile_views <br>
num_report_as_abuses <br>
num_report_as_spams <br>
num_retweets <br>
num_sms_follow <br>
num_tweet_clicks <br>
total_dwell_time <br>
</code>
</td>
<td>
<code>
days_since_last <br>
elapsed_days <br>
ewma <br>
m2ForVariance <br>
mean <br>
non_zero_days <br>
</code>
</td>
<td>
<code>
sparse_avg <br>
sparse_max <br>
sparse_sum <br>
</code>
</td>
</tr>
</table>
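
Because several users can be in the interaction set for one Tweet, each per-user value is reduced with <code>sparse_avg</code>, <code>sparse_max</code>, and <code>sparse_sum</code>. A minimal sketch follows. It assumes that "sparse" means users with no value for a feature are skipped rather than counted as zero; the input dictionary and the <code>sparse_aggregate</code> helper are hypothetical.

```python
# Hypothetical per-edge RealGraph values between the viewer and each related user.
EDGE_FEATURES = {
    "author": {"num_favorites.ewma": 0.8, "weight": 0.6},
    "mentioned_user": {"num_favorites.ewma": 0.2},
    "in_network_engager": {"weight": 0.3},
}

def sparse_aggregate(edge_features, feature_name):
    """Aggregate one feature over the users for which it is present."""
    values = [f[feature_name] for f in edge_features.values() if feature_name in f]
    if not values:
        return {}  # no user has this feature, so nothing is emitted
    return {
        f"{feature_name}.sparse_sum": sum(values),
        f"{feature_name}.sparse_max": max(values),
        f"{feature_name}.sparse_avg": sum(values) / len(values),
    }

# Only the author and the mentioned user carry this feature, so the average is over two values.
print(sparse_aggregate(EDGE_FEATURES, "num_favorites.ewma"))
```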
</details>
<details>
<summary><b><code>recap.tweetfeature, recap.searchfeature, etc</code></b></summary>
<br>
This feature group contains features about the tweet, whether from the tweets service or the search service ("Earlybird"). It also contains features related to the user's device type.
<table>
<tr>
<td>
<code>
recap.earlybird.fav_count_v2 <br>
recap.earlybird.reply_count_v2 <br>
recap.earlybird.retweet_count_v2 <br>
recap.searchfeature.blender_score <br>
recap.searchfeature.fav_count <br>
recap.searchfeature.reply_count <br>
recap.searchfeature.retweet_count <br>
recap.searchfeature.text_score <br>
recap.source.type <br>
recap.tweetfeature.bidirectional_fav_count <br>
recap.tweetfeature.bidirectional_reply_count <br>
recap.tweetfeature.bidirectional_retweet_count <br>
recap.tweetfeature.contains_media <br>
recap.tweetfeature.conversational_count <br>
recap.tweetfeature.embeds_impression_count <br>
recap.tweetfeature.embeds_url_count <br>
recap.tweetfeature.from_inactive_user <br>
recap.tweetfeature.from_mutual_follow <br>
recap.tweetfeature.from_verified_account <br>
recap.tweetfeature.has_card <br>
recap.tweetfeature.has_consumer_video <br>
recap.tweetfeature.has_hashtag <br>
recap.tweetfeature.has_image <br>
recap.tweetfeature.has_link <br>
recap.tweetfeature.has_mention <br>
recap.tweetfeature.has_multiple_hashtag_or_trend <br>
recap.tweetfeature.has_multiple_media <br>
recap.tweetfeature.has_native_image <br>
recap.tweetfeature.has_native_video <br>
recap.tweetfeature.has_news <br>
recap.tweetfeature.has_periscope <br>
recap.tweetfeature.has_pro_video <br>
recap.tweetfeature.has_trend <br>
recap.tweetfeature.has_video <br>
recap.tweetfeature.has_vine <br>
recap.tweetfeature.has_visible_link <br>
recap.tweetfeature.is_author_bot <br>
recap.tweetfeature.is_author_new <br>
recap.tweetfeature.is_author_profile_egg <br>
recap.tweetfeature.is_author_spam <br>
recap.tweetfeature.is_business_score <br>
recap.tweetfeature.is_extended_reply <br>
recap.tweetfeature.is_offensive <br>
recap.tweetfeature.is_reply <br>
recap.tweetfeature.is_retweet <br>
recap.tweetfeature.is_sensitive <br>
recap.tweetfeature.language <br>
recap.tweetfeature.link_count <br>
recap.tweetfeature.link_language <br>
recap.tweetfeature.match_searcher_langs <br>
recap.tweetfeature.match_searcher_main_lang <br>
recap.tweetfeature.match_ui_lang <br>
recap.tweetfeature.mention_searcher <br>
recap.tweetfeature.num_hashtags <br>
recap.tweetfeature.num_mentions <br>
recap.tweetfeature.prev_user_tweet_enagagement <br>
recap.tweetfeature.reply_other <br>
recap.tweetfeature.reply_searcher <br>
recap.tweetfeature.retweet_other <br>
recap.tweetfeature.retweet_searcher <br>
recap.tweetfeature.signature <br>
recap.tweetfeature.tweet_count_from_user_in_snapshot <br>
recap.tweetfeature.unidirectiona_fav_count <br>
recap.tweetfeature.unidirectional_reply_count <br>
recap.tweetfeature.unidirectional_retweet_count <br>
recap.tweetfeature.user_rep <br>
recap.tweetfeature.video_view_count <br>
recap.user_agent.client_name <br>
recap.user_agent.client_source <br>
recap.user_agent.client_version <br>
recap.user_agent.client_version_code <br>
recap.user_agent.device <br>
recap.user_agent.manufacturer <br>
recap.user_agent.network_connection <br>
recap.user_agent.sdk_version <br>
recap.v2.tweetfeature.is_retweet_directed_at_user_in_first_degree <br>
recap.v2.tweetfeature.is_retweet_of_reply <br>
recap.v2.tweetfeature.is_retweeter_bot <br>
recap.v2.tweetfeature.is_retweeter_new <br>
recap.v2.tweetfeature.is_retweeter_nsfw <br>
recap.v2.tweetfeature.is_retweeter_profile_egg <br>
recap.v2.tweetfeature.is_retweeter_spam <br>
recap.v2.tweetfeature.retweet_of_mutual_follow <br>
recap.v2.tweetfeature.source_author_rep <br>
recap.v3.tweetfeature.probably_from_follow
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>tweetsource</code></b></summary>
<br>
This feature group contains features about the tweet media as well as conversation-related features about the tweet.
<table>
<tr>
<td>
<code>
<br>
tweetsource.tweet.media.aspect_ratio_den <br>
tweetsource.tweet.media.aspect_ratio_num <br>
tweetsource.tweet.media.bit_rate <br>
tweetsource.tweet.media.height_1 <br>
tweetsource.tweet.media.height_2 <br>
tweetsource.tweet.media.height_3 <br>
tweetsource.tweet.media.height_4 <br>
tweetsource.tweet.media.num_tags <br>
tweetsource.tweet.media.resize_method_1 <br>
tweetsource.tweet.media.resize_method_2 <br>
tweetsource.tweet.media.resize_method_3 <br>
tweetsource.tweet.media.resize_method_4 <br>
tweetsource.tweet.media.video_duration <br>
tweetsource.tweet.media.width_1 <br>
tweetsource.tweet.media.width_2 <br>
tweetsource.tweet.media.width_3 <br>
tweetsource.tweet.media.width_4 <br>
tweetsource.tweet.text.has_question <br>
tweetsource.tweet.text.length <br>
tweetsource.tweet.text.length_type <br>
tweetsource.tweet.text.num_caps <br>
tweetsource.tweet.text.num_newlines <br>
tweetsource.tweet.text.num_whitespaces <br>
tweetsource.v2.tweet.media.color_1_blue <br>
tweetsource.v2.tweet.media.color_1_green <br>
tweetsource.v2.tweet.media.color_1_percentage <br>
tweetsource.v2.tweet.media.color_1_red <br>
tweetsource.v2.tweet.media.face_areas <br>
tweetsource.v2.tweet.media.has_app_install_call_to_action <br>
tweetsource.v2.tweet.media.has_description <br>
tweetsource.v2.tweet.media.has_selected_preview_image <br>
tweetsource.v2.tweet.media.has_title <br>
tweetsource.v2.tweet.media.has_visit_site_call_to_action <br>
tweetsource.v2.tweet.media.has_watch_now_call_to_action <br>
tweetsource.v2.tweet.media.is_360 <br>
tweetsource.v2.tweet.media.is_embeddable <br>
tweetsource.v2.tweet.media.is_managed <br>
tweetsource.v2.tweet.media.is_monetizable <br>
tweetsource.v2.tweet.media.num_color_pallette_items <br>
tweetsource.v2.tweet.media.num_faces <br>
tweetsource.v2.tweet.media.num_stickers <br>
tweetsource.v2.tweet.media.view_count <br>
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>in_reply_to_tweet</code></b></summary>
<br>
If the tweet is a reply, this feature group contains the features of the replied-to tweet.
<table>
<tr>
<td>
<code>
in_reply_to_tweet.recap.earlybird.fav_count_v2 <br>
in_reply_to_tweet.recap.earlybird.reply_count_v2 <br>
in_reply_to_tweet.recap.earlybird.retweet_count_v2 <br>
in_reply_to_tweet.recap.searchfeature.fav_count <br>
in_reply_to_tweet.recap.searchfeature.reply_count <br>
in_reply_to_tweet.recap.searchfeature.retweet_count <br>
in_reply_to_tweet.recap.searchfeature.text_score <br>
in_reply_to_tweet.recap.tweetfeature.bidirectional_fav_count <br>
in_reply_to_tweet.recap.tweetfeature.bidirectional_reply_count <br>
in_reply_to_tweet.recap.tweetfeature.bidirectional_retweet_count <br>
in_reply_to_tweet.recap.tweetfeature.conversational_count <br>
in_reply_to_tweet.recap.tweetfeature.from_mutual_follow <br>
in_reply_to_tweet.recap.tweetfeature.from_verified_account <br>
in_reply_to_tweet.recap.tweetfeature.has_hashtag <br>
in_reply_to_tweet.recap.tweetfeature.has_image <br>
in_reply_to_tweet.recap.tweetfeature.has_mention <br>
in_reply_to_tweet.recap.tweetfeature.has_news <br>
in_reply_to_tweet.recap.tweetfeature.has_video <br>
in_reply_to_tweet.recap.tweetfeature.has_visible_link <br>
in_reply_to_tweet.recap.tweetfeature.is_author_bot <br>
in_reply_to_tweet.recap.tweetfeature.is_author_new <br>
in_reply_to_tweet.recap.tweetfeature.is_author_nsfw <br>
in_reply_to_tweet.recap.tweetfeature.is_author_spam <br>
in_reply_to_tweet.recap.tweetfeature.is_offensive <br>
in_reply_to_tweet.recap.tweetfeature.is_reply <br>
in_reply_to_tweet.recap.tweetfeature.is_sensitive <br>
in_reply_to_tweet.recap.tweetfeature.num_mentions <br>
in_reply_to_tweet.recap.tweetfeature.prev_user_tweet_enagagement <br>
in_reply_to_tweet.recap.tweetfeature.unidirectiona_fav_count <br>
in_reply_to_tweet.recap.tweetfeature.unidirectional_reply_count <br>
in_reply_to_tweet.recap.tweetfeature.unidirectional_retweet_count <br>
in_reply_to_tweet.recap.tweetfeature.user_rep <br>
in_reply_to_tweet.timelines.earlybird.decayed_favorite_count <br>
in_reply_to_tweet.timelines.earlybird.decayed_quote_count <br>
in_reply_to_tweet.timelines.earlybird.decayed_reply_count <br>
in_reply_to_tweet.timelines.earlybird.decayed_retweet_count <br>
in_reply_to_tweet.timelines.earlybird.has_quote <br>
in_reply_to_tweet.timelines.earlybird.quote_count <br>
in_reply_to_tweet.timelines.earlybird.weighted_fav_count <br>
in_reply_to_tweet.timelines.earlybird.weighted_quote_count <br>
in_reply_to_tweet.timelines.earlybird.weighted_reply_count <br>
in_reply_to_tweet.timelines.earlybird.weighted_retweet_count <br>
in_reply_to_tweet.timelines.earlybird_score <br>
in_reply_to_tweet.tweetsource.tweet.media.aspect_ratio_den <br>
in_reply_to_tweet.tweetsource.tweet.media.aspect_ratio_num <br>
in_reply_to_tweet.tweetsource.tweet.media.height_1 <br>
in_reply_to_tweet.tweetsource.tweet.media.height_2 <br>
in_reply_to_tweet.tweetsource.tweet.media.video_duration <br>
in_reply_to_tweet.tweetsource.tweet.text.has_question <br>
in_reply_to_tweet.tweetsource.tweet.text.length <br>
in_reply_to_tweet.tweetsource.tweet.text.num_caps <br>
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>timelines.earlybird</code></b></summary>
<br>
This feature group passes on features used by the search and light ranking service ("Earlybird") to the Heavy Ranker. <br>
<table>
<tr>
<td>
<code>
timelines.earlybird.decayed_favorite_count <br>
timelines.earlybird.decayed_quote_count <br>
timelines.earlybird.decayed_reply_count <br>
timelines.earlybird.decayed_retweet_count <br>
timelines.earlybird.embeds_impression_count_v2 <br>
timelines.earlybird.embeds_url_count_v2 <br>
timelines.earlybird.fake_favorite_count <br>
timelines.earlybird.fake_quote_count <br>
timelines.earlybird.fake_reply_count <br>
timelines.earlybird.fake_retweet_count <br>
timelines.earlybird.has_quote <br>
timelines.earlybird.is_composer_source_camera <br>
timelines.earlybird.label_abusive_flag <br>
timelines.earlybird.label_abusive_hi_rcl_flag <br>
timelines.earlybird.label_dup_content_flag <br>
timelines.earlybird.label_nsfw_hi_prc_flag <br>
timelines.earlybird.label_nsfw_hi_rcl_flag <br>
timelines.earlybird.label_spam_flag <br>
timelines.earlybird.label_spam_hi_rcl_flag <br>
timelines.earlybird.periscope_exists <br>
timelines.earlybird.periscope_has_been_featured <br>
timelines.earlybird.periscope_is_currently_featured <br>
timelines.earlybird.periscope_is_from_quality_source <br>
timelines.earlybird.periscope_is_live <br>
timelines.earlybird.preported_tweet_score <br>
timelines.earlybird.quote_count <br>
timelines.earlybird.visible_token_ratio <br>
timelines.earlybird.weighted_fav_count <br>
timelines.earlybird.weighted_quote_count <br>
timelines.earlybird.weighted_reply_count <br>
timelines.earlybird.weighted_retweet_count <br>
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>realtime_interaction_graph</code></b></summary>
<br>
User-author interaction features. Similar to RealGraph but updated more rapidly. <br>
<table>
<tr>
<td>
<code>
realtime_interaction_graph.click.count <br>
realtime_interaction_graph.click.days_since_last <br>
realtime_interaction_graph.fav.count <br>
realtime_interaction_graph.fav.days_since_last <br>
realtime_interaction_graph.mention.count <br>
realtime_interaction_graph.mention.days_since_last <br>
realtime_interaction_graph.profile_view.count <br>
realtime_interaction_graph.profile_view.days_since_last <br>
realtime_interaction_graph.retweet.count <br>
realtime_interaction_graph.retweet.days_since_last <br>
realtime_interaction_graph.soft_follow.count <br>
realtime_interaction_graph.soft_follow.days_since_last
</code>
</td>
</tr>
</table>
</details>
<details>
<summary><b><code>user_tweet.recommendations</code></b></summary>
<br>
Similarity of a tweet to the tweets a user has recently engaged with. <br>
<table>
<tr>
<td>
<code>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.fav_1d_last_10_avg <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.fav_1d_last_10_max <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.fav_7d_last_10_avg <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.fav_7d_last_10_max <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.follow_30d_last_10_avg <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.follow_30d_last_10_max <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.follow_7d_last_10_avg <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.follow_7d_last_10_max <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.retweet_1d_last_10_avg <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.retweet_1d_last_10_max <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.retweet_7d_last_10_avg <br>
user_tweet.recommendations.sim_clusters_recent_engagement_similarity.retweet_7d_last_10_max <br>
user-tweet.recommendations.sim_clusters_scores.user_interested_in_tweet_embedding_dot_product_20m_145k_2020 <br>
</code>
</td>
</tr>
</table>
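
The feature names suggest an average and a maximum similarity over the last 10 tweets a user engaged with in a given window. The sketch below illustrates that reduction using cosine similarity over sparse cluster vectors. The similarity measure, the vector format, and both helper functions are assumptions; the source does not specify how the similarity is computed.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {cluster_id: score} vectors."""
    dot = sum(score * b.get(cluster, 0.0) for cluster, score in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recent_engagement_similarity(candidate, engaged, last_n=10):
    """Average and maximum similarity to the most recent engaged tweets."""
    sims = [cosine(candidate, tweet) for tweet in engaged[-last_n:]]
    if not sims:
        return {"avg": 0.0, "max": 0.0}
    return {"avg": sum(sims) / len(sims), "max": max(sims)}

candidate = {"c1": 1.0}
recent_favs = [{"c1": 1.0}, {"c2": 1.0}]  # oldest first
print(recent_engagement_similarity(candidate, recent_favs))  # {'avg': 0.5, 'max': 1.0}
```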
</details>
<details>
<summary><b><code>other</code></b></summary>
<br>
Here we list individual features not covered by any feature group. <br>
<table>
<tr>
<td>
<code>
author_health.num_connect <br>
author_health.num_connect_days <br>
author_health.num_followers <br>
engagement_features.in_network.favorites.count <br>
engagement_features.in_network.replies.count <br>
engagement_features.in_network.retweets.count <br>
request_context.display_dpi <br>
request_context.display_height <br>
request_context.display_width <br>
request_context.is_get_initial <br>
request_context.is_get_middle <br>
request_context.is_get_newer <br>
request_context.is_get_older <br>
request_context.is_session_start <br>
time_features.earlybird.last_favorite_since_creation_hrs <br>
time_features.earlybird.last_quote_since_creation_hrs <br>
time_features.earlybird.last_reply_since_creation_hrs <br>
time_features.earlybird.last_retweet_since_creation_hrs <br>
time_features.earlybird.time_since_last_favorite <br>
time_features.earlybird.time_since_last_quote <br>
time_features.earlybird.time_since_last_reply <br>
time_features.earlybird.time_since_last_retweet <br>
time_features.is_tweet_recycled <br>
time_features.non_polling_requests_since_tweet_creation <br>
time_features.time_between_non_polling_requests_avg <br>
time_features.time_since_last_non_polling_request <br>
time_features.time_since_source_tweet_creation <br>
time_features.time_since_tweet_creation <br>
time_features.time_since_viewer_account_creation_secs <br>
time_features.tweet_age_ratio <br>
</code>
</td>
</tr>
</table>
</details>
## Embeddings Features
[Twhin](https://arxiv.org/pdf/2202.05387.pdf) is a large graph embedding trained on Twitter data. We use three 200-dimensional embeddings sourced from the Twhin algorithm.
<details>
<summary><b><code>Twhin Follow Embeddings</code></b></summary>
<br>
We have two embeddings trained on the user-user follow graph, one representing who is likely to follow a user and the other representing who a user is likely to follow. Each embedding is 200-dimensional.
</details>
<details>
<summary><b><code>Twhin Engagement Embeddings</code></b></summary>
<br>
We have one embedding trained on the user-tweet engagement graph, representing users based on the Tweets they are likely to engage with. This embedding is 200-dimensional.
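
For a sense of scale, the sketch below assembles three 200-dimensional vectors into one dense input. The embedding names, the random stand-in values, and the choice to concatenate are all assumptions for illustration; how the model actually consumes these embeddings is not stated here.

```python
import random

EMBEDDING_DIM = 200  # dimensionality stated in this section

def random_embedding(seed):
    """Stand-in for a learned embedding; real values come from training."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]

# Hypothetical names for the three embeddings described in this section.
embeddings = {
    "follow.likely_followers": random_embedding(0),
    "follow.likely_followees": random_embedding(1),
    "engagement.user": random_embedding(2),
}

def concat_embeddings(named):
    """Concatenate embeddings in a fixed key order into one dense vector."""
    return [value for name in sorted(named) for value in named[name]]

dense_input = concat_embeddings(embeddings)
print(len(dense_input))  # 3 embeddings x 200 dimensions = 600
```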