Update README.md

Faraz Razi 2023-04-01 05:31:11 +05:00 committed by GitHub
parent 8de33f89e9
commit 631f5ee21c

@@ -28,13 +28,13 @@ The light ranker features pipeline is as follows:
Some of these components are explained below:
-Index Ingester: This is an indexing pipeline that handles the tweets as they are generated. It is the main input of Earlybird, producing Tweet Data (the basic information about the tweet, such as the text, URLs, media entities, facets, etc.) and Static Features (features that can be computed directly from a tweet, such as whether it has a URL, cards, quotes, etc.). All information computed here is stored in the index and flushed as each real-time index segment becomes full. They are loaded back later from disk when Earlybird restarts. Note that the features may be computed in a non-trivial way (like deciding the value of hasUrl), as they could be computed and combined from some more "raw" information in the tweet and from other services.
+- Index Ingester: This is an indexing pipeline that handles the tweets as they are generated. It is the main input of Earlybird, producing Tweet Data (the basic information about the tweet, such as the text, URLs, media entities, facets, etc.) and Static Features (features that can be computed directly from a tweet, such as whether it has a URL, cards, quotes, etc.). All information computed here is stored in the index and flushed as each real-time index segment becomes full. They are loaded back later from disk when Earlybird restarts. Note that the features may be computed in a non-trivial way (like deciding the value of hasUrl), as they could be computed and combined from some more "raw" information in the tweet and from other services.
-Signal Ingester: This is the ingester for Realtime Features, which are per-tweet features that can change after the tweet has been indexed. They mostly include social engagements like retweetCount, favCount, replyCount, etc., along with some (future) spam signals that are computed with later activities. These features are collected and computed in a Heron topology by processing multiple event streams and can be extended to support more features.
+- Signal Ingester: This is the ingester for Realtime Features, which are per-tweet features that can change after the tweet has been indexed. They mostly include social engagements like retweetCount, favCount, replyCount, etc., along with some (future) spam signals that are computed with later activities. These features are collected and computed in a Heron topology by processing multiple event streams and can be extended to support more features.
-User Table Features: This is another set of features per user, which are from User Table Updater, a different input that processes a stream written by our user service. It is used to store sparse real-time user information. These per-user features are propagated to the tweet being scored by looking up the author of the tweet.
+- User Table Features: This is another set of features per user, which are from User Table Updater, a different input that processes a stream written by our user service. It is used to store sparse real-time user information. These per-user features are propagated to the tweet being scored by looking up the author of the tweet.
-Search Context Features: These are basically the information of the current searcher, such as their UI language, their own produced/consumed language, and the current time (implied). They are combined with Tweet Data to compute some of the features used in scoring.
+- Search Context Features: These are basically the information of the current searcher, such as their UI language, their own produced/consumed language, and the current time (implied). They are combined with Tweet Data to compute some of the features used in scoring.
The scoring function in Earlybird uses both static and real-time features. Examples of static features used are:-