Representation Scorer (RSX) serves as a centralized scoring system, offering SimClusters or other embedding-based scoring solutions as machine learning features.
Representation Manager (RMS) serves as a centralized embedding management system, providing SimClusters or other embeddings as facade of the underlying storage or services.
Open sourcing Aggregation Framework, a config-driven Summingbird based framework for generating real-time and batch aggregate features to be consumed by ML models.
Since the first batch of open sourcing, we have added the following components:
- User signal service
- Unified user actions
- Topic social proof service
Update the README to include these.
Unified User Action (UUA) is a centralized, real-time stream of user actions on Twitter, consumed by various product, ML, and marketing teams. UUA makes sure all internal teams consume the uniformed user actions data in an accurate and fast way.
User Signal Service (USS) is a centralized online platform that supplies comprehensive data on user actions and behaviors on Twitter. This service stores information on both explicit signals, such as Favorites, Retweets, and replies, and implicit signals like Tweet clicks, profile visits, and more.
Topic Social Proof Service (TSPS) delivers highly relevant topics tailored to a user's interests by analyzing topic preferences, such as following or unfollowing, and employing semantic annotations and other machine learning models.
Remove unused ranking params which are specified by services when making an Earlybird relevance search.
For cr-mixer: since we always set useTensorflowRanking = true in EarlybirdSimilarityEngineRouter, we will only ever use the TensorFlowBasedScoringFunction for ranking search results. That function doesn't rely on any of the linear params specified in getLinearRankingParams, nor the boosts because we set applyBoosts = false in the request. These parameters are therefore strictly redundant.
The parameters in home-mixer can be removed for essentially the same reason—the parameters are redundant given that we use the Tensorflow scoring function and don't apply boosts.
Please note we have force-pushed a new initial commit in order to remove some publicly-available Twitter user information. Note that this process may be required in the future.