From 4c5db5391626af0c4206a2e973d295813478550a Mon Sep 17 00:00:00 2001
From: Noting565
Date: Fri, 31 Mar 2023 17:12:52 -0700
Subject: [PATCH 1/3] fix grammar in simclusters-ann readme.md

---
 simclusters-ann/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/simclusters-ann/README.md b/simclusters-ann/README.md
index 8770435cd..4341555d2 100644
--- a/simclusters-ann/README.md
+++ b/simclusters-ann/README.md
@@ -12,7 +12,7 @@ The cosine similarity between two Tweet SimClusters Embedding presents the relev
 
 SimClusters from the Linear Algebra Perspective discussed the difference between the dot-product and cosine similarity in SimCluster space. We believe the cosine similarity approach is better because it avoids the bias of tweet popularity.
 
- However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per clusters) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Consider that we need to process over 6,000 RPS, it’s hard to support by the existing infrastructure.
+ However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per clusters) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Considering that we need to process over 6,000 RPS, it’s hard to support by the existing infrastructure.
 
 ## SimClusters Approximate Cosine Similariy Core Algorithm
 

From b04e3521a2d0106d1c0f791a530914fdd53a1065 Mon Sep 17 00:00:00 2001
From: Animus <54163084+animusDS@users.noreply.github.com>
Date: Fri, 31 Mar 2023 17:21:59 -0700
Subject: [PATCH 2/3] fix another grammatical error

this fixes another grammar error with assistance from @sdornan
---
 simclusters-ann/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/simclusters-ann/README.md b/simclusters-ann/README.md
index 4341555d2..b186e2c2a 100644
--- a/simclusters-ann/README.md
+++ b/simclusters-ann/README.md
@@ -12,7 +12,7 @@ The cosine similarity between two Tweet SimClusters Embedding presents the relev
 
 SimClusters from the Linear Algebra Perspective discussed the difference between the dot-product and cosine similarity in SimCluster space. We believe the cosine similarity approach is better because it avoids the bias of tweet popularity.
 
- However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per clusters) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Considering that we need to process over 6,000 RPS, it’s hard to support by the existing infrastructure.
+ However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per clusters) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Considering that we need to process over 6,000 RPS, it’s hard to support with the existing infrastructure.
 
 ## SimClusters Approximate Cosine Similariy Core Algorithm
 

From 7397e3d2b03a609453800f7b0244fb711cdb1a56 Mon Sep 17 00:00:00 2001
From: Noting565
Date: Fri, 31 Mar 2023 17:30:02 -0700
Subject: [PATCH 3/3] Update ScoreFacadeStore.scala

The correct article to use before the word "uniform" is "an" in American English.
---
 .../com/twitter/simclusters_v2/score/ScoreFacadeStore.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/scala/com/twitter/simclusters_v2/score/ScoreFacadeStore.scala b/src/scala/com/twitter/simclusters_v2/score/ScoreFacadeStore.scala
index ac084e737..6f0ccfd12 100644
--- a/src/scala/com/twitter/simclusters_v2/score/ScoreFacadeStore.scala
+++ b/src/scala/com/twitter/simclusters_v2/score/ScoreFacadeStore.scala
@@ -10,7 +10,7 @@ import com.twitter.storehaus.ReadableStore
 import com.twitter.util.Future
 
 /**
- * Provide a uniform access layer for all kind of Score.
+ * Provide an uniform access layer for all kind of Score.
  * @param readableStores readable stores indexed by the ScoringAlgorithm they implement
  */
 class ScoreFacadeStore private (