AMS-sketch #667

Open
wants to merge 17 commits into
base: develop
cesarcolle

What is it?

Related to #563.
This is a monoid implementation of the AMS sketch. Not all of the optimizations from the Count-Min Sketch implementation are in place yet, but it is a start, and it already covers some operations such as innerProduct estimation, join, ...
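
To make the idea concrete, here is a minimal toy sketch of the general AMS (fast-AGMS style) technique, not the code from this PR. The names `ToyAms`, `add`, `plus`, `innerProduct` and `f2` are hypothetical and chosen for illustration only, and the MurmurHash3-based bucket and sign hashes are a simplifying assumption rather than the 4-wise independent hash families used in the literature.

```scala
import scala.util.hashing.MurmurHash3

// Toy AMS sketch: `depth` rows of `buckets` signed counters.
// Hypothetical illustration; unrelated to the AMSMonoid class in the PR.
final case class ToyAms(depth: Int, buckets: Int, counters: Vector[Vector[Long]]) {

  // Row-specific bucket index in [0, buckets).
  private def bucket(row: Int, item: String): Int =
    java.lang.Math.floorMod(MurmurHash3.stringHash(item, row), buckets)

  // Row-specific sign in {-1, +1}, taken from a differently seeded hash.
  private def sign(row: Int, item: String): Long =
    if ((MurmurHash3.stringHash(item, row + depth) & 1) == 0) 1L else -1L

  // Adding an item updates one signed counter per row.
  def add(item: String): ToyAms =
    copy(counters = counters.zipWithIndex.map { case (rowCounters, row) =>
      val b = bucket(row, item)
      rowCounters.updated(b, rowCounters(b) + sign(row, item))
    })

  // Monoid "plus": cell-wise addition of two sketches of the same shape.
  def plus(that: ToyAms): ToyAms = {
    require(depth == that.depth && buckets == that.buckets, "incompatible sketches")
    copy(counters = counters.zip(that.counters).map { case (a, b) =>
      a.zip(b).map { case (x, y) => x + y }
    })
  }

  // Inner-product estimate: median over rows of the row-wise dot product.
  def innerProduct(that: ToyAms): Long = {
    require(depth == that.depth && buckets == that.buckets, "incompatible sketches")
    ToyAms.median(counters.zip(that.counters).map { case (a, b) =>
      a.zip(b).map { case (x, y) => x * y }.sum
    })
  }

  // F2 (second frequency moment) is the inner product of a sketch with itself.
  def f2: Long = innerProduct(this)
}

object ToyAms {
  def empty(depth: Int, buckets: Int): ToyAms =
    ToyAms(depth, buckets, Vector.fill(depth, buckets)(0L))

  def create(depth: Int, buckets: Int, items: Seq[String]): ToyAms =
    items.foldLeft(empty(depth, buckets))(_ add _)

  // Upper median for even-sized input; enough for a toy example.
  def median(xs: Seq[Long]): Long = xs.sorted.apply(xs.size / 2)
}
```

Because `plus` is a cell-wise sum, sketching two streams separately and combining them gives the same counters as sketching the concatenated stream, which is what makes the structure usable as a monoid.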

What does it bring?

I ran some benchmarks, such as a join between two AMS sketches and between two CMS sketches. For AMS:

[info] Running (fork) org.openjdk.jmh.Main -t1 -f1 -wi 4 -i 4 .*AMSJoin.*
[info] Benchmark                                (bucket)  (depth)  (size)   Mode  Cnt      Score      Error  Units
[info] AMSJoinBenchmark.amsJoinBenchmarkString        27       16    1000  thrpt    4  74803,226 ± 13458,935  ops/s
[info] AMSJoinBenchmark.amsJoinBenchmarkString       543       16    1000  thrpt    4   3355,353 ±  298,165  ops/s
[info] AMSJoinBenchmark.amsJoinBenchmarkString      5438       16    1000  thrpt    4    243,427 ±   92,126  ops/s

And for CMS:

[info] Running (fork) org.openjdk.jmh.Main -t1 -f1 -wi 4 -i 4 .*CMSJoin.*
[info] Benchmark                                  (delta)   (eps)  (size)   Mode  Cnt      Score      Error  Units
[info] CMSJoinBenchmark.amsJoinBenchmarkString  0.0000001     0.1    1000  thrpt    4  45450,447 ± 5838,600  ops/s
[info] CMSJoinBenchmark.amsJoinBenchmarkString  0.0000001   0.005    1000  thrpt    4   3502,317 ±   83,948  ops/s
[info] CMSJoinBenchmark.amsJoinBenchmarkString  0.0000001  0.0005    1000  thrpt    4    320,697 ±   23,353  ops/s

The delta / eps values are translated to the corresponding buckets / depth so that the AMS and CMS runs are comparable.
I think that with more optimisation we can get a better score for AMS.
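
As a rough illustration of that translation, the snippet below uses the textbook Count-Min sizing formulas, width = ceil(e / eps) and depth = ceil(ln(1 / delta)). This is an assumption about the kind of mapping involved; the object name `SketchParams` is hypothetical, and the PR may round differently or pick values by hand.

```scala
object SketchParams {
  // Textbook Count-Min sizing (an assumption, not taken from the PR):
  // number of buckets per row for a target relative error eps.
  def width(eps: Double): Int = math.ceil(math.E / eps).toInt

  // Number of rows for a target failure probability delta.
  def depth(delta: Double): Int = math.ceil(math.log(1.0 / delta)).toInt
}

// The eps / delta values used in the CMS benchmark.
val widths = Seq(0.1, 0.005, 0.0005).map(SketchParams.width) // 28, 544, 5437
val rows = SketchParams.depth(0.0000001)                      // 17
```

The bucket and depth values in the AMS table (27, 543, 5438 and 16) are close to these results but not identical, so the benchmark presumably uses a slightly different rounding or hand-picked values.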

Better error estimation on inner-product

The error estimates for innerProduct and F2 between AMS sketches aren't properly implemented yet. I'll dig further into the research papers to find the probabilities, bounds, ...

However, I tested with something like:

import com.twitter.algebird._ // AMSMonoid (added by this PR) is assumed to live in this package
import org.scalatest.{Matchers, WordSpec}
import scala.util.Random

class AMSSketchInnerJoinCMS extends WordSpec with Matchers {

  // Reference value: element-wise product of the two entry vectors, summed as Long.
  def innerProduct(arr1: Vector[Int], arr2: Vector[Int]): Long =
    arr1.zip(arr2).map { case (a, b) => a.toLong * b }.sum

  // Both sketches are built with the same two size parameters (16 and 543).
  val amsMonoid = new AMSMonoid[Int](16, 543)
  val cmsMonoid = CMS.monoid[Int](16, 543, 1)

  "an AMSSketch" should {
    "have a better approximation of its inner product than CMS" in {
      // Two random streams of 100 values in [0, 100).
      val entries = (0 until 100).map(_ => Random.nextInt(100))
      val entriesCompare = (0 until 100).map(_ => Random.nextInt(100))

      val trueInner = innerProduct(entries.toVector, entriesCompare.toVector)
      val ams = amsMonoid.create(entries)
      val cms = cmsMonoid.create(entries)

      val amsCompare = amsMonoid.create(entriesCompare)
      val cmsCompare = cmsMonoid.create(entriesCompare)

      val innerAMS = ams.innerProduct(amsCompare)
      val innerCMS = cms.innerProduct(cmsCompare)

      // Signed errors of each estimate against the reference value.
      val errorAMS = trueInner - innerAMS.estimate
      val errorCMS = trueInner - innerCMS.estimate

      assert(errorAMS.abs <= errorCMS.abs)
    }
  }
}

In my runs this test always passed, and AMS gave a better estimate of the innerProduct.
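
For reference when judging such estimates: a sketch inner product is usually understood as estimating the inner product of the two frequency vectors (the join size), that is, the sum over all keys of count-in-left times count-in-right. The test above instead builds its reference value by zipping the raw entries, which is a different quantity. The helper below is a minimal sketch of the exact frequency-based value; `ExactInnerProduct` and its methods are hypothetical names, not part of algebird or of this PR.

```scala
object ExactInnerProduct {
  // Exact frequency of each key in a stream.
  def frequencies[K](xs: Seq[K]): Map[K, Long] =
    xs.groupBy(identity).map { case (k, v) => k -> v.size.toLong }

  // Exact inner product of the two frequency vectors (the join size).
  def innerProduct[K](left: Seq[K], right: Seq[K]): Long = {
    val l = frequencies(left)
    val r = frequencies(right)
    l.iterator.map { case (k, count) => count * r.getOrElse(k, 0L) }.sum
  }

  // F2 is the inner product of a stream with itself.
  def f2[K](xs: Seq[K]): Long = innerProduct(xs, xs)
}
```

An exact value of this kind could serve as `trueInner` when comparing the AMS and CMS errors, if the frequency-based definition is the one intended.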

How to use

  "An AMSSketchMonoid" should {
    "be used like an algebird monoid" in {
      val amsMonoid = new AMSMonoid[String](100, 100)
      val sketch = amsMonoid.create(Seq("aline", "aline", "aline"))
      // "aline" occurs 3 times, so the exact F2 is 3 * 3 = 9.
      assert(sketch.f2 ~ 9)
    }
  }

@CLAassistant

CLAassistant commented Nov 16, 2019

CLA assistant check
All committers have signed the CLA.
