Performance on Large Dataset #2

Open
buzzcola opened this issue Sep 26, 2016 · 0 comments

Continued from this conversation: http://stackoverflow.com/questions/39669376/documentdb-stored-procedure-continuation

I implemented your suggestion to pass the Response object in as the new configuration for the stored procedure when there's a continuation. My code looks like this now:

            // Needs: using System; using System.Diagnostics; using Newtonsoft.Json;
            // _client is assumed to be a DocumentDB DocumentClient instance.
            const string sprocLink = "dbs/foo/colls/bar/sprocs/baz";

            var roundtrips = 0;

            var timer = Stopwatch.StartNew();

            // Initial configuration for the stored procedure.
            var configString = @"{
                    cubeConfig: {
                        groupBy: 'year',
                        field: 'Amount',
                        f: 'sum'
                    },
                    filterQuery: 'select * from TestLargeData t where t.Amount > 0'
                }";

            var config = JsonConvert.DeserializeObject<object>(configString);
            Console.WriteLine($"Query #{roundtrips + 1}...");
            var result = await _client.ExecuteStoredProcedureAsync<dynamic>(sprocLink, config);
            roundtrips++;

            // Keep calling the stored procedure until it stops returning a continuation.
            while (result.Response["continuation"] != null)
            {
                // Make a new config which is the entire response from the last call.
                var nextConfig = JsonConvert.DeserializeObject(result.Response.ToString());
                Console.WriteLine($"Query #{roundtrips + 1}...");
                result = await _client.ExecuteStoredProcedureAsync<dynamic>(sprocLink, nextConfig);
                roundtrips++;
            }

            timer.Stop();

As of this writing my query is on round trip #123 and is taking about 11 seconds per trip.

As mentioned in the SO post, my collection has 1M records and a very simple structure:

{
    "year": 2007,
    "SomeOtherField1": "SomeOtherValue1",
    "SomeOtherField2": "SomeOtherValue2",
    "Amount": 12000,
    "id": "0ee80b66-7fa7-40c1-9124-292c01059562",
    "_rid": "...",
    "_self": "...",
    "_etag": "\"...\"",
    "_attachments": "attachments/",
    "_ts": ...
}

The collection is provisioned for 1000 RU/s. The indexing policy on the collection is as follows:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    }
  ],
  "excludedPaths": []
}

Is there anything obviously wrong with what I'm doing here?

Thanks very much for your help, I appreciate it!
