Add expect-no-linked-resources Document-Policy to Speculative parsing #10718
base: main
Conversation
…#1) * Add expect-no-linked-resources Document-Policy to Speculative HTML parsing
I pushed an extra commit with small fixes I noticed during a final review.
Otherwise, this editorially LGTM.
I know this was discussed at the WebPerf WG and there was some general support from multiple implementers. And you're working on standards positions now. But if any implementers want to comment here, that'd be very welcome! I'll tag this as agenda+ to get some attention from the WHATNOT meeting crowd.
It's also noteworthy that this is the first document policy feature in HTML, and Document Policy itself is not yet integrated into HTML: https://wicg.github.io/document-policy/#integration-with-html . If this feature gets multi-implementer interest, then we should work on doing that integration sooner rather than later. /cc @clelland
Isn't this basically hacking around some implementation limitations in Blink (and maybe in WebKit)? Gecko doesn't in general need to do separate speculation passes.
If the speculative parsing step isn't conducted as a separate step, there might be trivial to no benefit for Gecko, I'd imagine. If the documented implementation behavior still holds true, there might be a non-trivial benefit to Gecko in the speculation-failure scenario (but only in cases where the web developer is able to hint to Gecko).

The HTML spec doesn't dictate whether speculation should be conducted as a separate step or inline with document parsing. The language of the spec seems written in a way that keeps the implementations of the parser and the speculative scanner independent (e.g. "Bytes pushed into the HTML parser's input byte stream must also be pushed into the speculative HTML parser's input byte stream"), thereby making a separately incurred cost for speculation a possibility. I'd imagine that doing it inline with parsing might still be a trade-off, given that the parser is specified to stop on encountering scripts. Or maybe there's a way to continue tokenizing, with the risk of discarding the work if the DOM was indeed modified. Perhaps that's what Gecko does today.

I did a rough benchmark of medium-to-complex page sets with a fresh Chromium release build, and it seems to spend between ~70-100 ms scanning the HTML spec, which I'd think is an extreme example given the nature of that page. The Web Bluetooth spec seems to take between 15-20 ms. For something rather simple, like the CC 3.0 license, it seems to spend about 5 ms on average. These were measured on a capable box, the equivalent of an M1 Max MacBook, so I'd guess the gains might be much bigger on slower CPUs and hardware.

My understanding today is that there's a non-trivial performance advantage to be had, depending on hardware, for pages that don't benefit from speculating resource URLs to fetch. The Origin Trial I ran in Chrome concurs with this.

The directive

So the open questions in my mind at the moment are:
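To make the separate-pass model described above concrete, here is a toy sketch of a speculative prescanner that tokenizes markup ahead of the main parser and collects subresource URLs it could start fetching early. This is illustrative only, not any engine's actual implementation: real engines scan a byte stream, handle many more tag/attribute combinations, and deal with encoding and script-inserted markup; the tag table below is a deliberate simplification.

```python
from html.parser import HTMLParser

class SpeculativePrescanner(HTMLParser):
    """Toy prescanner: walks start tags and records fetchable URLs."""

    # Tag -> attribute naming a fetchable subresource (heavily simplified).
    RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.discovered = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.RESOURCE_ATTRS.get(tag)
        if attr_name:
            for name, value in attrs:
                if name == attr_name and value:
                    # A real engine would resolve this against the base URL
                    # and kick off a speculative fetch here.
                    self.discovered.append(value)

markup = (
    '<head><link href="a.css" rel="stylesheet">'
    '<script src="b.js"></script></head>'
    '<img src="c.png">'
)
scanner = SpeculativePrescanner()
scanner.feed(markup)
print(scanner.discovered)  # ['a.css', 'b.js', 'c.png']
```

The point of the hint under discussion is that, for pages with no such URLs in the markup, this entire pass is wasted work in architectures that run it separately.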
I think generally if the consensus is that engines could do more work to make it faster, we don't ask web developers to put in the work to make it faster. See priority of constituencies.
It's not clear to me that this is the case. My understanding is that we have two different types of tokenizer + tree builder + parser + speculative parser architectures:
The WebKit/Blink architecture benefits from the expect-no-linked-resources hint, whereas the Gecko architecture does not. But, we don't have any evidence that in general the Gecko architecture is superior to the WebKit/Blink one. Stated another way, we have the following four scenarios:
We know that GN = GH, and WH > WN. But we don't have any information on the relationship between G and WN, or G and WH. If G >= WN and G >= WH for all possible websites, then I agree that this feature is not very aligned with the priority of constituencies, and WebKit/Blink should move to the Gecko architecture since it is always faster. But I suspect there are cases where WN > G, and especially that there are cases where WH > G. In that case, this feature adds value to the web, by allowing the combined forces of web developers (via the hint) and browser implementations (via the in-this-scenario-faster WebKit/Blink architecture) to speed up page loads beyond what's possible with just the Gecko architecture.
I do agree with the first statement. However, I do not think this is against the priority of constituencies. This hint is very similar to link[preload], which is also an indicative signal of this nature from the page (or web developer), and it's all still in the best interest of users. Without the hint, on pages that would benefit from it, the UA would spend compute and resources wastefully, which is not in favor of the user or user experience. Much like the preload hint, the UA has no conclusive way to derive the same signal on its own.
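For reference, the preload hint being compared to is the standard markup below (the URL here is just an example):

```html
<link rel="preload" href="/styles/main.css" as="style">
```

Like the proposed directive, it is a declarative hint from the page that the UA cannot conclusively derive on its own; the difference is that preload says "fetch this early", while expect-no-linked-resources says "there is nothing to discover".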
It doesn't really do more work, or that work is rather minimal. You know while parsing anyhow that you have links to other resources, so it is cheap to reuse that information for speculative loads.
Well, it would be the opposite of that. The flag as defined in the PR would be bad for performance in case the page then does want to use explicit preloads. Speculative parsing could have started those loads way before the explicit preload would happen. (That is, at least in Gecko, which doesn't need the separate pass.)
User Agents have implemented speculative parsing of HTML to speculatively fetch resources that are present in the HTML markup, to speed up page loading. For the vast majority of pages on the Web that have resources declared in the HTML markup, the optimization is beneficial and the cost paid in determining such resources is a sound tradeoff. However, the following scenarios might result in a sub-optimal performance tradeoff vs. the explicit time spent parsing HTML for determining sub resources to fetch:
This proposal introduces a configuration point in Document Policy by the name
expect-no-linked-resources
to hint to a User Agent that it may choose to optimize out the time spent in such sub-resource determination. Read the complete Explainer and the proposed spec changes that cover the changes in this PR.
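Based on the directive name given above, a page would opt in via a response header along these lines (a sketch; the exact serialization follows the Document Policy draft's header syntax and may differ):

```
Document-Policy: expect-no-linked-resources
```

A UA that honors the hint could then skip the speculative subresource scan for that document.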
/common-dom-interfaces.html (diff)
/common-microsyntaxes.html (diff)
/dom.html (diff)
/index.html (diff)
/infrastructure.html (diff)
/parsing.html (diff)
/references.html (diff)
/structured-data.html (diff)
/urls-and-fetching.html (diff)