Today I think I'll describe WebKit's most central task: to render the text it's parsed from a webpage.

There's a fair bit to this, most but not all of which I'd consider necessary in order to reach an international audience. I'll start from the DocumentLoader which manages the webpage's download as described in a previous toot thread, and try to skip over other aspects this code touches.

DocumentLoader streams its responses via DocumentWriter into a DocumentParser. This parser is created by the DOMImplementation at the request of DocumentWriter, which also handles clearing the page, amongst other things, when the stream starts.
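
To make that hand-off a little more concrete, here's a rough sketch of the shape of it. The class names and methods (`Parser`, `Writer`, `DomImplementation::createParser`, `begin`/`addData`/`end`) are hypothetical stand-ins I made up for illustration, not WebKit's actual signatures.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a DocumentParser: it just accumulates what it is fed.
class Parser {
public:
    void append(const std::string& chunk) { m_received += chunk; }
    void finish() { m_finished = true; }
    const std::string& received() const { return m_received; }
    bool finished() const { return m_finished; }
private:
    std::string m_received;
    bool m_finished = false;
};

// Hypothetical stand-in for DOMImplementation: it decides which parser to build.
struct DomImplementation {
    static std::unique_ptr<Parser> createParser(const std::string& mimeType) {
        // Assumption: a real factory would pick a parser based on the MIME type.
        // This sketch only has one parser, so the argument is ignored.
        (void)mimeType;
        return std::unique_ptr<Parser>(new Parser);
    }
};

// Hypothetical stand-in for DocumentWriter: sits between the loader and the parser.
class Writer {
public:
    // Starting a new stream throws away the old parser (the "clear the page" step)
    // and asks the factory for a fresh one.
    void begin(const std::string& mimeType) {
        m_parser = DomImplementation::createParser(mimeType);
    }
    // Each downloaded chunk is forwarded as soon as it arrives.
    void addData(const std::string& chunk) {
        if (m_parser)
            m_parser->append(chunk);
    }
    void end() {
        if (m_parser)
            m_parser->finish();
    }
    const Parser* parser() const { return m_parser.get(); }
private:
    std::unique_ptr<Parser> m_parser;
};
```

A loader would call `begin("text/html")` once, then `addData(...)` for every chunk off the network, then `end()`. The parser never needs to know where the bytes came from.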

Specifically, it constructs an HTMLDocumentParser, which hands the text off to a tokenizer that splits the document into HTML tags with their attributes attached. This tokenizer is implemented as a state machine using C macros and a switch statement.
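
If you haven't seen a tokenizer written this way, here's a toy version of the idea: one switch over the current state, fed one character at a time. The `Token` type, the four states, and the `tokenize` function are all my own simplifications. It uses no macros, and it leaves out attributes, comments, entities, and nearly all of the error handling a real tokenizer needs.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, heavily simplified token; a real one also carries attributes.
struct Token {
    enum class Type { Character, StartTag, EndTag };
    Type type;
    std::string data; // text for Character tokens, tag name for tag tokens
};

// Toy tokenizer: a state machine driven by a switch statement.
// Assumption: tags look like <name> or </name>; anything after the name up to
// '>' is folded into the name, because attributes are not modelled here.
// Input that ends in the middle of a tag is silently dropped.
std::vector<Token> tokenize(const std::string& input) {
    enum class State { Data, TagOpen, EndTagOpen, TagName };
    State state = State::Data;
    std::vector<Token> tokens;
    std::string text; // character data not yet emitted
    Token tag{Token::Type::StartTag, ""};

    auto isAlpha = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };
    auto lower = [](char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    };
    auto flushText = [&] {
        if (!text.empty()) {
            tokens.push_back({Token::Type::Character, text});
            text.clear();
        }
    };

    for (char c : input) {
        switch (state) {
        case State::Data:
            if (c == '<')
                state = State::TagOpen;
            else
                text += c;
            break;
        case State::TagOpen:
            if (c == '/') {
                state = State::EndTagOpen;
            } else if (isAlpha(c)) {
                flushText();
                tag = {Token::Type::StartTag, std::string(1, lower(c))};
                state = State::TagName;
            } else {
                // Not a tag after all: keep the '<' as literal text.
                text += '<';
                text += c;
                state = State::Data;
            }
            break;
        case State::EndTagOpen:
            if (isAlpha(c)) {
                flushText();
                tag = {Token::Type::EndTag, std::string(1, lower(c))};
                state = State::TagName;
            } else {
                text += "</";
                text += c;
                state = State::Data;
            }
            break;
        case State::TagName:
            if (c == '>') {
                tokens.push_back(tag);
                state = State::Data;
            } else {
                tag.data += lower(c);
            }
            break;
        }
    }
    flushText();
    return tokens;
}
```

Feeding it `<p>Hi</P>` yields a start tag `p`, the text `Hi`, and an end tag `p`. Something like `a < b` stays a single run of text, since the `<` isn't followed by a letter.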

WebKit then sends those "tags" and raw text to the "HTMLTreeBuilder". This structures the tree differently based on its current state and the tag being inserted into the document tree.

To do the actual work, it hands the tag on to the HTMLConstructionSite, which in turn wraps HTMLElementFactory (a C++ hashmap generated by a Perl script) and the HTMLElement instances it creates.
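
Here's a minimal sketch of how those three layers could fit together. Everything in it is a hypothetical stand-in: `createElement` plays the role of the element factory, `ConstructionSite` keeps the stack of open elements, and `TreeBuilder` holds the state. The two rules it applies (an implied `body`, and a new `p` closing an open `p`) are simplified versions of what the HTML standard asks for, and I've left out the standard's scoping rules entirely.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Element {
    std::string tagName;
    std::vector<std::unique_ptr<Element>> children;
    explicit Element(std::string name) : tagName(std::move(name)) {}
    virtual ~Element() = default;
};
struct ParagraphElement : Element { ParagraphElement() : Element("p") {} };
struct AnchorElement : Element { AnchorElement() : Element("a") {} };

// Stand-in for the element factory: a hashmap from tag name to constructor.
std::unique_ptr<Element> createElement(const std::string& tagName) {
    using Constructor = std::function<std::unique_ptr<Element>()>;
    static const std::unordered_map<std::string, Constructor> constructors = {
        {"p", [] { return std::unique_ptr<Element>(new ParagraphElement); }},
        {"a", [] { return std::unique_ptr<Element>(new AnchorElement); }},
    };
    auto it = constructors.find(tagName);
    if (it != constructors.end())
        return it->second();
    return std::unique_ptr<Element>(new Element(tagName)); // unknown tag: generic element
}

// Stand-in for the construction site: owns the tree and the stack of open elements.
class ConstructionSite {
public:
    ConstructionSite() : m_root(new Element("#document")) {
        m_openElements.push_back(m_root.get());
    }
    void insertElement(const std::string& tagName) {
        std::unique_ptr<Element> element = createElement(tagName);
        Element* raw = element.get();
        m_openElements.back()->children.push_back(std::move(element));
        m_openElements.push_back(raw);
    }
    // Pops up to and including the nearest open element with this name.
    // A stray end tag with no matching open element is ignored.
    void closeElement(const std::string& tagName) {
        for (std::size_t i = m_openElements.size(); i > 1; --i) {
            if (m_openElements[i - 1]->tagName == tagName) {
                m_openElements.resize(i - 1);
                return;
            }
        }
    }
    Element& root() { return *m_root; }
private:
    std::unique_ptr<Element> m_root;
    std::vector<Element*> m_openElements; // the root is never popped
};

// Stand-in for the tree builder: what happens depends on both mode and tag.
class TreeBuilder {
public:
    void processStartTag(const std::string& tagName) {
        if (m_mode == Mode::BeforeBody) {
            // Simplified rule: the first start tag implies a <body>.
            m_mode = Mode::InBody;
            m_site.insertElement("body");
            if (tagName == "body")
                return;
        }
        if (tagName == "p")
            m_site.closeElement("p"); // a new <p> implicitly closes an open one
        m_site.insertElement(tagName);
    }
    void processEndTag(const std::string& tagName) { m_site.closeElement(tagName); }
    Element& document() { return m_site.root(); }
private:
    enum class Mode { BeforeBody, InBody };
    Mode m_mode = Mode::BeforeBody;
    ConstructionSite m_site;
};
```

Processing the start tags `p`, `a`, the end tag `a`, and then another start tag `p` produces a `body` with two sibling paragraphs, the first of which contains the anchor. Nobody ever wrote `<body>` or `</p>`, which is exactly the sort of correction the tree builder exists to make.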

NOTE: I'm not particularly fond of the parts of the HTML standard which call for these components; they're overly complex.

@alcinnz I'd like the next version of HTML to throw out all the little nuances and corrections it applies to make malformed markup work, and instead have browsers tell the markup author exactly what is wrong with their markup and how to fix it.
