The HTTP Archive tracks how the web is built. For well over a decade, it has collected detailed data on the resources making their way onto the web, and on which platform APIs are being used and how. In 2019, the HTTP Archive gathered much of this information and created what would become The Web Almanac: a comprehensive annual report on the state of the web, backed by real data. In 2021, we celebrate its 3rd edition by inviting Staff DevRel Engineer and lead of the project, Rick Viscomi, and a few of the authors of the 24-chapter tome to share how it came about, interesting data, and some particulars of working on such a comprehensive project. Enjoy!
⚡️ Sign up for a FREE WebPageTest account: https://www.webpagetest.org and start profiling.
0:08
Henri: Welcome- welcome to our AMA. This is going to be one of many more to come. But this is something actually that we've been trying to do for quite a while. I'll get into the details. But once again, welcome to today's AMA. My name is Henri, I'm at WebPageTest, better known as [inaudible] vedika. I'm going to go through the four guests that we have today. And let me give you a quick overview of what's going on today. So, the Web Almanac, a fantastic tome that comes out once a year, was released in December. And we actually have been talking about doing something like this for quite a while, something where we can deep dive into some of the details and speak to some of the authors and the principals around it. And three years into this fantastic project, we're able to come together and get this done. So, we are just glad that you could join us. So, as I said, this has been long in the making, so I'm going to get right into it and not waste any time. With me today we have our four esteemed guests. And I'm going to go I guess in- would that be clockwise? Yeah, it would be. I'm going to start with Rick. Rick, did you want to introduce yourself?
1:25
Rick: Sure. Hi, everybody. My name is Rick Viscomi. I work at Google as a developer relations engineer on the Web Five builds team. And I spend a lot of my time working on web transparency projects. Or- you can tell my speaker just heard me. Hey, Google, stop.
1:42
Henri: That’s funny.
1:44
Rick: So web transparency is basically understanding the state of the web. And I do that through projects like HTTP Archive and the Chrome UX Report.
1:54
Henri: Awesome. Well, thank you for making some time and glad you could join us. Next up, I believe at 12 o'clock is Nishu.
2:02
Nishu: Hey everybody. Yeah, my name is Nishu Goel. I work as a full stack engineer at a company called Web DataWorks. Most of my work revolves around improving the web with the different frameworks that we work with and improving the performance in general. And I think that's what brings me here.
2:20
Henri: Fantastic. And you know, can we get an extra round of applause for Nishu, because I think it's like midnight her time. So, we definitely do appreciate her staying up and joining us for this amazing AMA. Next up, at one o'clock I believe, is Sia.
2:39
Sia: Hello, everyone. My name is Sia Karamalegos. I am in North Carolina in the US. And I am a freelance web developer and performance engineer. I- I also like to build a lot of things in public. It is hard to focus on just one thing, but I really do love performance and all things web.
2:59
Henri: Awesome. And you like to build- build in public, especially the little Legos, is that correct?
3:05
Sia: To build Legos in public too.
3:07
Henri: Okay, I just want to double check. And last but not least, at around seven o'clock we have Barry. Barry, the stage is yours.
3:15
Barry: Hey, I'm Barry Pollard. I'm a software developer, I run a team for an Irish healthcare company here in Cork. And because I don't get enough at work, I get involved in a lot of open-source projects in my spare time, fascinated by the web and what's involved. And yeah, I got involved in the Web Almanac fairly early on. And I helped Rick kind of wrangle the cats and get it all together.
3:42
Henri: Awesome. Well, Barry, thank you for joining us. I know it's the evening your time as well. So we do appreciate it. You're a family man. So I guess everyone's got just KD right now. Is that it?
3:56
Barry: The door is locked back, [inaudible] been set on we'll see if we can survive without them barging in, like they usually do.
4:00
Henri: Awesome. I mean, I remember seeing that BBC interview when the kids came in. So, you know, hopefully, that won't happen. Thank you, everyone. Thank you for coming by. And we are going to get into everything Web Almanac, you know, some of the details on how this whole project was created. In fact, you know, we might as well start right now. You know, Rick, I want to ask you, essentially, the Web Almanac is born out of the HTTP Archive. And for those who don't know, did you want to sort of touch on what the HTTP Archive is all about?
4:34
Rick: Sure. It was started in 2010 by Steve Souders to understand how the web is built, and that's actually still our mission today: tracking how the web is built. And the way it works is that we track- it started off with just tens of thousands of websites and it's grown to millions. And we test the sites on a regular basis. It also started bi-weekly, now it's monthly. And we try to understand everything there is to know about a webpage. It's not just, you know, how many bytes did it load? But how was it built in terms of the different technologies that were used, CDNs, and everything about a page. And what makes it special is that it's entirely publicly queryable. And we report on the insights on the httparchive.org website. And more recently, through the Web Almanac as well.
5:25
Henri: Awesome. Awesome. And, you know, as a perfect segue, I mean, we might as well jump into the, the Web Almanac and, and how was it essentially even sort of conceived, like what went through your mind or the team's mind that something like the Web Almanac should come about?
5:42
Rick: The impetus was definitely trying to facilitate access to the data. There are hundreds of terabytes of data, I think it's over a petabyte now, in the HTTP Archive data set, and it was limited by the questions that people knew to ask of the data, and also the technical skill to be able to write a query to get the answer to your question. It's not enough to have the question, you also need the skills to answer it. And so the problem that I was seeing in the community, the HTTP Archive community, was that there would be maybe one-off questions here or there. There's a discussion forum at discuss.httparchive.org. And people will ask questions and periodically follow up. One of them is like, what are the top RUM providers? And once in a while somebody will ping the thread and be like, "Oh, yeah, here's how things changed recently". So I kind of got the idea, actually inspired by [inaudible] performance advent calendar, like what if there was a regularly occurring publication that was written by the community, where we can tap into the expertise of everybody throughout the community, and they can write a chapter that has to do with a specific part of web development, the state of the web, and kind of crowdsource the knowledge behind the data. The data that is in the HTTP Archive acts as the "kernel of truth" around which the authors are kind of telling a narrative about the state of the web, and it's written on an annual basis, so we can track how things are changing year to year.
7:22
Henri: Now. Awesome, awesome. Now, you mentioned something very important there, which is the community and- and, you know, sort of tapping into expertise and whatnot. I mean, I'll ask Barry this, like, how do you go about choosing some of the authors or, you know, some of the, you know, various roles that are needed to get this Web Almanac out?
7:42
Barry: Yeah, so, we do want to- we don't want to choose the authors. We want the community to help us choose them. So, we normally put out various posts on social media, and we start getting things up. What we did the last few times is we started an issue for each chapter and said, "Hey, if you're interested in JavaScript, you know, put your name here, volunteer. If you'd like to see someone that you think would write well on JavaScript, tag them in it, see if we can get them involved." And then, you know, using GitHub, the likes, and plus ones and thumbs ups, and so on, we try and choose that way. I think for some of the chapters that works quite well, some of the popular ones, JavaScript, Web Performance, and that sort of thing. And there are other chapters that are more difficult to source, and some of them have the same authors a couple of years running. So, like, part of what I'd like to get out of this is an open invitation for other people to get involved.
And I think for authors, we do like them, obviously, to be a bit of an expert in that area and have some sort of expertise in it. There's lots of other roles, however, if you're more junior or starting out in the web area, from analysts to help write queries, and you'll get a lot of help with that. We've got a website to write as well, and I look after the dev team there. So, there is a huge amount of opportunities. There's reviewing and editing. So, if, you know, you haven't got the expertise to actually author a chapter, you can get involved there and make comments or note technical problems, or even, you know, chapter paragraphs that are difficult to understand or [inaudible] or whatever. So, yeah, I think the authors are the ones that kind of get the name. And we do take a bit of care there. And we're trying to look for someone who can demonstrate their expertise, and who we think will be able to commit to this, because it is quite a bit of effort and time to be involved. I think Nishu and Sia will attest to that. And then for the rest, there is an awful lot of opportunities outside of the authors as well to get involved.
9:41
Henri: Awesome, awesome. Now, you know, you mentioned JavaScript, so I'm going to ring that bell. Nishu, let us know how- first of all, you chose the JavaScript chapter, which is, you know, I think very interesting because, you know, if you just throw any kind of comment out on Twitter about JavaScript, it's kind of like, people go a little crazy, there's little battles online. But, you know, this is the chapter that you wrote. And let us know what it was like- what it was like writing that chapter, but specifically, why you chose that chapter.
10:19
Nishu: There's a very interesting story behind this. I was working on the performance of my application, trying to improve the, you know, the numbers there. And I was digging deeper into different resources, trying to improve that, which is where I came across one of the chapters of the Almanac, I think it was the 2020 JavaScript chapter. And then I dug deeper. And I went into the issues, I mean, first into the repo, the GitHub repo, and then into the issues. And that's where I found- I was only looking- looking for the JavaScript keyword. So, it was not the other chapters that, you know, I could come across, it was the- the JavaScript chapter that came up, and then the issue there, and that's where they were looking for authors, right? That's where I commented that, oh, I would love to author the chapter. And it turned out I ended up becoming the content lead for that, right, which held a lot of responsibilities, not just writing or preparing an outline of the chapter, or not just, you know, preparing the content off and on. It was like preparing the whole thing from scratch, and then writing the queries or, you know, getting help with the queries about it. So, I think that's- that's how it started.
11:27
Henri: Oh, that's amazing. So basically, it was, it was a kind of like a happy accident.
11:32
Nishu: Oh, yeah. I love the accident. Definitely.
11:36
Henri: Awesome. Awesome. And, you know, you'd mentioned that you were trying to look at some performance issues that led you to the Almanac. Speaking of performance, Sia, ye of the performance chapter. First of all, how did you discover the Almanac? And then why did you choose? Well, we kind of know why you chose performance, because it's the best thing. But let us know how your story came about in authoring the performance chapter for the Almanac of 2021.
12:09
Sia: Sure, I don't remember how I first came to the Almanac, because it's been a few years now. And I also, like, I talked to Rick a long time ago, because I was like, "I'm going to get into BigQuery. And, you know, find answers to questions." I, like, never did it. So, I was like, "I should just write a chapter in the Web Almanac and then I'll kind of like have to learn it." I mean, you don't have to as an author; you don't have to be an analyst. But I also used to do more back end than front end. And so, I do already know at least some SQL, not necessarily as an expert-expert. But I am- sometimes the data one. Like, I want to understand what's going on. And so that idea appealed to me, to be able to go into that data and find answers to questions, which, of course, leads to more questions, or queries. And so that's how I got into it. And then I finally committed to it last year, and yeah, like anyone- actually, I think Rick already has a- do you already have issues set up? Or you haven't asked yet for authors for next year?
13:14
Rick: Not yet.
13:16
Sia: But you can go- Yeah, when he does, you just like say, "Yeah, I want to participate in this particular chapter." And that's how you get started. And I think they already mentioned there's different roles. You can be an author, which does have a lot of responsibility. But you can also, like, get a lighter- a lighter load by being an editor, or a reviewer, or an analyst. If you, like, already know a lot of SQL, it would be really helpful. [crosstalk] And all of those roles are very- very important. Yes, designers and developers too. Sorry, my dog is being weird.
13:48
Henri: Awesome, awesome. You know, Barry, Nishu, Sia, you know, what was- you know, beyond the opportunity? Like, did you have- Did you have some kind of like, you know, feeling of needing to sort of contribute back to the community, when you- you know, joined the team and became like, an author, and, I mean- I mean, part of the almanac team this year?
14:18
Barry: Yeah, I like- there's lots of things I like about the project. One was, yeah, contributing back and that. Two was being nosy and having an itch and digging into the data. I think, like Sia, I hadn't used BigQuery at all before, whenever I joined. I didn't know SQL from work, and that sort of thing. But it was fascinating just to dig in and be able to see what else you can do. I like the community aspect of it and speaking to people. You know, some of the people that I've worked with over the past few years are people whose books I've got on my bookshelf, and to be able to meet those people and chat to those people was- was brilliant. And then yeah, just finding more information, helping set up the website and- and managing the content and- and getting it out there. It was just- there's many- many aspects of this. Like, some people are real data mining nerds, and being an analyst is great, you can dig in there and do that. If you're more of a, like- I like to write and do that, and there's the authors. If you want to just be a developer and be involved in a project that's going to be used and read worldwide and be, you know, announced at a Google dev conference and get your name up there, then, you know, there's- there's just so many aspects to the project that I find fascinating. And it's the combination of all that which just makes it really interesting to work on.
15:39
Henri: Awesome. And yourself. Nishu.
15:42
Nishu: Yeah, for, for myself, I think it's mostly like I came here as a developer, right to the almanac resource, I came to it as a developer. And what I found interesting was that it kind of intrigues, that sense of, you know, using things better using things efficiently when you read, for example- now, when you read the JavaScript chapter, you would feel that, “Oh, so many people are using this feature, this particular way, which is, you know, improving the performance in general or improving the usage in general”. And that to anybody, as a developer who's reading it would, you know, improve the overall situation. They would want to go back to their application, improve that stat for themselves. And so, I think that's- that's where the almanac helps the developer community a lot. And that's kind of giving back to, you know, the developers, for me as a developer. But now from the shoes of an author, I think it really helps the people on the other side,
16:38
Henri: Awesome. And yourself Sia.
16:41
Sia: For sure, I- that was kind of like a default anyways, for me. Like, I- I write a lot of content, and I try speaking at conferences and things like that. So clearly, it's a desire I have to help the community. It's often like, when I come across a problem- I- I want to make sure other people don't necessarily have the same problem. So, then I- you know, either write or do something or share a little bit. Same too, you know, I cohost or host several meetups. That's another part of that. It's just, yeah, bringing community to our community.
17:16
Henri: Awesome-Awesome.
17:18
Barry: The other thing is the scope of it really does try to cover all aspects of the web. So, like, I started with the HTTP/2 chapter, because I've written a book on it. And then I'm into performance like Sia. So, I knew those two areas quite well. But other areas, like the capabilities chapter, or Jamstack, I don't know as well. But, you know, it's a kind of nice segue into those, and reading those and understanding those. And we try and aim the chapters, you know, not so that you have to be an expert to read it, but also so that you're going to get something decent out of it. It's not just another blog post of here's an intro into this thing, or here are the stats and an interesting history. It's supposed to be a bit more background info for it.
17:58
Henri: Speaking of chapters, and you know, right after this little mention, I might get into some of the questions that I see accumulating in the Slido. Speaking of chapters, I was gonna say, you know, for those who don't know, the original Web Almanac went to print, I guess, in production, whatever, in 2019. And it was, I believe, 425 pages. 2020 was 602 pages. And this year was a whopping 775. Now, Rick, I'm gonna ask you essentially, like, what has kind of changed between 2019 and 2021 that, you know, would sort of require so much more data and writing to accomplish?
18:48
Rick: The biggest change was from 2019 to 2020, for the CSS chapter, and that was when- the Barry effect. Yeah, once Barry starts writing, he's a prolific writer. So, in- what happened to the CSS chapter in 2020 was we unlocked some new capabilities with the type and depth of questions that we can answer. What we did was parse actual stylesheets and tokenize them, to be able to query the structure of the stylesheets in a way that we can answer things like: What is the most popular media query? What is the most popular property and value? And group those things together. And that was amazing in the depth of insights that we could unlock for the CSS chapter. That alone, I believe, was 75 pages last year, and that we owe to Lea, who was the author who came up with over 100 questions. And I wrote all the queries to answer those questions. That was a lot of work. And Lea also wrote a significant portion of the JavaScript to be able to work with all that data that came out of the parser. So that was just one example of how we- how we had the jump in content like that. But at the same time, we were also expanding the scope of chapters that were included. Barry, correct me if I'm wrong, I think 2019 was 20 chapters, 2020 was 22 chapters, and in 2021 it was 24 chapters. So we're expanding the breadth of things that we can cover, and also expanding the depth in terms of the types of content. I would love, for example, to be able to do the same thing for JavaScript that we did for CSS and say, like, what are the most popular variable names, you know? Yeah, answer silly questions like that.
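The parse-and-count idea Rick describes can be sketched roughly as follows. This is a minimal illustration only, not the Almanac's actual pipeline: it assumes a naive regular expression in place of a real CSS parser, and the function name `count_properties` and the sample stylesheets are hypothetical.

```python
import re
from collections import Counter

# Naive pattern: a property name, a colon, then a value up to ';' or a brace.
# A real CSS parser also handles comments, strings, and nested rules properly.
DECLARATION = re.compile(r"([a-zA-Z-]+)\s*:\s*([^;{}]+)")

def count_properties(stylesheets):
    """Count how often each property appears across a list of CSS strings."""
    counts = Counter()
    for css in stylesheets:
        # Only look inside innermost { ... } blocks, so selectors such as
        # a:hover and media conditions such as (min-width: 600px) are skipped.
        for block in re.findall(r"\{([^{}]*)\}", css):
            for prop, _value in DECLARATION.findall(block):
                counts[prop.lower()] += 1
    return counts

# Hypothetical sample input, not HTTP Archive data.
sample = [
    "a:hover { color: red; margin: 0 }",
    "p { color: blue; } @media (min-width: 600px) { p { margin: 1em } }",
]
property_counts = count_properties(sample)
```

Calling `property_counts.most_common()` then gives a popularity ranking, which is the shape of answer behind a question like "what is the most popular property?".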
20:46
Sia: Also, ranking was new this year. So that was like a whole new set of stuff we could actually report on.
20:54
Rick: Now, do you want to get into, like, what the ranking data looked like? And what it enabled for you?
21:00
Sia: Oh, yeah, sure. Um, I thought it was- actually, it's a cross between, I think, my chapter and the CMS chapter that was the most interesting thing, what you actually discovered, Rick, because you look at so many of the- so much of the data, while we're all busy writing away. But um, I think some of that ranking data did, like, have some interesting insights. So, if you don't already know, we- we now have- it's from CrUX, right? And the current state No. And what both of them I forgot. Actually, I use mostly CrUX data for the performance chapter. So, I kind of forget what's separate in the WebPageTest data versus the CrUX data or HTTP-
21:38
Rick: Ranking does come from CrUX.
21:41
Sia: Okay. Yeah. So, we can now associate some information by ranking. So, for the top 1,000 sites versus the top 10,000, and whatnot. And so it was kind of interesting to see that data and how, for example, performance metrics vary. But then also looking at some of those things, at least for me, it was really interesting to see how, in the CMS data, that varied by ranking, and then also by which vendors, because like, Web page- Web page? WordPress is such a huge share of the CMS market that some of those things that they implement, or don't implement, can actually, I think, have a really big impact on performance, because they are such a large part of, not necessarily the top 1,000 sites, but like the tail end. They have a much larger share there. So, it was really interesting to see that. And I think all of us that work on platforms should have a responsibility to try to make those platforms as performant as possible, so that the developers have to think less about performance and all the nitty gritty. I mean, like, they still should, but, man, it's so much easier when it does the job for you, partially at least.
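The kind of breakdown Sia mentions, comparing the top 1,000 sites with the top 10,000 and so on, can be sketched as below. This is a hedged illustration with made-up rows: the bucket values, the cumulative "top N" grouping, and the `passes_metric` flag are all assumptions for the example, not actual CrUX or HTTP Archive fields.

```python
# Hypothetical rows: (site, rank_bucket, passes_metric). In this sketch a
# rank_bucket of 1000 means "within the top 1,000", 10000 "within the top
# 10,000", and so on.
rows = [
    ("a.example", 1000, True),
    ("b.example", 1000, False),
    ("c.example", 10000, True),
    ("d.example", 10000, True),
    ("e.example", 100000, False),
    ("f.example", 100000, False),
]

def pass_rate_by_rank(rows, buckets=(1000, 10000, 100000)):
    """Percent of sites passing within each cumulative top-N grouping."""
    result = {}
    for limit in buckets:
        # Cumulative grouping: the top 10,000 includes the top 1,000.
        group = [passed for _site, rank, passed in rows if rank <= limit]
        if group:
            result[limit] = 100.0 * sum(group) / len(group)
    return result

rates = pass_rate_by_rank(rows)
```

Reporting a percentage per grouping, rather than a raw count, keeps the groups comparable even though each one contains a different number of sites.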
22:57
Henri: Absolutely. You know, speaking of performance and larger platforms, some of you may know that WordPress actually has sort of built a performance team at core. And actually, I realized recently that Drupal is starting to do a little bit of work there as well, poking around. I'm gonna get- oh, by the way, and I think Rick had mentioned [inaudible]. For those who may or may not know, prolific in CSS, actually, she's on the CSS working group, if I'm not mistaken, and an incredible speaker and live coder extraordinaire, by the way. But I want to get into-
23:42
Barry: One more thing is we are building upon it each year. So, we wrote a lot of queries in 2019. And I remember Rick and I and the guys who organized it said, "Well, that's done, 2020 will be easy. We'll just have to update them from 2019 to '20." Right. And we thought there'd be no new- no new queries. And then new authors come along, and new analysts with new takes and things. And they basically rewrote most of them. We get those for free. And then people were adding a little more, and then in 2021, we thought, well, at least now we're definitely done, and the same thing happened. We got an awful lot more queries there. And it's great, because it's like people are like, okay, that's something I thought could have been covered last time, or thought about since, and so on. So that's another reason for the growth- it's just people have different insights or think about things differently.
24:29
Henri: No, that's fantastic. You know, sort of diversity- diversity of thought, and query. Let's get into a question from Ye of- Mr. Pardue. Ye of Cloudflare. Ye of Cloudflare TV, which I watch all the time. Ye of HTTP/2 and 3, if I'm not mistaken. Mr. Pardue, please correct me if I'm wrong. But his question was, I believe, I guess for everyone, but particularly for everyone here. "If you had a genie that fulfilled three wishes for the Almanac work next year, what would you ask for?"
25:16
Sia: I think Rick should take that one.
25:18
Henri: The default to Rick.
25:20
Sia: And Barry. Rick and Barry.
25:23
Rick: I do have a wish list, but some of them are more practical than others. But I guess that's what a genie is for?
25:27
Henri: Yeah.
25:29
Rick: The thing I think- I mean, if I could snap my fingers, I think we need to overhaul the analysis process. I think that is what is most burdensome to contributors in a couple of ways. So, the first way is that there is so much data. It's a blessing and a curse in the sense that we have the ability to mine a lot of information out of the dataset, but it's that much more cumbersome to work with. Querying a terabyte of data takes a certain amount of time. And BigQuery itself is, like, a free service up to a point, and beyond that you have to start paying for it. And we want to make sure that nobody ever has to pay anything out of pocket in order to contribute to this project. So, HTTP Archive, the organization, will absorb any of those costs. But it's just getting harder and harder to get data out in a way that's, like, fast and efficient. So, that's what I would change. An idea that I have for this year is utilizing the httparchive.org website more as a way of monitoring month-to-month trends of how the metrics are changing. And prospective authors and contributors can just go to the website and look at the way things are evolving, and be able to start formulating a narrative or some sort of topic of discussion that they could come up with in the chapter. And they don't need to actually write any SQL to do that, they could just check out the current trends. And the queries will be pre-written for you, you don't have to go and write them. And hopefully that will speed things up. Right now, it's a multi-month process. It takes like six or seven months, from creating the teams to actually publishing it. And a lot of that time, two or three months of that, is analysis. So, if we could compress the analysis part, then hopefully authors can start writing content sooner, and we can publish more quickly, maybe, you know, September, as opposed to November, December.
27:24
Sia: Not to scare anyone that's not full-time work. It's just you know, it's open source. And the back and forth.
27:31
Rick: Yeah, there's a lot of downtime. It also depends on your role. If you're the analyst, you might not be doing so much of the planning phase. You're obviously involved in the analysis phase, but you're not so much involved in the writing phase. Authors are involved in the planning and the writing, obviously, and so are the reviewers. So it does depend on what you're doing, how many chapters you're contributing to, the scope of the chapter, and how many people you're working with on your chapter. Sia can talk maybe about, like, sometimes the more people you have creates more work, because you have more people to manage and more opinions to- or more feedback to field. But if you're working alone-
28:09
Sia: Also, bad.
28:10
Rick: Maybe you get less feedback, and maybe you have more control over the chapter. So, it's a weird balancing act where it's kind of unintuitive. Yeah, definitely not to scare anybody, you can do as much work as you want however much time you have to contribute. We're happy to have you. That's- that's one wish fulfilled by God, I'm gonna open it up to all the other contributors if you had any other ideas.
28:32
Sia: Nishu, I think you have a wish.
28:34
Nishu: Yeah. So, I was speaking- there's this one idea that came to my head. And it was, we should have the genie suggest to us different queries, different ideas for queries, every day for, like, the half of the year when we are not working on the Web Almanac. That should be the time utilized for thinking of ideas, because this year, I thought there was a lot of data, but so few queries that, you know, I wrote or I used the data for. There was so much information that I could get from that and, you know, think of other content there. It was mostly, you know, similar content as compared to last year, there were just a few new sections. So, we could, I think, introduce a lot more sections if we had, you know, the queries thought out before. So, the genie should do that for us, I think.
29:21
Henri: Hello, Genie. I should have the genie here. You know, part of the AMA. Sia, did you want to say something for the third wish?
29:32
Sia: No, I was- you keep talking about the genie. I think the genie is Rick. I don't know if he wants to do that.
29:40
Henri: That. That'd be interesting. I mean, Lucas, I hope that answered your question. And in fact, Lucas, I feel like you should be signing up for some- for some writing here. Hello, HTTP/3. Do I- Do I get a vote for that? Can I- Can I do a Q&A, like a poll? Let's go. Let's go, Pardue. Mr. Pardue, let's go, you heard the answer. Let's get to another question here, Dimitri. Demetrius, pardon me, and I'm gonna say Demetrius G. "Is there a way to extract from websites watched any stack changes, like CSS, JS, CMSs, etc., in the course of the past year? Obviously, I'm after the current trends." So, looking at trends for CMS, JS, and CSS, I guess.
30:31
Sia: He means like switching maybe like, oh, this many react sites switched to Angular, or something like that, I'm guessing because we already have some of those. Right. Some- we already have some of the market share things. So, I'm thinking he's thinking actually switching, I don't know, if Dimitri you are still in the chat, you can tell it.
30:51
Barry: So we have a lot of those queries. And a lot of that's actually quite cheap to query. There's a- there's a technologies table, that's there once you get over your fear of BigQuery and go in there. And I think one of the good things about the Almanac is all the queries are easily accessed. So, if you see a stat in the JavaScript chapter, or the web performance chapter, there's a little three-dot menu. You can click on there, get the query, and you can change the date and say, I want to look at it now. If you know SQL, you can change it a bit to get the whole trend going there. So that's part of, as Rick said, the aim of it: to surface that data and those queries to people so they can go in and answer their own questions there. I think, as Rick says, it'd be great if we could do that for you on the httparchive.org website, so you don't just get a snapshot once a year. You can go through it all the time, and get a deeper dive into what that means once a year.
31:45
Sia: I think he meant less the share and the changes in total shares, and maybe more, like, for particular websites, how many of them switched their stack? Because that's something that's not currently answered, but would be interesting, but also, I think, is an expensive query. I'm thinking.
32:02
Henri: It would be interesting, actually. Because you'd have to pull the switch. So, you have to pull, you know, prior to a particular date and post, I don't know.
32:15
Barry: I don’t know how useful that is. To say, well, maybe if you see a big swing from React to Vue, okay, we can see that's- that's where that's going to come. For me, the totals, say 30% React, 30% Vue, and how they change that way, are more interesting, I think, than x number of sites are moving, because also our- the set of websites we measure is changing every month. It's based on CrUX, based on what sites are popular. Now there's going to be a large portion of that that is going to be in every month, but particularly the tail end, which is probably the ones that are more likely to switch, is changing quite a bit month on month. So, we try an awful lot in the Almanac, rather than give absolute values, to give percentages and say “X percent of our crawl is using React” rather than “10,423 sites are using React”, because that number isn't that meaningful as the number of sites is changing every- every month. No, I don't know. I mean, if you're looking for a particular site or anything, you can do that. And maybe I'm wrong. Maybe it is more interesting than I thought. But to me, yeah, I'd say looking at- part of the point of the HTTP Archive Almanac is to look at the internet as a whole rather than individual sites or things. But I don't know, maybe I just haven't thought of a good way of querying it.
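Barry's point about percentages versus absolute counts can be illustrated with a minimal sketch. The crawl data below is made up for illustration (hypothetical site names and technologies, not HTTP Archive data), and `usage` is a hypothetical helper, not part of any Almanac tooling. It shows how the number of sites using a technology can double while its share of the crawl stays flat, simply because the crawl itself grew.

```python
# Hypothetical toy crawls: month -> {site: set of detected technologies}.
# All site names and numbers are invented for illustration only.
crawls = {
    "2021-06": {
        "a.example": {"React"},
        "b.example": {"Vue"},
        "c.example": {"jQuery"},
        "d.example": {"React", "jQuery"},
    },
    "2021-07": {
        "a.example": {"React"},
        "b.example": {"Vue"},
        "c.example": {"jQuery"},
        "d.example": {"React", "jQuery"},
        "e.example": {"React"},
        "f.example": {"jQuery"},
        "g.example": {"Vue"},
        "h.example": {"React", "Vue"},
    },
}


def usage(crawl, tech):
    """Return (absolute count, percentage of the crawl) for one technology."""
    count = sum(1 for techs in crawl.values() if tech in techs)
    return count, 100.0 * count / len(crawl)


for month, crawl in crawls.items():
    count, pct = usage(crawl, "React")
    print(f"{month}: {count} sites, {pct:.1f}% of {len(crawl)} crawled")
```

In this toy data the React count goes from 2 to 4 sites, yet the share is 50% in both months, which is why a percentage is the more comparable figure when the set of crawled sites changes month to month.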
33:30
Henri: Well, Dimitri, sounds like there's room for you on the team, you know, to ask some questions. I'd like that. Pardon me? Go ahead.
33:40
Rick: I- Yeah. So, I pasted a link in the chat to a discussion thread on the forum for HTTP Archive. And Multo was asking pretty much the exact same question in terms of the addressable market of web development. And I think it's a really strong signal for, for us to know that a website, you know, went from no CMS to WordPress or something like that, to know that these are people who care about keeping their website up to date, or these are people who, if we see that they're optimizing their images, maybe that's a signal that they care about optimizing user experience and core vitals. So, it's- it's a useful signal. And it's actually something that I'm trying to better understand right now. Like it's a project of mine this quarter for work. It's, can we do this? Like, is this something that the HTTP- HTTP Archive could help with? And I think Barry had a great point about the caveats to that with the churn of the websites in the data set, because it is based on real Chrome usage. And so, if people, you know, visit one particular website more or less in a certain month, it may or may not be in the dataset. But...
34:57
Sia: Yes, I think the data points would go down because you-
35:00
Rick: Yes.
35:02
Sia: It would only be the ones that have data in both sites. Yeah.
35:04
Rick: Yeah, it is tricky. But I think so the question is, is there a way? So yes, short answer is, yes, we have the ability to do it, it's going to be a complicated query. And if you're interested to follow along, the link that I pasted in chat is a good place to look.
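The idea discussed above, counting stack switches only for sites that appear in both crawls, can be sketched roughly as follows. This is not the query from the linked forum thread; the two dictionaries are hypothetical toy data (site mapped to detected CMS, with `None` meaning no CMS detected), and `stack_switches` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical toy data: site -> detected CMS (None = no CMS detected).
before = {
    "a.example": None,
    "b.example": "WordPress",
    "c.example": "Drupal",
    "d.example": "WordPress",  # dropped out of the later crawl
}
after = {
    "a.example": "WordPress",
    "b.example": "WordPress",
    "c.example": "WordPress",
    "e.example": "Drupal",  # new in the later crawl
}


def stack_switches(before, after):
    """Count (old, new) transitions for sites present in both crawls.

    Returns the transition counts and the number of sites that could
    be tracked at all (the overlap between the two crawls).
    """
    common = before.keys() & after.keys()
    switches = Counter(
        (before[site], after[site])
        for site in common
        if before[site] != after[site]
    )
    return switches, len(common)


switches, tracked = stack_switches(before, after)
print(f"tracked {tracked} of {len(before)} sites from the earlier crawl")
for (old, new), n in switches.items():
    print(f"{old} -> {new}: {n}")
```

As Sia notes, the number of data points shrinks: only three of the four earlier sites can be tracked here, because one of them is missing from the later crawl.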
35:18
Henri: Awesome. And thank you very much for that link. And Demetrius, if you're in the chat, I hope you do enjoy the link. And I hope you- you got an answer out of that. But by all means, do contribute in the chat if you do have some more questions. We have some people looking after the chat as well. We've been talking about, you know, a lot of the queries that are being done. And I know Barry just sort of mentioned the fact that with every- beyond every chapter, like literally, every chapter in the chapters, you have the queries available for you to sort of poke around yourself and play around, and, you know, blow your check on- on a query and end up with this massive AWS bill if you're not careful. But you know, there's, you can actually go in there and really remix the queries yourself. And- and sort of extract some of the assets that- that they've been collecting themselves as well in the- in the Almanac, and again, just tweaking them, and you might get some of the answers that you've been looking for. But speaking of the queries and whatnot, and this is something I want to ask Rick, a lot of the Almanac's data is obviously gathered with BigQuery. I do also understand that WebPageTest is sort of running under the hood. Did you want to share with the audience how that sort of runs?
36:53
Rick: I like to call WebPageTest the engine that powers HTTP Archive. So when we have millions of tests, the data that we get comes directly from WebPageTest. It is powering not only the direct insights that come out of the HAR file, which incidentally stands for HTTP Archive, coincidentally, but also it runs Lighthouse and Wappalyzer under the hood, inside of WebPageTest, and so that gives us a lot of the insights that we're able to then query using something like BigQuery.
37:31
Henri: Awesome, awesome. And, you know, you'd mentioned the- the millions of sites, I believe this year was like 8.2 million, is that correct? Am I off? Yeah. Am I off by one or 2 million? Let me know.
37:43
Rick: It's around.
Henri: Just double checking. Let me see, what else do we have in the chat here? Let's see what Sergei has got to say, “Was 2021 exceptional? Or do we see the continuation of existing trends?”
37:58
Rick: I mean, it depends on what you're asking about, because the web is such a diverse thing. I got this question before. And I should have seen it coming. And somebody was like, okay, so what is the state of the web? That's like, I can't answer that in any one way. If you're asking about performance, we saw, you know, like, core vitals got a little bit better this year. Did the web get better? Barry actually wrote a great blog post about this on the performance calendar about, like, did the metrics change? Did websites get better? It's kind of hard to know. Answering the why questions is very hard. We can measure the way things look; A/B testing is very hard to do on millions of websites. So we can just kind of correlate things. I will say that the web moves very slowly. And what we see year over year is more of a gradual change. We rarely see things that are kind of like sudden jumps unless they're errors in our methodology or something very strange is happening.
38:57
Sia: I will say though, the exception is like specific things. Like for example, the image lazy loading, because WordPress has implemented it and so we did see this huge jump in just one thing, because, like, a platform implemented it. So those things are interesting.
39:13
Henri: And you know, Sia you mentioned a platform implement- implemented it. Did you have any kind of- you know, what, what was like the biggest major find that you sort of discovered in writing the, in authoring the performance chapter.
39:32
Sia: That WordPress linkage, I think, was one, and then what were some of the others? Like, I have to remember what I wrote. Um, yeah, it was like [inaudible] looked better, but there's also, it's really hard to tell, like Barry mentioned, or actually he had that follow-up blog post in December, which is really good, because some of the calculations, or how they're counting, changed, so it looks like it might be better. There are certain things that- were that did look better over time, but I wouldn't say it's necessarily- I mean, coming back to the question, was it exceptional? I don't necessarily think so. If we think about, like, why something might be exceptional in 2021? Maybe because of the global pandemic. But um, I don't know that that kind of data would be reflected- I mean, that those changes would be reflected in the data. So like, maybe there's more traffic or something, but like, the nature of that traffic necessarily hasn't changed, or the websites themselves? Or maybe more people went online? Like how it- I don't know, I'm trying to think of like or hypothesize what, what would those impacts be and would we have seen something exceptional from them? I don't know. Oh, and then we needed more interactivity metrics, better ones. But that's actually something that's currently being worked on. So yeah.
40:53
Henri: Awesome. I just heard someone say something. If not, I'll jump in.
40:57
Barry: I think, as Rick says, the web moves slowly. You know, individual sites move quite quickly. But the web is huge. And there's an awful lot of sites there that just have been up there and not been- maybe "not being maintained properly" is the bad analogy. Because you know, they're a hosted site, they're- they're being upgraded, being kept up to date, they have HTTPS, that sort of thing. But it doesn't mean that they're changing the way they're built, or there's new content going up there. It's not- it's not changing as much. I think, to me, the bigger impact this year was probably what we looked at. So obviously, core vitals have been a big thing. So, I think nearly every chapter covered that at some point. I think the- the ranking factors you mentioned earlier was another big thing that was across the chapters. But I think we changed more so than, say, the web. But then there are individual examples; the lazy loading one is a great example. That has massively changed over the time.
41:54
Sia: I think this is where Demetrius' question was actually interesting. Like, we might be able to see. Like, if you looked at one website and tracked it, of course, there's all the caveats from limited data. But that would be interesting, because those are the things where it's like, this definitely changed- changed, like, a framework, or CMS or something like that. Where it's like, you might not see it in the averages. But you might see it in tracking one website, because maybe like 20% of sites moved to this, and another 20 moved away from that, but like the total has stayed the same. But the change itself is actually interesting.
42:28
Barry: Actually, what I thought was interesting is you're seeing, actually, the middle bit of the web is quite often changing more. You expect the top end, you know, they've got all this money, with the Facebooks, the Googles, they can hire as many web developers as they want. And that's where you expect the change to happen. But actually, a lot happens with the kind of the next section going on there. Because they're on services like Cloudflare, that is being managed for them, and they get HTTP/3, they get TLS 1.3, you know, automatically, just by not having to do anything, by being behind one of these services. So, they were actually ahead in some of those trends and topics, whereas the top end, again, I guess, they've got more to lose by- by changing things, and so on. They've obviously got something that has worked and that didn't stop. I think that's kind of interesting.
43:14
Henri: It's- it's interesting, because you'd mentioned that, you know, the rep- web was moving slow. And part of what I was thinking of is, you know, there is a lot of, like, legacy, you know, online, and- it's probably, you know, that's part of what potentially might be slowing things down, say, like, on the whole when you sort of scan 8.2 million sites. And yeah, I mean, that made me think of, you know, a couple tweets I saw, I think yesterday, as usual, you know, trying to malign jQuery, which now brings me back to the JavaScript chapter. I mean, Nishu, did you? Was there any kind of like, wow-moment in- in your sort of writing of the chapter? Was there anything that caught your attention that, you know, made you realize, like, wow, this is actually kind of crazy, made you want to double check your query, make sure it was written right? You know, and then you realize, like, actually, this is what's happening.
44:13
Nishu: Yeah, thanks for the question. Actually, there are two such instances, right, where this happened. First, I'll talk about the community's reaction. And then I'll talk about the query bit where I actually got confused. And I was like, “Oh, maybe I don't know BigQuery enough. And my query is all wrong.” And that's why there are these significant changes in this number. So, the number one was when Rick posted on Reddit, I think it was about the usage of jQuery. And you know, the usage of frameworks in general. And it just got a lot of traction because people went crazy about that: 80% of websites still use jQuery and frameworks are nowhere [inaudible] all about frameworks. Where is React? Where is Angular? Why is this just jQuery? So, this was a very interesting reaction. For me also, in the beginning when I was writing it and when we saw the measurement of the data, and then the community- the community's reaction totally, because nobody would expect that jQuery still, it's not like 50%, right? It's not half of it, where it's still like, “Okay, we would accept it”. But 80% is just a crazy number. So that- that was a very interesting anecdote, I think. That's from the community's point of view.
The other interesting instance was when I was analyzing the usage of async and defer attributes on the same script element. So, I was trying to compare it to the previous year- right- the previous year’s percentage. It was 11.4% of mobile pages that used the async and defer attributes on the same script element, that is, pages that were found to have at least one script element that uses both. Now, this year, when I checked that, there was a difference in the query that I was using. And when we checked that stat using that query, it was 35.4. And I was like, this cannot happen, because there should definitely be some improvement. If not, you know, a major improvement, it could be just 11.3, from 11.4. But it was 35. And I went back to Rick and was like, “Oh, I think this is wrong, can you check that?” And we realized that the query that was used last year was on the initial- initial page load, which means only the script elements that showed up there were counted in that percentage, and that's why it was 11.4. The query that we use now was based on the custom metric that we had written, and the custom metrics are run after the page has rendered, right. So, there was a lot of data that was dynamically loaded. And that included those script elements with both... getting very technical, I guess. But so, after the dynamic load of the page, the two attributes appeared a lot more and the percentage went to 35.4.
46:58
Henri: Wow.
46:59
Nishu: So that made sense. And then we dug deeper, we dug deeper into it, and found that there were many analytics types of websites that were dynamically injecting that kind of data. So that then finally justified that, you know, that significant change.
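The difference Nishu describes, counting script elements with both async and defer in the initial HTML versus in the rendered page, can be sketched as below. This is only an illustration using Python's standard `html.parser`, not the Almanac's actual custom metric; the two HTML strings are hypothetical, with `rendered_html` standing in for a serialized DOM after an analytics snippet has injected a script.

```python
from html.parser import HTMLParser


class AsyncDeferCounter(HTMLParser):
    """Counts <script> elements that carry both async and defer."""

    def __init__(self):
        super().__init__()
        self.both = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Boolean attributes arrive as (name, None) pairs.
            names = {name for name, _ in attrs}
            if {"async", "defer"} <= names:
                self.both += 1


def count_async_defer(html):
    """Return how many script elements in the markup use both attributes."""
    parser = AsyncDeferCounter()
    parser.feed(html)
    parser.close()
    return parser.both


# Hypothetical markup as delivered by the server.
initial_html = (
    '<html><head><script src="app.js" defer></script></head>'
    "<body></body></html>"
)
# Hypothetical markup after rendering, with a dynamically injected script.
rendered_html = (
    '<html><head><script src="app.js" defer></script>'
    '<script src="analytics.js" async defer></script></head>'
    "<body></body></html>"
)

print(count_async_defer(initial_html), count_async_defer(rendered_html))
```

The same page counts as not using both attributes when measured on the initial HTML and as using them when measured after rendering, which is the kind of methodology change that can move a statistic without the web itself changing.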
47:14
Henri: Wow. And, you know, as Rick said, at the very beginning, it's about discovering how the web was and is being built, you know, and, you know, we can see these trends from developers present and past, you know, as you mentioned, as we sort of refine these- and tweak these- these queries, and really kind of like, get into the entrails and go through the details. Let's get into a few more questions here. I want to know, actually, for Rick, we're at 24 chapters this year, is there a particular chapter you would consider adding? Or is this one that's sort of you know, been, you know, you've been mulling over?
48:00
Rick: The short answer is, the chapters that make it into the publication are suggestions from the community. And so even if I had, like, a great idea for a chapter, it doesn't just come into existence; we still need the people to author it and analyze it and review it, we need the subject matter experts who know that topic. So, one thing that actually I would be more interested to see, it hasn't been written about before, but there has been interest from the community, is on sustainability. And like the green web, this is, you know, a huge topic for globalization purposes. So, I would love to see that this year. If there's enough interest, we can maybe drop a link into the chat for the interest form that we have where people can sign up and say, I'm interested to do either authoring, reviewing, analyzing, and you can specify, like, what topics you're interested to participate in. And sustainability is one of them. We're going to have an issue for it. On the opposite of Web Three. Yeah, that's, I did see also a question on Web Three. Maybe it's worth talking from Sergei, don't worry. Maybe it was in jest, but I think it's like, I would love to talk about everything that has to do with the state of the web, whether it is fringe, whether it is emerging technologies that aren't quite fully baked yet. Like, we have a chapter on capabilities. There are like 0.003% of websites that use some of these capabilities. But it's important still to track these things. So that way, we have a reference point for in 10 years, when some of these things are on 80% of websites or whatever it might be; we have the point of reference to go back and say, wow, it started out at zero and look at the growth, we can track that change over time. If Web Three becomes a thing, I would like to be looking at it from an early perspective and have the experts weighing in on it, experts for and experts against. Like, I'm sure it's a divisive issue.
I don't want to have a biased thing written by the Web Three enthusiasts that are totally, like, crypto. I'm not- I'm not a crypto guy. I'm gonna say that right now. I still want to hear from the people who are and think, “Why is this a huge thing?” And balance that with the perspectives of peer reviewers. This is the importance of having, you know, a transparent methodology grounded in real data. So, we need to be able to query the data set to come up with some, I call it like “Kernels of Truth”, and build a narrative around that. And if it's purely opinion based, then it's not really a good fit for the Web Almanac to begin with. But if there is something data driven that we can talk about, then yeah, I absolutely want to hear from people about that. Sustainability is one thing, and on the other side, maybe burning the rainforest is the other, but...
50:41
Henri: Demetrius is here, and he said chapter on NFTs. I don't know.
50:47
Rick: Whatever.
50:48
Sia: I think Web Three should be a chapter. I mean, not- That doesn't mean I advocate for Web Three, but it's like, it is a new technology. And it's like understanding the state of the web.
50:58
Henri: You said, No, that's a joke.
50:59
Sia: NFTs. Yeah, I think Web Three in general, there are other things that come along you know, like, Rick said-
51:13
Henri: Someone on Twitter.
51:11
Barry: We get the, I can't remember what it was exactly. Rick, you might remember. But there was a comment on Twitter saying that some of the technologies are mentioned in there like, CSS-in-JS is only used by [inaudible] chapter. And the person, I think, that might be involved and used it, felt that was derogatory. Whereas when- you've got capabilities, we give it a whole new chapter, even though it's used only by 0.011%, or JAMstack is only used by 0.2%, and we pick it up. And there was a feeling there, it had already been there. And I think that it is important to look, as Rick says, at the maybe niche now, and maybe new things there, but also the staples. Like, I think we'll always have a CSS chapter, we'll always have a JS chapter. But you know, those ones. I think some of the other ones might come and go. We lost a couple of chapters, gained a couple of chapters, and we might not have the same ones every time. But to Rick's point, we need the chapters staffed. So you know, if we got- no one wants to take the JS chapter next year, it's difficult to do- to do that one, if no one can step up to it. If someone's got a great idea for sustainability, or for something else, or for Web Three, or whatever, make a suggestion, and if there's enough interest, we will gladly take it.
52:24
Henri: I mean, I'm of two minds there. But I do agree that, you know, if something's buzzing enough, it does deserve some attention, you know, and you can look at the data and figure out like, is it worth all this attention? And, you know, sort of resources or whatever it may be that that-
52:43
Barry: That’s the point. See, someone just made the point in the chat: have we got the data? Because we talked about a design chapter, I think last year, and, you know, we want it to be a data driven report. It's not just an opinion piece. So, we need to think about, can we get enough data here to make a chapter out of it as well. But again, that's what's good for brainstorming, you know, sit there, say, “I want a sustainability chapter. These are the sorts of things I would look at, like JS and the time spent executing it, the size of images.” Again, some of that's covered in other chapters; we're bringing it together with a green view, and maybe it's a good thing. If it's just going to be repeating a lot of things, or we're gonna have one query then try and make that data cover two or three or 10 pages, then it's a bit- is there enough there for these archive covers.
53:30
Henri: It's the, you know, I'm not sure what it is going to be called, Green Web or Green Tech, or whatever it may be. But over at the WebPageTest blog, actually, our last or and most recent blog post was about that very topic, which is sustainability, and, and sort of emissions and whatnot. And that said, I'm actually going to plug it here as well. I'm actually having a meetup on the very topic. We have three very distinguished speakers; we're going to discuss the idea of resource use, and sort of like the emissions that emerge from some of these applications that we're using. Again, it might be, you know, a very small portion individually. But you know, you scale that globally to people being on their phones all the time, and just resources being used. And, and again, that's where the Web Three might come in, because, you know, the, the commentary is that they are very resource intensive, you know, so that's something to look at as well. We're pretty much rounding the corner to the end. I wanted to sort of open it up to the group and, you know, sort of like one last statement about the Almanac. You know, maybe talk about how they can join the team, inquire about being a writer for 2022, whatever. I'm definitely going to open it up. Whoever wants to start.
55:11
Rick: I guess I could start. My closing thoughts are basically, the Web Almanac is community driven. It's by and for the community. So, if you're interested to help out, if you have ideas, whether or not you have the time to contribute, we'd love to hear from you just drop in a comment on any of the issues on the repository. It really starts from thinking back to 2019, when the project first started, I sent out a tweet and I was like, I think it'd be great if we had this type of resource. And there was a publicly editable Google Doc, and everybody just kind of poured in and collaborated on this doc. And that, like, was a huge inspiration for me that such a project could be viable, like the community cares enough about the web, as an ecosystem to drive this sort of thing. So please help keep that going. I think it's a great public resource. It's great to see all of the different derivative content that comes out of it, like from academic research, papers, citations, and books and conferences and blog posts. So please keep that up. And let us know if you'd like to contribute.
56:17
Henri: Amazing. Nishu...
56:21
Nishu: Right. So when- when I was looking for analysts for, for the chapter, when you're looking for somebody to contribute there, I saw that there was a resistance there, right, in the people, because they were not sure what were the skills that were needed there. And I- I somehow wanted to, because I knew that now, after coming, becoming an author, right, I knew that this is, it's not like you should be a BigQuery expert, for example. It's not like you should know what the data needs to be before you jump in. Right. So, I think I would like to clear that mis- misconception, that it's just about the idea, that this is what you're trying to achieve from the data. This is the data, this is the, you know, for example, the metrics that you want to get out of it; it becomes very easy after that. So- and even if you don't have any idea, the last stage is, let's not call it the last stage. But I think there's still scope for making- becoming an editor, right, which leaves us with a lot of learning and growth. I- When- when I looked at the comments from my reviewers and editors, I felt like, “Oh, there's really a lot of scope for other people who could, you know, jump in, and they have to go through and read the whole stuff”. So, there's so much learning there. So, if not contributing to the community, there's so much learning; we should just think of it as our growth personally. And you know, and study it, so yeah.
57:45
Henri: Awesome. Thank you for- Thank you, for that. Sia.
57:52
Sia: Parting thoughts time? I'm trying to think of something that hasn't already been said. I would echo a lot of those- definitely consider non-author things, too. Because, yeah, I needed more help with analysts. Where again, you don't have to be like a perfect expert. A lot of queries are there from last year, so you just reuse them or maybe you just tweak, you know, one little part of it. So... And actually, for all of it, you don't necessarily have to be an exp- Like, you don't have to be the perfect writer. There's reviewers and editors there for that reason, and there's tools to make you better at that. But yeah, don't- don't think you can't contribute, you can definitely contribute. There might have been a hesitancy from me; for, like, some of the previous years I didn't, because I was like, “Well, I don't know if I'm, you know, smart enough yet.” But it's kind of silly, in hindsight, thinking of that.
58:50
Henri: Awesome. Awesome. Well, I'm glad that you were able to join the team and then contribute to that, and authored, in fact, the chapter. So, thank you very much for that. And, Monsieur Barry.
59:01
Barry: I would echo again, I say if you- if you like the Web Almanac, feel free to get involved. And again, to Nishu’s point, you don't need to be an expert. I mentioned earlier, we like the authors to be experts, to be honest. But if you- you know, you're a junior web developer, and you'd like to get involved here, absolutely. Or project managers, we always need those; analysts, we definitely need more. So, if you know even a tiny bit of SQL, or you want to learn, you've always heard about this HTTP Archive dataset or this BigQuery dataset and you want to learn how to query it, let us know. We will cover the costs for you querying it and help you get on board there. I think the team are great about helping newcomers coming on there. So don't be afraid, and jump up and come up with the ideas as well; the new ideas. I don't want the Web Almanac to get stale and for 2022 to be a repeat of 2021. We try our best to make sure the chapters aren't just an update but give something new or a new interesting insight. So, we like to get different authors each year. We like to get different topics each year and different views on the ranking, you know, something new that an author is bringing. So, we need people and we need ideas.
1:00:10
Henri: And so, in closing, I do want to thank everyone for spending some time with us for the last hour. This is something that you will be seeing a lot more from us in terms of just, you know, sharing and working with the community, doing these kinds of presentations. Well, specifically, since we are talking about the Web Almanac and this AMA, there are 24 chapters of the Almanac; we are not going to do 24 streams. But we are going to bring a few more to kind of get into some of the details. You know, I'll reach out to some of the other author- authors, like maybe CSS or Capabilities, or Media or Images, or Page Weight, whatever it may be, CMS, etc. So, look for some more of these Almanac AMAs. I've been a big fan of the Almanac, I've always said, you know, anything Rick can attest to that we met. Whereas they, what can we do to sort of get people to really know about the Almanac a lot more, dig into it a lot more. In fact, I don't know if someone can drop a link to the Almanac, because I believe there's a PDF as well, you know, which, for some reason I completely forgot. So, if you want to read the Almanac offline, you can do that as well.
This is a fantastic trove of information. And, you know, Rick had mentioned, you know, some of the, you know, researchers that quote from it, and you'll hear people sort of like, pick it up from, you know, talks to presentations at a meet up, you know, the Almanac is a storytelling tool. You know, and if you really want to sort of, you know, maybe particularly affect some change at work, and someone wants to say, “Hey, you know, I want to use x tool”, and you might be able to bring the Almanac and say, “I don't know about that, you know, here's what's going on here. There may not be enough support, you know, X, Y & Z”. So please go give the almanac a little, little scrub, little read. It's a fantastic book. They are planning to do 2022. Again, they talked about contributing. So, if you want to reach out to them, maybe hit them on Twitter.
And lastly, you know the deal. If you're not auditing with WebPageTest, you know, you're not auditing right. It's the bottom line. But honestly, we talked earlier today about sort of like surfacing a lot of data; we can at WebPageTest. Go give it a little run. You don't have to register, you know, you could do a little free audit, or you could actually register and get some additional info out of that, you'll be able to save your queries, etc. But we can get into those details tomorrow. So, tune in tomorrow for Twitch live. And, of course, please give us a follow on Twitter if you can, at @RealWebPageTest. We'll be sharing a lot of the details as things come about for the next AMAs and other presentations we'll be doing throughout the year. Any involvement we'll have in- in the community, and as well any updates with WebPageTest that we make, we'll be able to sort of tease them out through our Twitter account. So please give us a follow.