Measuring Homestuck
Word Count, Update Frequency, and more
Redditor? See the r/Homestuck page for this post here.
This report is completely spoiler-free as long as you don't follow any links to HS pages.
Plenty of older threads attempt to speculate and/or estimate the "size" of Homestuck, or provide comparisons to popular benchmarks (Bible, LoTR, etc), or present metrics based on hand-collected data (update frequency, etc). This thread is an attempt to address the "how big is Homestuck" question (and others) with automated data ripped from the adventure, using established and experimental methods for equivalence and comparison to works in other media. Additionally, the exploration of Homestuck data for interesting trends and statistics is encouraged, because everyone loves graphs. EVERYONE.
I plan to update the underlying data source used in this post (and linked below) at the end of each act or sub-act. Currently, the collected data spans from the beginning of Homestuck to the end of A6I2, on this page. At the start of A6I3 or A7(?), I plan to officially update the data source in this post, and all dependent metrics/figures. Other users are free (and encouraged) to download, edit, expand, alter, redistribute, and use the linked Excel file(s) for their own exploration of Homestuck's data, and interesting finds or alterations posted in this thread will be added to the title post.
Without further ado:
Word, Page, and Media Counts
Number of words in Homestuck's text: 418,290
This metric includes:
- Page Title Text (e.g. "John: Quickly retrieve arms from drawer." = 6)
- Inner Page Text (e.g. "Your ARMS are in your MAGIC CHEST, pooplord!" = 8)
- Pester/Sprite/Dialog Names & Text (e.g. "TG: ok i can accept that" = 6)
It DOES NOT include:
- Text in Images & Flash (e.g. "ZOOSMELL POOPLORD" on this page)
- Next-Page Commands (which are repeated in following Title Text)
- Act, Intermission, and Button Labels (like "Show Pesterlog" and "[I]")
The "==>" and similar commands count as one word, in titles and page text.
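The counting rule above can be sketched in a few lines. This is only an illustration, assuming a "word" is any whitespace-separated token with punctuation left attached; the function name count_words is made up for this example, and the real counts came from the linked Excel file.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens; punctuation stays attached to its word."""
    return len(text.split())


# The worked examples quoted in the lists above.
examples = [
    "John: Quickly retrieve arms from drawer.",
    "Your ARMS are in your MAGIC CHEST, pooplord!",
    "TG: ok i can accept that",
    "==>",
]

for line in examples:
    print(count_words(line), line)
```

Under this assumption, the three example lines come out to 6, 8, and 6 words, and "==>" counts as one, matching the figures quoted above.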
Number of Adventure pages in Homestuck: 4,816
This metric includes:
It DOES NOT include:
Number of images in Homestuck: 5,512
Number of image frames in Homestuck: 21,163
This metric includes:
- Single Adventure Page Images (e.g. page 1901 has 1 image)
- Extra Images Per-Page (e.g. page 1907 has 2 images)
It DOES NOT include:
Frames are extracted automatically from each GIF file and counted regardless of duration.
Number of Flash Animations in Homestuck: 135
This metric includes all Flash files on Adventure pages, regardless of the [S] tag.
The following figure is a look at word count on a per-page basis, binned logarithmically.
This histogram shows us the largest bin of word count is the 33-64 range, with 854 pages. A very-close second is the 0-1 range, with 853. Pages with this few words are almost-always "minipages" like this one, with only a single-word title (usually "==>") and no body text. You can also see there are two pages in the "Extreme Outlier" bin of 4097+ words, which (as you might have expected) are the "recaps," this page and this one.
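For anyone rebuilding the histogram, here is a rough sketch of logarithmic binning. Only the 0-1, 33-64, and 4097+ bins are named in this post; the power-of-two edges in between are my assumption, and bin_label and the sample counts are hypothetical.

```python
from collections import Counter


def bin_label(words: int) -> str:
    """Map a per-page word count to a power-of-two bin label (assumed bin edges)."""
    if words <= 1:
        return "0-1"
    if words > 4096:
        return "4097+"
    upper = 1 << (words - 1).bit_length()  # smallest power of two >= words
    lower = upper // 2 + 1
    return str(upper) if lower == upper else f"{lower}-{upper}"


# Hypothetical per-page word counts, not real Homestuck data.
sample_counts = [1, 0, 40, 64, 33, 5000, 7, 2]
histogram = Counter(bin_label(n) for n in sample_counts)
for label, pages in histogram.items():
    print(label, pages)
```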
The following figure shows how word count grows over the course of the Adventure.
Small vertical jumps can be observed at 3574, 3888, and 5138, where the recaps are located in the story. This progression suggests a new recap is due soon! The trend overall is very-slightly concave up, implying a mild increase in average per-page word count over the course of the story so far.
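The running total behind a plot like this is just a cumulative sum. The sketch below uses made-up per-page counts (not the real data) and compares the average words per page in the first and second halves, which is one crude way to check for a concave-up trend; the function names are hypothetical.

```python
from itertools import accumulate


def cumulative_counts(words_per_page):
    """Running total of words after each page."""
    return list(accumulate(words_per_page))


def half_averages(words_per_page):
    """Average words per page in the first and second halves of the story."""
    mid = len(words_per_page) // 2
    first, second = words_per_page[:mid], words_per_page[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Hypothetical per-page counts; the 4200 stands in for a recap-style page.
pages = [1, 40, 12, 1, 55, 60, 1, 4200, 70, 90]
print(cumulative_counts(pages))
print(half_averages(pages))
```

A single very long page shows up as a vertical jump in the running total, and it also skews the half-by-half averages, so a check like this is best read with the recaps in mind.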
Comparisons using these numbers to common literature standards follow in the next section.
Size by Comparison
This section uses data from a number of internet sources, including this page and this one, to compare Homestuck's length to that of famous novels and other common literary standard lengths. I cannot be responsible for the accuracy of data from these outside sources, but expect they are correct to within 10%. Anyone with better sources or data should reply to the thread, and I'll update this section.
This first figure is a comparison on raw word count alone.
Now, while impressive in its own right, this comparison fails to account for the variety of media and storytelling elements in Homestuck as compared to a traditional novel. We can see that, counting only the story words (using the metric explained in the last section), Homestuck is about "Two Crime and Punishments" or "Half a Bible." It is furthermore roughly comparable in size to The Lord of the Rings trilogy, and David Foster Wallace's novel Infinite Jest. The scale of the story, by word count alone, is pretty amazing! However, a more applicable comparison should take into account the various other storytelling methods in Homestuck, including images and Flash animations; for this sort of comparison, we have to delve into the largely-subjective field of adjusted word count or word equivalence. The ultimate goal of this is to attempt to estimate, as closely as possible, the number of words it would take to convey Homestuck using text only.
While there is by no means a common standard for word equivalence of images, and absolutely no precedent for word-equivalence of Flash animations, this is a best effort based on available information and common-sense considerations, with conservative estimates employed whenever possible for what should be a "safe-minimum" result. Word equivalence is fairly common in scientific and scholarly journals, which may individually have rules regarding image size and density as applicable to article length restrictions. This presentation by the American Journal of Neuroradiology goes over their particular rules on word-equivalence, which are excerpted in the following list:
- add 100 words to your total word count for every two brain scans or like-sized images.
- Each graph should be counted as 100 words.
- consider image submissions not outlined in this presentation as 100 words.
- arrays of images add (4 images = 400 words)
Of course, a radiographic brain scan presents a great deal more "equivalent information" than, say, this story image. The great variation in Homestuck's image complexity suggests that a flat counting metric may not be the best option, especially when animated GIFs are involved. After a great deal of experimentation, comparison, and screwing around, I've come up with the following metric for word-count equivalence of Homestuck's images:
Nwords=9+0.6*Simg
...where Simg represents the size of each image file, in kilobytes.
This puts the flashy-cracked-window image above at 16 words (~one 'good' sentence), while more complicated, animated images like this one are more lengthy (this one is ~118). As a good measure for evaluation, think about the number of words it would take to describe every detail in an image which is remotely pertinent to the story: for the first image, that means the fact that the window has four panes, is cracked, and is flashing with an otherwise blank "screen." Again, we're trying for a conservative estimate. While not every image in Homestuck is well-represented with this metric, the parameters have been selected so as to produce a good average (or final-sum) result. Equivalence on each individual image is not the intent.
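As a quick sketch, the image metric can be written as a one-line function. The file sizes below are not measured; they are back-computed from the 16-word and ~118-word values quoted above, so treat them as assumptions, and image_words is a name invented for this example.

```python
def image_words(size_kb: float) -> float:
    """Word equivalence of one image: Nwords = 9 + 0.6 * Simg (Simg in kilobytes)."""
    return 9 + 0.6 * size_kb


# Assumed sizes, back-computed from the word values quoted in the post.
assumed_sizes_kb = {
    "cracked-window image": 11.7,
    "animated example": 181.7,
}

for name, size_kb in assumed_sizes_kb.items():
    print(name, round(image_words(size_kb)))
```

Note the constant term: even an empty image would be worth 9 words under this metric, which is what keeps tiny images from counting as nothing.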
For Flash animations, without any information or precedent, I've adopted a similar system:
Nwords=20+0.1*SFlash
...where SFlash represents the size of each Flash file, in kilobytes.
This makes the first Strife Page, a relatively-simple animation/game, worth about 70 words. By comparison, Cascade is worth a whopping 5,020. By my totally-qualitative estimate, this metric undercounts the small Flash files (which are many) and overcounts the huge ones (which are very few), which balances to somewhere-near-reasonable-on-average. A better metric for this might include the duration of each animation rather than the byte-size, but I have yet to find a machine-readable way to count the duration of each animation, and wouldn't even know where to begin on the game-like ones such as Myststuck.
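The Flash metric works the same way, and inverting it shows what file sizes the two quoted values imply. This is a sketch only; flash_words and implied_size_kb are hypothetical names, and the sizes are derived from the formula rather than measured.

```python
def flash_words(size_kb: float) -> float:
    """Word equivalence of one Flash file: Nwords = 20 + 0.1 * SFlash (kilobytes)."""
    return 20 + 0.1 * size_kb


def implied_size_kb(words: float) -> float:
    """Invert the metric: the file size that would yield a given word value."""
    return (words - 20) / 0.1


# Sizes implied by the values quoted above (about 70 words and 5,020 words).
for label, words in [("first Strife page", 70), ("Cascade", 5020)]:
    size_kb = implied_size_kb(words)
    print(label, round(size_kb), "KB ->", round(flash_words(size_kb)), "words")
```

So a 70-word Flash corresponds to a file of roughly 500 KB, and 5,020 words to roughly 50,000 KB, under this formula.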
In addition to the raw word count listed above, these two equivalence metrics yield an "adjusted word count" or "complexity count" for Homestuck, which should, as intended, represent "the number of words to convey the Homestuck story using only text."
Adjusted Word Count for Homestuck: 728,063
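Assuming the adjusted count is simply the raw word count plus the image and Flash equivalents, the share contributed by media falls out of the two totals reported in this post:

```python
RAW_WORDS = 418_290        # raw word count reported above
ADJUSTED_WORDS = 728_063   # adjusted word count reported above

# Words attributed to images and Flash combined, under the assumption
# that adjusted = raw + image equivalents + Flash equivalents.
media_equivalent = ADJUSTED_WORDS - RAW_WORDS
media_share = media_equivalent / ADJUSTED_WORDS

print(media_equivalent)        # 309773
print(f"{media_share:.1%}")    # 42.5%
```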
We can retroactively re-apply this number and adjusted counting scheme to our results from before, including the previous section.
This figure is a comparison using the Adjusted Word Count for Homestuck.
Using this comparative scheme, we can say Homestuck is "a little under one Bible" or "over one and a half LoTRs." Again, the parameters and methods used for word-count equivalence are very subjective and estimated, but are all intended to be conservative estimates and measures, based at least in-part on common standards, where available.
Adjusted Size Metrics
Continuing to update our prior graphs, the following adds the complexity measure, along with raw data size, to the accumulating word count line-graph from the first section.
This figure shows how story complexity grows page by page in Homestuck.
We're counting both Words and kilobytes on the same Y-axis here, which might invite some poor inferences; the green line shouldn't be used in comparison to the others. They just fit so well on the same axis, I couldn't resist. Also, see that huge spike in story size around 6000? Yeah, that's Cascade. The interesting spikes here-and-there made me want to look at how complexity (read: adjusted word count) is distributed across pages, so I made the following scatterplot:
This figure shows complexity for each page in Homestuck.
The big spikes (labeled with page number) are Recap 1, Recap 2, Recap 3, and.... Cascade. The blue line across the bottom is a simple Linear fit to the data, which shows a net-upward trend in page complexity. Most pages are "down in the grass" with a complexity around 10-100.
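A linear fit like the blue line can be reproduced with ordinary least squares. That this matches the fit in the figure is an assumption on my part; the linear_fit helper and the sample points are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (page number, complexity) points, not real Homestuck data.
page_numbers = [1, 2, 3, 4, 5]
complexity = [12.0, 15.0, 11.0, 20.0, 22.0]
slope, intercept = linear_fit(page_numbers, complexity)
print(slope > 0)  # a positive slope is a net-upward trend
```

Keep in mind that a least-squares line is sensitive to outliers, so a handful of huge pages can tilt the slope noticeably.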
Finally, I was interested in the breakdown of Homestuck's size between media types. First off:
File Size of Homestuck: 622.56 MB
Includes text words, story images, and Flash Animations. Does not include page decorations or other HTML outside the comic frame.
This figure shows the breakdown in file size between Homestuck's storytelling mediums.
This chart speaks for itself; only a tiny fraction of Homestuck's file size (well under 1%) is due to text, as compared to images and Flash. This is not unexpected by any means. Still, 2.5MB of text alone is nothing to be ashamed of!
This concludes the sections on content and comparison. Any ideas regarding other interesting methods of analysis or statistics should be posted as replies in this thread (or PMs to me!), and good ones will be added to this post and the shared Excel file.
Update Frequency in Homestuck
The next two plots look at how often, and when, Homestuck is updated.
This figure shows how often per-day Homestuck has been updated, since starting.
Moving averages are calculated center-weighted. The Yearly line (violet) makes a good metric for update frequency after averaging-out spikiness; Homestuck's period of most-frequent-updates is thus around Fall 2010. Notice also the large drop across all series around Fall 2011, when the comic went "on break."
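Here is one way to read "center-weighted": each day's value is the mean of a window centered on that day. The sketch below assumes an odd window length and simply truncates the window at the edges of the series, which may differ from how the figure was built; the function name and sample data are made up.

```python
def centered_moving_average(values, window):
    """Mean of a window centered on each point; the window is truncated at the edges."""
    half = window // 2
    averages = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        averages.append(sum(values[lo:hi]) / (hi - lo))
    return averages


# Hypothetical updates-per-day series, not real Homestuck data.
updates_per_day = [0, 3, 0, 3, 0]
print(centered_moving_average(updates_per_day, window=3))
```

A yearly line would use a window of about 365 days; the wider the window, the more the day-to-day spikes are smoothed out.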
We can also bin the updates-per-day data:
This figure shows the frequency of daily updates in Homestuck.
Seen here, most days (since Homestuck started on 4/13/2009) there are zero updates to the comic. On days that the comic is updated, there are most-commonly 4 updates per day. The largest number of updates in one day occurred on 7/8/2011, with 41 updates.
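Binning updates per day comes down to counting pages per calendar date and then counting how many days share each total, zero-update days included. The sketch assumes one "update" means one page posted; the function name and the dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def updates_per_day_histogram(page_dates, start, end):
    """Map 'updates in a day' -> 'number of days with that many updates'."""
    per_day = Counter(page_dates)
    histogram = Counter()
    day = start
    while day <= end:
        histogram[per_day.get(day, 0)] += 1  # days with no pages count as 0
        day += timedelta(days=1)
    return histogram


# Hypothetical posting dates, not real Homestuck data.
posted = [date(2009, 4, 13)] * 2 + [date(2009, 4, 15)] * 4
hist = updates_per_day_histogram(posted, date(2009, 4, 13), date(2009, 4, 17))
print(sorted(hist.items()))
```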
This concludes my "Measuring Homestuck" report. My Excel file with page data, charts, and calculations is available here, or attached; Homestuck fans are encouraged to edit, alter, append, expand, and use this data and these figures in their own work, and asked to repost any interesting finds to this thread in order to make them public and keep them centralized. I only ask that, if possible, my forum username continue to be attached to all derivative works in the "Summary" tab, as it is currently. I would also like to add a very special thanks to Andrew Hussie for writing what might be "the greatest story ever told," which has kept me entertained for 3+ years running.
In Summary:
Number of words in Homestuck's text: 418,290
Number of Adventure pages in Homestuck: 4,816
Number of images in Homestuck: 5,512
Number of image frames in Homestuck: 21,163
Number of Flash Animations in Homestuck: 135
File Size of Homestuck: 622.56 MB
Adjusted Word Count for Homestuck (see above): 728,063
(last Updated Friday, 29 June 2012)
Please post or PM with questions/suggestions!