This study tested whether weapon type affected performance in a
Minecraft zombie arena. The response variable was damage absorbed per
trial, recorded in Minecraft damage-stat units as
DamageTaken. Lower values mean the player absorbed less
damage while fighting the same group of 64 zombies, so
smaller responses indicate better performance. The treatment factor was
weapon type with three levels: sword, axe, and trident. The blocking
factor was play session with four levels, Sessions 1 through 4. The
analysis was a randomized complete block ANOVA, with session included
as a block to remove variation that might come from fatigue, rhythm, or
small gameplay differences.
The significance level for the ANOVA was set to \(\alpha = 0.05\).
For weapon type:
\(H_0: \mu_{\text{sword}} = \mu_{\text{axe}} = \mu_{\text{trident}}\)
\(H_A\): at least one mean damage-absorbed value differs among the three weapons.
For session block effects:
\(H_0: \mu_{\text{Session 1}} = \mu_{\text{Session 2}} = \mu_{\text{Session 3}} = \mu_{\text{Session 4}}\)
\(H_A\): at least one session mean differs.
The session effect was included as a block to absorb potential variation across sessions rather than as the main scientific question. Because there is only one observation for each weapon-session combination, the model does not separately test a weapon-by-session interaction.
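A minimal sketch of this model in R, assuming the data frame and column names used elsewhere in this report (`d.df`, `y`, `Session`, `Weapon`) and matching the `b.fit` object referenced later in the diagnostics:

```r
# Randomized complete block ANOVA: Session as the block,
# Weapon as the treatment. With one observation per cell,
# no weapon-by-session interaction term is estimable.
b.fit <- aov(y ~ Session + Weapon, data = d.df)
summary(b.fit)  # F tests for Session and Weapon against the residual MS
```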
This was a randomized complete block design with three weapon treatments and four blocks. In each session, the player completed one sword trial, one axe trial, and one trident trial, giving a total of 12 trials. The arena, world settings, armor with Protection IV enchantments, difficulty, zombie count, and starting setup were held constant so that weapon choice was the main planned source of treatment variation.
DT::datatable(
ord.tb,
colnames = c("Session", "Order", "Weapon", "Damage absorbed"),
options = list(pageLength = 12, lengthChange = FALSE)
)
Other settings held constant included hard mode difficulty and the same
zombie spawn procedure for every trial. The randomized part of the
design was the within-session weapon
order, represented by the Order column in the file. That
step helps protect the weapon comparison from order effects. The parts
that were not randomized were the player, arena, world, and general test
setup. Those fixed choices improve consistency, but they also introduce
possible bias because the results reflect one player in one arena rather
than all Minecraft combat situations. In addition, if session-level
practice or tiredness changed over time, those effects could still
influence the data, which is why session was treated as a block.
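One way such a within-session randomization could be generated in R (a hypothetical sketch; the object name `ord.sketch`, the seed, and the session labels are illustrative, not the script actually used for this study):

```r
# Hypothetical randomization sketch: shuffle the three weapons
# independently within each of the four sessions.
set.seed(123)  # illustrative seed, not from the original study
weapons <- c("sword", "axe", "trident")
ord.sketch <- do.call(rbind, lapply(1:4, function(s) {
  data.frame(Session = paste("Session", s),
             Order   = 1:3,
             Weapon  = sample(weapons))
}))
ord.sketch  # 12 rows: one randomized weapon order per session
```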
Important uncontrolled variation could still come from small differences in movement, timing, zombie pathing, aim, and in-game crowding. Blocking by session helps account for some of that run-to-run noise, but with only one observation per weapon inside each session, the design does not allow a separate test of a weapon-by-session interaction. That means the ANOVA assumes the weapon differences are fairly consistent across sessions.
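One informal way to probe that additivity assumption is Tukey's one-degree-of-freedom test for nonadditivity; the sketch below (not part of the original analysis) augments the blocked model with the squared fitted values:

```r
# Tukey's one-df nonadditivity check (sketch, not run in the
# original analysis). A significant fit2 term would suggest a
# multiplicative block-by-treatment interaction.
add.fit   <- aov(y ~ Session + Weapon, data = d.df)
d.df$fit2 <- fitted(add.fit)^2
summary(aov(y ~ Session + Weapon + fit2, data = d.df))
```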
The trident had the largest sample mean damage absorbed at 46.94, while the sword had the smallest at 39.42. Because lower damage absorbed is better here, the sword looked best on raw averages, but the spreads were wide enough that these visible differences needed formal ANOVA confirmation.
kable(w.sum, digits = 2, col.names = c("Weapon", "n", "Mean", "SD", "Min", "Max")) |>
kable_styling(full_width = TRUE)
| Weapon | n | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| sword | 4 | 39.42 | 13.52 | 21.0 | 53.20 |
| axe | 4 | 43.35 | 18.34 | 24.7 | 68.38 |
| trident | 4 | 46.94 | 9.41 | 34.0 | 56.50 |
kable(s.sum, digits = 2, col.names = c("Session", "n", "Mean", "SD", "Min", "Max")) |>
kable_styling(full_width = TRUE)
| Session | n | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Session 1 | 3 | 48.01 | 5.18 | 42.84 | 53.20 |
| Session 2 | 3 | 41.13 | 24.48 | 21.00 | 68.38 |
| Session 3 | 3 | 44.57 | 10.39 | 37.50 | 56.50 |
| Session 4 | 3 | 39.27 | 12.91 | 24.70 | 49.30 |
boxplot(y ~ Weapon, data = d.df,
main = "Minecraft Zombie Arena: Damage Absorbed by Weapon",
xlab = "Weapon", ylab = "Damage absorbed (Minecraft stat units)",
col = weapon.cols)
stripchart(y ~ Weapon, data = d.df, add = TRUE, vertical = TRUE,
method = "jitter", pch = 16, col = "gray35")
boxplot(y ~ Session, data = d.df,
main = "Minecraft Zombie Arena: Damage Absorbed by Session",
xlab = "Session", ylab = "Damage absorbed (Minecraft stat units)",
col = session.cols)
stripchart(y ~ Session, data = d.df, add = TRUE, vertical = TRUE,
method = "jitter", pch = 16, col = "gray35")
interaction.plot(d.df$Weapon, d.df$Session, d.df$y,
type = "b", pch = 19, lwd = 2,
col = session.cols,
xlab = "Weapon", ylab = "Damage absorbed (Minecraft stat units)",
legend = FALSE,
main = "Minecraft Zombie Arena: Observed Session Profiles by Weapon")
legend("top",
legend = levels(d.df$Session),
title = "",
horiz = TRUE,
lty = 1,
lwd = 2,
pch = 19,
col = session.cols,
bty = "n",
inset = 0.02)
The weapon boxplots overlap heavily, and the session boxplot shows that Session 2 has the highest center and widest spread, while Session 3 appears somewhat lower and tighter. The session profile plot supports that same pattern and highlights that Session 2 has a noticeable spike for the axe relative to the other two weapons in that block. Even with these visible block-to-block differences, the session means still overlap enough that the blocked ANOVA did not find a statistically significant session effect.
For the blocked ANOVA, I checked the usual residual assumptions: approximately normal errors, roughly constant variance, and independence across trials. The residuals-vs-fitted plot did not show a strong megaphone pattern, so a log transformation was not needed. The Q-Q plot looked acceptable for such a small sample, and the residuals plotted against trial order showed no obvious time trend. The Shapiro-Wilk test on the model residuals gave \(p = 0.454\), which provides no evidence against normality.
op <- par(mfrow = c(1, 2))
plot(b.fit, which = 1)
plot(b.fit, which = 2)
par(op)
plot(d.df$trial_id, resid(b.fit),
main = "Minecraft Weapon Study: Residuals by Trial Order",
xlab = "Trial order", ylab = "Residual",
pch = 16)
abline(h = 0, lty = 2)
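The normality result quoted above can be reproduced directly from the fitted model (a one-line sketch using the same `b.fit` object):

```r
# Shapiro-Wilk test on the blocked-model residuals;
# the text reports p = 0.454 for this test.
shapiro.test(resid(b.fit))
```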
This is a randomized complete block design with one observation for each weapon in each session, so the model includes session as the block and weapon as the treatment factor. The weapon effect was not statistically significant, \(F(2, 6) = 0.20\), \(p = 0.823\). The block effect was also not statistically significant, \(F(3, 6) = 0.16\), \(p = 0.920\). In other words, after accounting for session, the data do not provide strong evidence that sword, axe, and trident differ in mean damage absorbed in this experiment.
kable(b.aov, digits = 3, col.names = c("Term", "Df", "Sum Sq", "Mean Sq", "F", "p")) |>
kable_styling(full_width = TRUE)
| Term | Df | Sum Sq | Mean Sq | F | p |
|---|---|---|---|---|---|
| Session | 3 | 134.204 | 44.735 | 0.159 | 0.920 |
| Weapon | 2 | 113.178 | 56.589 | 0.201 | 0.823 |
| Residuals | 6 | 1688.538 | 281.423 | NA | NA |
Because the overall weapon test was not significant at \(\alpha = 0.05\), I did not run multiple comparisons. Reporting pairwise differences after a clearly non-significant omnibus test would add noise without adding defensible evidence. On raw means alone, the sword had the smallest observed damage absorbed and the trident had the largest, but those average gaps were small relative to the within-weapon variability.
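For completeness, if the omnibus weapon test had been significant, Tukey's HSD on the same fitted model would have been the natural follow-up (a sketch; not run here precisely because the omnibus test was not significant):

```r
# Pairwise weapon comparisons with familywise error control;
# only defensible after a significant omnibus weapon test.
TukeyHSD(b.fit, which = "Weapon")
```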
This experiment did not find a statistically clear weapon effect in the Minecraft zombie arena. Within the four blocked sessions, the sword, axe, and trident produced similar damage-absorbed outcomes once normal run-to-run variation was taken into account. The sword had the best raw average, but not by enough to conclude that it truly outperformed the other weapons in this dataset. A stronger follow-up study would add more sessions, keep the same blocking strategy, and possibly collect a second response such as completion time so weapon performance can be judged from more than one angle.
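As a rough planning aid for that follow-up, a one-way power calculation using the observed weapon means and the residual mean square gives a ballpark number of sessions per weapon (a sketch: it ignores blocking, so it is conservative if blocking removes real variance, and the inputs are assumptions taken from this study's estimates):

```r
# Ballpark replication needed to detect weapon differences of the
# observed size with 80% power (one-way approximation).
power.anova.test(groups      = 3,
                 between.var = var(c(39.42, 43.35, 46.94)),  # weapon means
                 within.var  = 281.4,  # residual MS from the ANOVA table
                 power       = 0.80)
```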