This study tested whether weapon type affected performance in a
Minecraft zombie arena. The response variable was damage absorbed per
trial, recorded in Minecraft damage-stat units as
DamageTaken. Lower values mean the player absorbed less
damage while fighting the same group of 64 zombies, so
smaller responses indicate better performance. The treatment factor was
weapon type with three levels: sword, axe, and trident. The blocking
factor was play session with four levels, Sessions 1 through 4. The
analysis was a randomized complete block ANOVA, with session included
as a block to remove variation that might come from fatigue, rhythm, or
small gameplay differences.
The significance level for the ANOVA was set to \(\alpha = 0.05\).
For weapon type:
\(H_0: \mu_{\text{sword}} = \mu_{\text{axe}} = \mu_{\text{trident}}\)
\(H_A\): at least one mean damage-absorbed value differs among the three weapons.
For session block effects:
\(H_0: \mu_{\text{Session 1}} = \mu_{\text{Session 2}} = \mu_{\text{Session 3}} = \mu_{\text{Session 4}}\)
\(H_A\): at least one session mean differs.
The session effect was included as a block to absorb potential variation across sessions rather than as the main scientific question. Because there is only one observation for each weapon-session combination, the model does not separately test a weapon-by-session interaction.
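A minimal sketch of this model in R, assuming the data frame and column names used elsewhere in this report (`d.df`, `y`, `Session`, `Weapon`) and matching the `b.fit` object referenced later in the diagnostics:

```r
# Randomized complete block ANOVA: Session as the block,
# Weapon as the treatment. With one observation per cell,
# no weapon-by-session interaction term is estimable.
b.fit <- aov(y ~ Session + Weapon, data = d.df)
summary(b.fit)  # F tests for Session and Weapon against the residual MS
```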
This was a randomized complete block design with three weapon treatments and four blocks. In each session, the player completed one sword trial, one axe trial, and one trident trial, giving a total of 12 trials. The arena, world settings, armor with Protection IV enchantments, difficulty, zombie count, and starting setup were held constant so that weapon choice was the main planned source of treatment variation.
DT::datatable(
ord.tb,
colnames = c("Session", "Order", "Weapon", "Damage absorbed"),
options = list(pageLength = 12, lengthChange = FALSE)
)
Other settings held constant included hard mode difficulty and the same
zombie spawn procedure for every trial. The randomized part of the
design was the within-session weapon
order, represented by the Order column in the file. That
step helps protect the weapon comparison from order effects. The parts
that were not randomized were the player, arena, world, and general test
setup. Those fixed choices improve consistency, but they also introduce
possible bias because the results reflect one player in one arena rather
than all Minecraft combat situations. In addition, if session-level
practice or tiredness changed over time, those effects could still
influence the data, which is why session was treated as a block.
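One way such a within-session randomization could be generated in R (a hypothetical sketch; the object name `ord.sketch`, the seed, and the session labels are illustrative, not the script actually used for this study):

```r
# Hypothetical randomization sketch: shuffle the three weapons
# independently within each of the four sessions.
set.seed(123)  # illustrative seed, not from the original study
weapons <- c("sword", "axe", "trident")
ord.sketch <- do.call(rbind, lapply(1:4, function(s) {
  data.frame(Session = paste("Session", s),
             Order   = 1:3,
             Weapon  = sample(weapons))
}))
ord.sketch  # 12 rows: one randomized weapon order per session
```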
Important uncontrolled variation could still come from small differences in movement, timing, zombie pathing, aim, and in-game crowding. Blocking by session helps account for some of that run-to-run noise, but with only one observation per weapon inside each session, the design does not allow a separate test of a weapon-by-session interaction. That means the ANOVA assumes the weapon differences are fairly consistent across sessions.
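One informal way to probe that additivity assumption is Tukey's one-degree-of-freedom test for nonadditivity; the sketch below (not part of the original analysis) augments the blocked model with the squared fitted values:

```r
# Tukey's one-df nonadditivity check (sketch, not run in the
# original analysis). A significant fit2 term would suggest a
# multiplicative block-by-treatment interaction.
add.fit   <- aov(y ~ Session + Weapon, data = d.df)
d.df$fit2 <- fitted(add.fit)^2
summary(aov(y ~ Session + Weapon + fit2, data = d.df))
```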
The trident had the largest sample mean damage absorbed at 46.94, while the sword had the smallest at 39.42. Because lower damage absorbed is better here, the sword looked best on raw averages, but the spreads were wide enough that these visible differences needed formal ANOVA confirmation.
kable(w.sum, digits = 2, col.names = c("Weapon", "n", "Mean", "SD", "Min", "Max")) |>
kable_styling(full_width = TRUE)
| Weapon | n | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| sword | 4 | 39.42 | 13.52 | 21.0 | 53.20 |
| axe | 4 | 43.35 | 18.34 | 24.7 | 68.38 |
| trident | 4 | 46.94 | 9.41 | 34.0 | 56.50 |
kable(s.sum, digits = 2, col.names = c("Session", "n", "Mean", "SD", "Min", "Max")) |>
kable_styling(full_width = TRUE)
| Session | n | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Session 1 | 3 | 48.01 | 5.18 | 42.84 | 53.20 |
| Session 2 | 3 | 41.13 | 24.48 | 21.00 | 68.38 |
| Session 3 | 3 | 44.57 | 10.39 | 37.50 | 56.50 |
| Session 4 | 3 | 39.27 | 12.91 | 24.70 | 49.30 |
boxplot(y ~ Weapon, data = d.df,
main = "Minecraft Zombie Arena: Damage Absorbed by Weapon",
xlab = "Weapon", ylab = "Damage absorbed (Minecraft stat units)",
col = weapon.cols)
stripchart(y ~ Weapon, data = d.df, add = TRUE, vertical = TRUE,
method = "jitter", pch = 16, col = "gray35")
boxplot(y ~ Session, data = d.df,
main = "Minecraft Zombie Arena: Damage Absorbed by Session",
xlab = "Session", ylab = "Damage absorbed (Minecraft stat units)",
col = session.cols)
stripchart(y ~ Session, data = d.df, add = TRUE, vertical = TRUE,
method = "jitter", pch = 16, col = "gray35")
interaction.plot(d.df$Weapon, d.df$Session, d.df$y,
type = "b", pch = 19, lwd = 2,
col = session.cols,
xlab = "Weapon", ylab = "Damage absorbed (Minecraft stat units)",
legend = FALSE,
main = "Minecraft Zombie Arena: Observed Session Profiles by Weapon")
legend("top",
legend = levels(d.df$Session),
title = "",
horiz = TRUE,
lty = 1,
lwd = 2,
pch = 19,
col = session.cols,
bty = "n",
inset = 0.02)
The weapon boxplots overlap heavily, and the session boxplot shows that Session 2 has the highest center and widest spread, while Session 3 appears somewhat lower and tighter. The session profile plot supports that same pattern and highlights that Session 2 has a noticeable spike for the axe relative to the other two weapons in that block. Even with these visible block-to-block differences, the session means still overlap enough that the blocked ANOVA did not find a statistically significant session effect.
For the blocked ANOVA, I checked the usual residual assumptions: approximately normal errors, roughly constant variance, and independence across trials. The residuals-vs-fitted plot did not show a strong megaphone pattern, so a log transformation was not needed. The Q-Q plot looked acceptable for such a small sample, and the residuals plotted against trial order showed no obvious time trend. The Shapiro-Wilk test on the model residuals gave \(p = 0.454\), which provides no evidence against normality.
op <- par(mfrow = c(1, 2))
plot(b.fit, which = 1)
plot(b.fit, which = 2)
par(op)
plot(d.df$trial_id, resid(b.fit),
main = "Minecraft Weapon Study: Residuals by Trial Order",
xlab = "Trial order", ylab = "Residual",
pch = 16)
abline(h = 0, lty = 2)
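The normality result quoted above can be reproduced directly from the fitted model (a one-line sketch using the same `b.fit` object):

```r
# Shapiro-Wilk test on the blocked-model residuals;
# the text reports p = 0.454 for this test.
shapiro.test(resid(b.fit))
```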
This is a randomized complete block design with one observation for each weapon in each session, so the model includes session as the block and weapon as the treatment factor. The weapon effect was not statistically significant, \(F(2, 6) = 0.20\), \(p = 0.823\). The block effect was also not statistically significant, \(F(3, 6) = 0.16\), \(p = 0.920\). In other words, after accounting for session, the data do not provide strong evidence that sword, axe, and trident differ in mean damage absorbed in this experiment.
kable(b.aov, digits = 3, col.names = c("Term", "Df", "Sum Sq", "Mean Sq", "F", "p")) |>
kable_styling(full_width = TRUE)
| Term | Df | Sum Sq | Mean Sq | F | p |
|---|---|---|---|---|---|
| Session | 3 | 134.204 | 44.735 | 0.159 | 0.920 |
| Weapon | 2 | 113.178 | 56.589 | 0.201 | 0.823 |
| Residuals | 6 | 1688.538 | 281.423 | NA | NA |
Because the overall weapon test was not significant at \(\alpha = 0.05\), I did not run multiple comparisons. Reporting pairwise differences after a clearly non-significant omnibus test would add noise without adding defensible evidence. On raw means alone, the sword had the smallest observed damage absorbed and the trident had the largest, but those average gaps were small relative to the within-weapon variability.
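For completeness, if the omnibus weapon test had been significant, Tukey's HSD on the same fitted model would have been the natural follow-up (a sketch; not run here precisely because the omnibus test was not significant):

```r
# Pairwise weapon comparisons with familywise error control;
# only defensible after a significant omnibus weapon test.
TukeyHSD(b.fit, which = "Weapon")
```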
This experiment did not find a statistically clear weapon effect in the Minecraft zombie arena. Within the four blocked sessions, the sword, axe, and trident produced similar damage-absorbed outcomes once normal run-to-run variation was taken into account. The sword had the best raw average, but not by enough to conclude that it truly outperformed the other weapons in this dataset. A stronger follow-up study would add more sessions, keep the same blocking strategy, and possibly collect a second response such as completion time so weapon performance can be judged from more than one angle.
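As a rough planning aid for that follow-up, a one-way power calculation using the observed weapon means and the residual mean square gives a ballpark number of sessions per weapon (a sketch: it ignores blocking, so it is conservative if blocking removes real variance, and the inputs are assumptions taken from this study's estimates):

```r
# Ballpark replication needed to detect weapon differences of the
# observed size with 80% power (one-way approximation).
power.anova.test(groups      = 3,
                 between.var = var(c(39.42, 43.35, 46.94)),  # weapon means
                 within.var  = 281.4,  # residual MS from the ANOVA table
                 power       = 0.80)
```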