Down to earth
In-the-wild study
For our final iteration of user studies, we ran an in-the-wild, between-subjects study. The purpose of this study is to find out which game mechanics or game controls are preferred by end users. Specifically, we wanted to see how the difficulty of the game influences the “fun factor” for the users. In other words, we want to find out whether players have more fun playing a more challenging game.
Here is a video showing the game being played. It may help clarify what we describe below.
To reach a large test audience, we launched our game on the Google Play store, available for everyone to play. We released the game in two versions: one where the obstacles expand at a constant rate (no speedup) and one where the expansion rate increases linearly over time (speedup). The version a player installs on their phone is randomly decided at the time of download. Note that the game also has a second difficulty mechanism: the chance of more than one gap appearing in a level increases over time as well. This mechanism is identical in both versions and hence does not contribute to the difficulty difference.
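The difference between the two versions can be sketched as follows. This is an illustrative sketch only: the function name, the `accel` parameter, and the exact formula are assumptions, not the game's actual tuning values.

```python
def expansion_speed(base_speed, elapsed_seconds, speedup, accel=0.01):
    """Obstacle expansion speed at a given moment in a game.

    Illustrative sketch: 'accel' and the exact formula are assumptions,
    not the game's real tuning values.
    """
    if speedup:
        # Speedup version: the expansion rate increases linearly over time.
        return base_speed * (1.0 + accel * elapsed_seconds)
    # No-speedup version: the expansion rate stays constant.
    return base_speed
```

In the no-speedup version, the extra-gap mechanism is the only source of increasing difficulty; in the speedup version both mechanisms stack.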
To reach as large a test audience as possible, we shared the download link with student communities via social media, since students fall within our target user group (aged 16-30). At the moment of this analysis, we had gathered 44 test subjects, of whom 20 downloaded the speedup version and 24 the no-speedup version.
To be able to do the analysis, we logged various game-session variables. The logged data was sent to our server each time the player got a game over or quit the game. It consisted of:
- deviceID: unique ID of device to identify the user
- difficulty: version of the game
- duration of game: time from starting the game to game over or quitting
- score of game: total score reached in the game
- coins of game: total number of coins collected in the game
Note that we added the score and coins to the log in order to have a complete overview of each played game. We also added a timestamp when a logging request hits the server, so that we can track the playing sessions of the users.
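A minimal sketch of what one log entry could look like, assuming JSON-style payloads. The field names follow the list above; the function names and values are illustrative, not our actual implementation.

```python
import time

def build_log_entry(device_id, difficulty, duration_s, score, coins):
    """Assemble one game-session log entry (field names as listed above)."""
    return {
        "deviceID": device_id,      # unique device ID identifying the user
        "difficulty": difficulty,   # which version of the game was played
        "duration": duration_s,     # seconds from game start to game over/quit
        "score": score,             # total score reached in the game
        "coins": coins,             # total coins collected in the game
    }

def receive_log(entry, now=None):
    """Server side: stamp the entry with its arrival time."""
    stamped = dict(entry)
    stamped["timestamp"] = time.time() if now is None else now
    return stamped

# Example: one finished game, stamped on arrival at the server.
entry = receive_log(build_log_entry("device-1", "speedup", 42.5, 17, 3),
                    now=1_000)
```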
The goal of this between-subjects user study is to test whether we can reject the null hypothesis in favour of our test hypothesis: “The version with speedup is expected to be more ‘fun’, so we expect more playtime on this version.” The null hypothesis hence states that there is no difference in playtime between the two versions.
Once we had gathered the player data, we started our data analysis. All data was collected into a CSV file, which was then imported into R. To test whether the results of the two user groups differed significantly, we used a one-way ANOVA. More specifically, we test whether the resulting p-value is lower than a threshold, which we set to 0.05 in this analysis. So if p < 0.05, we can reject the null hypothesis.
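Our analysis was done in R, but the same test can be sketched in Python with SciPy. The playtimes below are made up for illustration; they are not our actual data.

```python
from scipy.stats import f_oneway

# Made-up total playtimes per player, in minutes (NOT our real data).
speedup = [12.0, 30.5, 8.2, 15.1, 22.3, 9.9]
no_speedup = [10.4, 11.0, 25.7, 14.8, 7.6, 13.2]

# One-way ANOVA across the two version groups.
f_stat, p_value = f_oneway(speedup, no_speedup)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

# Reject the null hypothesis only if p falls below the 0.05 threshold.
significant = p_value < 0.05
```

Note that with only two groups, a one-way ANOVA is equivalent to an independent two-sample t-test (F = t²).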
As said before, we start off by testing whether the difference in playtime between both version groups is significant. The figure below shows the playtime distributions of both versions.
After applying the ANOVA, we obtained a p-value greater than 0.05, so we could not reject the null hypothesis. In other words, the data so far does not support our initial assumption that a more difficult game is more fun and therefore induces more playtime. More observations might change this, however: as more users installed the game and existing users had more time to play it, the p-value dropped by over 10% in the last week. It is possible that with a bigger test audience and over a longer period the p-value would eventually drop below 0.05, thereby supporting our initial assumption.
We also looked at other variables and found two that were significantly different (p-value < 0.05): the total number of games played per player and the average time played per game per player. The distributions of both are again shown in the figures below.
These results were to be expected. The speedup version is more difficult and therefore results in a shorter playtime per game than the no-speedup version, where the maximum difficulty is only reached after 6 minutes of playing. This explains the difference in average playing time per game. Moreover, it entails that in the same timespan more games of the speedup version can be played, which in turn explains the difference in number of games played.
Finally, we looked at the time that passed between the first and last game of each player. This could also give us a clue regarding our initial assumption. After all, it is possible that players of both versions play an equal total amount of time, but that players of one version mostly play right after installing the game and never again afterwards, while players of the other version play less at a time but are more encouraged to come back later (more retention). The time between the first and last game can simply be retrieved by subtracting the timestamps of the first and last log of a deviceID. The figure below displays the distributions of this timespan for both versions.
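Assuming the logs are loaded into a pandas DataFrame, this retention timespan could be computed along these lines. The rows below are made up for illustration; the column names follow our log format.

```python
import pandas as pd

# Made-up log rows (NOT our real data); timestamps in Unix seconds.
logs = pd.DataFrame({
    "deviceID": ["a", "a", "a", "b", "b"],
    "difficulty": ["speedup", "speedup", "speedup",
                   "no_speedup", "no_speedup"],
    "timestamp": [1_000, 5_000, 90_000, 2_000, 3_000],
})

# Timespan per player: last log timestamp minus first log timestamp.
per_player = logs.groupby("deviceID")["timestamp"]
timespan = per_player.max() - per_player.min()
print(timespan)
```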
The ANOVA analysis gave a p-value less than 0.01. Hence, there is a clear difference in the timespan over which the users of both versions play the game. This strengthens our belief that, over time, the total playtime will become significantly different and that we will actually be able to reject the null hypothesis.
The previous results were obtained before the presentation on 15/12/2017. To see whether the downward trend of the p-value continued, we reran the ANOVA analysis and found that the p-value had suddenly risen by 6%. The cause was that new players had downloaded the game, and the difference in retention had not yet had time to create a difference in total playtime for these new players. To account for this, we restricted the analysis to players who had played at least once before the aforementioned presentation. On this “old” player base, the p-value dropped by another 4%.
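Restricting the analysis to the “old” player base can be sketched as follows, again with made-up rows; the cutoff is the date of the presentation.

```python
import pandas as pd

CUTOFF = pd.Timestamp("2017-12-15")  # date of the presentation

# Made-up log rows (NOT our real data).
logs = pd.DataFrame({
    "deviceID": ["a", "a", "b", "c"],
    "timestamp": pd.to_datetime(
        ["2017-12-10", "2017-12-20", "2017-12-18", "2017-12-01"]),
})

# Keep only players whose FIRST logged game predates the presentation.
first_seen = logs.groupby("deviceID")["timestamp"].min()
old_players = first_seen[first_seen < CUTOFF].index
old_logs = logs[logs["deviceID"].isin(old_players)]
```

Note that a returning old player keeps all of their logs, including those from after the cutoff; only players whose first game came after the cutoff are dropped entirely.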
Limits of the speedup version
After playing the game some more ourselves, we noticed that the speedup version becomes luck-based instead of skill-based when you play long enough. Due to the linear increase in expansion speed, you have to get lucky with the gap generation, otherwise the game is nearly impossible. In other words, the speedup version gets so hard that people can only improve their high score by chance, which could make people quit the game. We tried to find evidence for this claim by running more ANOVA tests, this time on the number of games played since a player last set their high score and on the number of times players broke their high score, since these metrics could support our claim. We found that the number of games since the last high score did differ significantly, but in the direction of the speedup version having more games.
The number of times players broke their high score, on the other hand, showed no significant difference. This could be explained by the speedup version indeed being more fun and encouraging people to play it more.
To research whether the luck-versus-skill balance is indeed a factor in player engagement, another study could be done in which a more skill-based difficulty increase is compared with the current linear increase of the speedup version.
In conclusion: after the ANOVA analysis, we cannot (yet) reject the null hypothesis. So we cannot say that there is a difference in total time played per player between the two difficulties, but we do see more retention for the more difficult version.
We also noticed that the speedup version could be improved by changing the way it speeds up (linearly versus more skill-based). This is grounds for another between-subjects, in-the-wild study, after we have given the current study more time so that the retention effect can exert its influence.