Author: Zizhun Guo
In 2007, the Pew Research Center conducted a large-scale study examining the relationship between specific gaming experiences and teens' civic activities and commitments. The dataset contains varied categories of features collected from more than a thousand teen respondents in the US, which help reveal how fond teenagers are of different video game genres.
There are 12 game genres. Each is atomic, distinguishable, and binary valued, and all of them carry the same kind of meaning. Selecting and combining them forms different segments, each of which could represent a type of player group. A simple count of the possible player groups gives 2^12 = 4096. That said, most players do not favor all genres, or even most of them, and the dataset only has 1000+ respondents.
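As a rough illustration of that count, the sketch below enumerates every binary genre profile and compares the total with the distinct profiles found in a tiny sample. The `responses` list is made up for illustration and is not the Pew data.

```python
from collections import Counter
from itertools import product

N_GENRES = 12  # k14a ... k14l, each answered 0 (no) or 1 (yes)

# Every possible binary profile over the 12 genres.
all_profiles = list(product([0, 1], repeat=N_GENRES))
n_possible = len(all_profiles)  # 2 ** 12

# Hypothetical respondents (made up for illustration only).
responses = [
    (1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0),
    (0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0),
    (1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0),  # same profile as the first
    (0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0),
]

# Count how often each distinct profile occurs.
profile_counts = Counter(responses)
n_observed = len(profile_counts)

print(f"possible profiles: {n_possible}")
print(f"observed profiles: {n_observed} from {len(responses)} respondents")
```

With roughly a thousand respondents, the number of observed profiles can never exceed the number of respondents, so most of the 4096 possible profiles stay empty.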
Clustering for types of gamers based on game genres
To discover possible respondent segments and find patterns in game genre preferences, the author decided to use the agglomerative clustering algorithm to gain insight into the data.
Feature | Game Genre |
---|---|
‘k14a’ | ‘fighting games’ |
‘k14b’ | ‘puzzle games’ |
‘k14c’ | ‘action games’ |
‘k14d’ | ‘FPS games’ |
‘k14e’ | ‘strategy games’ |
‘k14f’ | ‘simulation games’ |
‘k14g’ | ‘sports games’ |
‘k14h’ | ‘RPG games’ |
‘k14i’ | ‘adventure games’ |
‘k14j’ | ‘racing games’ |
‘k14k’ | ‘rhythm games’ |
‘k14l’ | ‘survival horror games’ |
From Table 1, k14b (puzzle games) and k14j (racing games) have the highest mean scores, which indicates that most respondents prefer them. k14l (survival horror games) has the lowest mean score, which implies it is the least popular genre.
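Because each genre column is binary, its mean is simply the share of respondents who answered yes. A minimal sketch, assuming the answers sit in a pandas DataFrame with one 0/1 column per genre (the `toy` values are invented, not the Pew data):

```python
import pandas as pd

# Hypothetical answers for three genres (1 = plays the genre).
toy = pd.DataFrame({
    "k14b": [1, 1, 1, 0, 1],  # puzzle games
    "k14j": [1, 1, 0, 1, 1],  # racing games
    "k14l": [0, 0, 1, 0, 0],  # survival horror games
})

# Mean of a 0/1 column = fraction of respondents who play that genre.
means = toy.mean().sort_values(ascending=False)
print(means)
```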
Fig 1 shows a two-way split: some genres are popular, such as puzzle games, racing games, sports games, action games and adventure games, while others are less likely to be favored, such as survival horror games, RPG games, FPS games and fighting games.
To further test the grouping assumption, consider the correlation matrix in Fig 8: k14d (FPS games) and k14a (fighting games) are slightly correlated, and other pairs such as k14c (action games) and k14d (FPS games) show a similar level of correlation. This points to an interesting finding: FPS games, fighting games, and action games may share common traits, so they might come from the same cluster and correspond to a particular type of player.
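A correlation matrix of this kind can be sketched with pandas. The example below uses invented answers (not the Pew data) and ranks genre pairs by Pearson correlation, which for 0/1 columns is the same as the phi coefficient.

```python
from itertools import combinations

import pandas as pd

# Hypothetical binary answers (1 = plays the genre); not the Pew data.
toy = pd.DataFrame({
    "k14a": [1, 1, 0, 0, 1, 0, 1, 0],  # fighting games
    "k14c": [1, 1, 0, 1, 1, 0, 1, 0],  # action games
    "k14d": [1, 1, 0, 0, 1, 0, 0, 0],  # FPS games
    "k14b": [0, 1, 1, 1, 0, 1, 0, 1],  # puzzle games
})

# Pearson correlation between every pair of genre columns.
corr = toy.corr()
print(corr.round(2))

# List each unordered pair once, from most to least correlated.
pairs = {(a, b): corr.loc[a, b] for a, b in combinations(toy.columns, 2)}
for (a, b), r in sorted(pairs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{a} vs {b}: {r:+.2f}")
```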
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, fcluster


def plot_dendrogram(model, **kwargs):
    """Plot a dendrogram for a fitted model and return its linkage matrix."""
    # Count the samples under each node of the merge tree.
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    # Columns: left child, right child, merge distance, sample count.
    linkage_matrix = np.column_stack([model.children_, model.distances_,
                                      counts]).astype(float)
    dendrogram(linkage_matrix, **kwargs)
    return linkage_matrix  # returned so that clustering() can reuse it


def clustering(df, linkage_matrix, n=3):
    """Cut the tree into at most n clusters and summarize each one."""
    labels_ = fcluster(linkage_matrix, t=n, criterion='maxclust')
    n_labels = len(np.unique(labels_))
    clusters = []
    means = []
    index_columns = []
    label_counts = []
    for i in range(1, n_labels + 1):
        members = df[labels_ == i]
        clusters.append(members)
        means.append(members.mean(axis=0))  # per-genre mean within the cluster
        index_columns.append('cluster{}'.format(i))
        label_counts.append(len(members))
    # clusters stays a list: the groups differ in size, so they do not form a regular array
    mean_matrix = np.around(means, 2)
    return clusters, index_columns, mean_matrix, label_counts


# df is assumed to hold the 12 binary genre columns (k14a to k14l), one row per respondent.
# distance_threshold=0 with n_clusters=None builds the full tree and fills model.distances_.
model = AgglomerativeClustering(distance_threshold=0, n_clusters=None)
model = model.fit(df)
plt.title('Hierarchical Clustering Dendrogram')
# plot the top seven levels of the dendrogram
linkage_matrix = plot_dendrogram(model, truncate_mode='level', p=7, color_threshold=10)
plt.xlabel("Sample Index")
plt.ylabel("Distance (Ward)")
plt.show()
There are 7 clusters in the dendrogram in Fig 3. The distance threshold that produces 7 clusters can be chosen from a relatively wide range. Moving to 5, 4, or 3 clusters widens that range further, but it would potentially merge distinctive clusters and lose the diverse personas of the different types of gamer respondents.
To further check the game genre preferences of the players in the 7 clusters, an array of mean scores is produced for each cluster and plotted as a heatmap. This helps reveal the unique pattern of each player type.
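The idea of a threshold range per cluster count can be made concrete with SciPy. The sketch below is an illustration on random binary data (not the Pew data) and uses `scipy.cluster.hierarchy.linkage` with Ward linkage instead of the fitted scikit-learn model. For a monotone linkage such as Ward, any cut height between the k-th largest and the (k-1)-th largest merge distance leaves k clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Hypothetical binary genre matrix: 200 respondents x 12 genres.
X = rng.integers(0, 2, size=(200, 12)).astype(float)

Z = linkage(X, method="ward")
heights = Z[:, 2]  # merge distances, non-decreasing for Ward linkage

# Threshold interval [lower, upper) that yields exactly k clusters.
for k in (3, 4, 5, 7):
    lower, upper = heights[-k], heights[-(k - 1)]
    print(f"k={k}: threshold in [{lower:.2f}, {upper:.2f}), width {upper - lower:.2f}")

# Cut by cluster count instead of by height.
labels = fcluster(Z, t=7, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```

Comparing the printed widths shows how much slack each choice of k leaves when picking a threshold. The widths on real data will differ from this toy example.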
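One way to build such a heatmap is sketched below with pandas and matplotlib. The data and the `cluster` column are randomly generated stand-ins (assumptions for illustration), so the resulting picture carries no meaning about real players.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

genres = [f"k14{c}" for c in "abcdefghijkl"]

rng = np.random.default_rng(1)
# Hypothetical data: binary answers plus a cluster label (1..7) per respondent.
toy = pd.DataFrame(rng.integers(0, 2, size=(300, 12)), columns=genres)
toy["cluster"] = rng.integers(1, 8, size=300)

# Rows are clusters, columns are genres, cells are the share answering "yes".
mean_matrix = toy.groupby("cluster")[genres].mean().round(2)

fig, ax = plt.subplots(figsize=(8, 4))
im = ax.imshow(mean_matrix.to_numpy(), cmap="viridis", vmin=0, vmax=1, aspect="auto")
ax.set_xticks(range(len(genres)))
ax.set_xticklabels(genres)
ax.set_yticks(range(len(mean_matrix)))
ax.set_yticklabels([f"cluster{i}" for i in mean_matrix.index])
fig.colorbar(im, ax=ax, label="mean score")
ax.set_title("Mean genre score per cluster (toy data)")
plt.show()
```

Reading the heatmap row by row gives one genre profile per cluster, which is what the persona descriptions are based on.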
Based on the Pew report (https://www.pewresearch.org/internet/2008/09/16/teens-video-games-and-civics/), the respondents' top 3 games are listed in the table below:
From Fig 5, the top 3 games by number of votes are Guitar Hero (rhythm games), Halo 3 (FPS games) and Madden NFL (sports games).
Copyright @ 2021 Zizhun Guo. All Rights Reserved.