This function takes property listing task data, a concept, and a threshold, and clusters the data according to the order in which the properties were listed. Given the properties that all users mentioned for a specific concept, the algorithm estimates a similarity between properties based on the number of words mentioned between them. For example, if properties A and B are usually mentioned one after the other, their similarity will be higher than that of properties A and C, which are usually not even mentioned together. Properties whose similarity to every other property is below the user-defined threshold are discarded from the plot.
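The idea of an order-based similarity can be illustrated with a minimal sketch. This is not the package's implementation: the helper `order_similarity`, the toy data, and the specific weighting (each co-mention contributes 1 divided by the positional gap, averaged over all participants) are assumptions made only for illustration.

```r
# Toy property listing data: each participant (ID) lists properties in order.
toy <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 2, 3, 3),
  Concept  = "Ability",
  Property = c("skill", "talent", "effort",
               "skill", "talent", "luck",
               "talent", "skill"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): two properties are more
# similar the closer together they are listed, and the more participants
# list them together. Pairs that never co-occur keep similarity 0.
order_similarity <- function(data) {
  props <- sort(unique(as.character(data$Property)))
  sim <- matrix(0, length(props), length(props),
                dimnames = list(props, props))
  lists <- split(as.character(data$Property), data$ID)
  for (p in lists) {
    if (length(p) < 2) next
    pairs <- combn(seq_along(p), 2)          # all position pairs (i < j)
    for (k in seq_len(ncol(pairs))) {
      a <- p[pairs[1, k]]
      b <- p[pairs[2, k]]
      w <- 1 / (pairs[2, k] - pairs[1, k])   # adjacent mentions weigh most
      sim[a, b] <- sim[a, b] + w
      sim[b, a] <- sim[b, a] + w
    }
  }
  sim / length(lists)                        # average over participants
}

sim <- order_similarity(toy)
diag(sim) <- 0

# Discard properties whose similarity to every other property is below
# the threshold (assumed value, chosen for this toy data).
threshold <- 0.5
keep <- apply(sim, 1, max) >= threshold
names(keep)[keep]
```

With this toy data, "skill" and "talent" are adjacent in every list and are kept, while "effort" and "luck" fall below the assumed threshold and are dropped.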
clusterImage(data, distThreshold, concept = NULL)
Returns a list with 2 elements: a ggplot2 plot and a data frame with cluster information.
data: Data frame with 3 columns: ID, Concept and Property.
distThreshold: Distance value. A property is assigned to a cluster if its similarity is greater than distThreshold.
concept: Text value. Clusters are generated only from the properties of this concept.
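# Property listing data with columns ID, Concept and Property
data_cpn = data.frame(CPN_27)
# Similarity threshold below which properties are discarded
threshold = 0.061
concept = "Ability"
# Returns a list with a ggplot2 plot and a data frame of cluster information
cluster_data = clusterImage(data_cpn, threshold, concept)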