Skill Test on Clustering Techniques: Questions and Solutions
Introduction
The idea of creating machines that learn by themselves has driven humans for decades now. Unsupervised learning and clustering are the key to fulfilling that dream. Unsupervised learning provides more flexibility, but is more challenging as well.
Clustering plays an important role in drawing insights from unlabeled data. It groups the data into similar clusters, which improves various business decisions by providing a meta-level understanding.
In this skill test, we tested our community on clustering techniques. A total of 1,566 people registered for this skill test. If you missed taking the test, here is your opportunity to find out how many questions you could have answered correctly.
If you are just getting started with Unsupervised Learning, here are some comprehensive resources to aid you in your journey:
- Machine Learning Certification Course for Beginners
- The Most Comprehensive Guide to K-Means Clustering You'll Ever Need
- Certified AI & ML Blackbelt+ Program
Overall Results
Below is the distribution of scores; this will help you evaluate your performance:
You can access your performance here. More than 390 people participated in the skill test, and the highest score was 33. Here are a few statistics about the distribution.
Overall distribution
Mean Score: 15.11
Median Score: 15
Mode Score: 16
Helpful Resources
An Introduction to Clustering and different methods of clustering
Getting your clustering right (Part I)
Getting your clustering right (Part II)
Questions & Answers
Q1. Movie recommendation systems are an example of:
1. Classification
2. Clustering
3. Reinforcement Learning
4. Regression
Options:
A. 2 only
C. 1 and 2
D. 1 and 3
E. 2 and 3
F. 1, 2 and 3
H. 1, 2, 3 and 4
Solution: (E)
Generally, movie recommendation systems cluster the users into a finite number of similar groups based on their previous activities and profile. Then, at a fundamental level, people in the same cluster are given similar recommendations.
In some scenarios, this can also be approached as a classification problem, assigning the most appropriate movie class to a user in a specific group of users. A movie recommendation system can also be viewed as a reinforcement learning problem, where it learns from its previous recommendations and improves future recommendations.
Q2. Sentiment Analysis is an example of:
1. Regression
2. Classification
3. Clustering
4. Reinforcement Learning
Options:
A. 1 only
B. 1 and 2
C. 1 and 3
D. 1, 2 and 3
E. 1, 2 and 4
F. 1, 2, 3 and 4
Solution: (E)
Sentiment analysis, at a fundamental level, is the task of classifying the sentiment expressed in an image, text or speech into a set of defined sentiment classes such as happy, sad, excited, positive, negative, etc. It can also be viewed as a regression problem, assigning a sentiment score of, say, 1 to 10 to a corresponding image, text or speech.
Another way of looking at sentiment analysis is from a reinforcement learning perspective, where the algorithm constantly learns from the accuracy of past sentiment analyses to improve future performance.
Q3. Can decision trees be used for performing clustering?
A. True
B. False
Solution: (A)
Decision trees can also be used to form clusters in the data, but clustering often generates natural clusters and is not dependent on any objective function.
Q4. Which of the following is the most appropriate strategy for data cleaning before performing clustering analysis, given a less than desirable number of data points:
1. Capping and flooring of variables
2. Removal of outliers
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of the above
Solution: (A)
Removal of outliers is not recommended if the data points are few in number. In this scenario, capping and flooring of variables is the most appropriate strategy.
Q5. What is the minimum no. of variables/ features required to perform clustering?
A. 0
B. 1
C. 2
D. 3
Solution: (B)
At least one variable is required to perform clustering analysis. Clustering with a single variable can be visualized with the help of a histogram.
Q6. For two runs of K-Means clustering, is it expected to get the same clustering results?
A. Yes
B. No
Solution: (B)
The K-Means clustering algorithm converges to local minima, which might coincide with the global minimum in some cases but not always. Therefore, it is advised to run the K-Means algorithm multiple times before drawing inferences about the clusters.
However, note that it is possible to obtain the same clustering results from K-Means by setting the same seed value for each run; that simply makes the algorithm choose the same set of random numbers on each run.
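A minimal Python sketch of this behavior, assuming scikit-learn is available (the dataset and parameter values are illustrative, not part of the skill test):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Different seeds with a single initialization each: the runs may converge
# to different local minima and hence produce different clusterings.
labels_a = KMeans(n_clusters=4, n_init=1, random_state=1).fit_predict(X)
labels_b = KMeans(n_clusters=4, n_init=1, random_state=2).fit_predict(X)

# Same seed on both runs: the clustering is reproduced exactly.
labels_c = KMeans(n_clusters=4, n_init=1, random_state=42).fit_predict(X)
labels_d = KMeans(n_clusters=4, n_init=1, random_state=42).fit_predict(X)
assert np.array_equal(labels_c, labels_d)
```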
Q7. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?
A. Yes
B. No
C. Can't say
D. None of these
Solution: (A)
When the K-Means algorithm has reached a local or global minimum, it will not change the assignment of data points to clusters between two successive iterations.
Q8. Which of the following can act as possible termination conditions in K-Means?
1. A fixed number of iterations has been performed.
2. Assignment of observations to clusters does not change between iterations (except for cases with a bad local minimum).
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.
Options:
A. 1, 3 and 4
B. 1, 2 and 3
C. 1, 2 and 4
D. All of the above
Solution: (D)
All four conditions can be used as possible termination conditions in K-Means clustering:
1. This condition limits the runtime of the clustering algorithm, but in some cases the quality of the clustering will be poor because of an insufficient number of iterations.
2. Except for cases with a bad local minimum, this produces a good clustering, but runtimes may be unacceptably long.
3. This also ensures that the algorithm has converged at a minimum.
4. Stopping when RSS falls below a threshold ensures that the clustering is of a desired quality after termination. In practice, it is good to combine it with a bound on the number of iterations to guarantee termination (see the sketch below).
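Two of these stopping rules map directly onto parameters of common K-Means implementations. A minimal sketch, assuming scikit-learn (the data are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# max_iter caps the number of iterations; tol stops the run once the centroids
# stop moving by more than a small amount (i.e. the algorithm has converged).
km = KMeans(n_clusters=3, max_iter=10, tol=1e-4, random_state=0).fit(X)
print(km.n_iter_)    # iterations actually performed (<= max_iter)
print(km.inertia_)   # within-cluster sum of squares, the RSS being minimized
```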
Q9. Which of the following clustering algorithms suffers from the problem of convergence at local optima?
1. K-Means clustering algorithm
2. Agglomerative clustering algorithm
3. Expectation-Maximization clustering algorithm
4. Diverse clustering algorithm
Options:
A. 1 only
B. 2 and 3
C. 2 and 4
D. 1 and 3
E. 1, 2 and 4
F. All of the above
Solution: (D)
Out of the options given, only the K-Means clustering algorithm and the EM clustering algorithm have the drawback of converging at local minima.
Q10. Which of the following algorithms is most sensitive to outliers?
A. K-means clustering algorithm
B. K-medians clustering algorithm
C. K-modes clustering algorithm
D. K-medoids clustering algorithm
Solution: (A)
Out of all the options, the K-Means clustering algorithm is most sensitive to outliers, as it uses the mean of the cluster data points to find the cluster center.
Q11. After performing K-Means clustering analysis on a dataset, you observed the following dendrogram. Which of the following conclusions can be drawn from the dendrogram?
A. There were 28 data points in the clustering analysis
B. The best no. of clusters for the analyzed data points is 4
C. The proximity function used is Average-link clustering
D. The above dendrogram interpretation is not possible for K-Means clustering analysis
Solution: (D)
A dendrogram is not possible for K-Means clustering analysis. However, one can create a clustergram based on K-Means clustering analysis.
Q12. How can clustering (unsupervised learning) be used to improve the accuracy of a linear regression model (supervised learning):
1. Creating different models for different cluster groups.
2. Creating an input feature for cluster ids as an ordinal variable.
3. Creating an input feature for cluster centroids as a continuous variable.
4. Creating an input feature for cluster size as a continuous variable.
Options:
A. 1 only
B. 1 and 2
C. 1 and 4
D. 3 only
E. 2 and 4
F. All of the above
Solution: (F)
Creating an input feature for cluster ids as an ordinal variable or creating an input feature for cluster centroids as a continuous variable might not convey any relevant information to the regression model for multidimensional data. But for clustering in a single dimension, all of the given methods are expected to convey meaningful information to the regression model. For example, to cluster people into two groups based on their hair length, storing the cluster id as an ordinal variable and the cluster centroids as continuous variables will convey meaningful information.
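A minimal sketch of the idea, assuming scikit-learn; the one-dimensional data and the choice of two clusters are hypothetical stand-ins for the hair-length example above:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))                 # a single explanatory variable
y = 3 * X.ravel() + rng.normal(size=200)      # target for the regression

km = KMeans(n_clusters=2, random_state=0).fit(X)
cluster_id = km.labels_.reshape(-1, 1)           # cluster id as an extra feature
centroid = km.cluster_centers_[km.labels_]       # per-sample cluster centroid

X_aug = np.hstack([X, cluster_id, centroid])     # augmented feature matrix
model = LinearRegression().fit(X_aug, y)
```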
Q13. What could be the possible reason(s) for producing two different dendrograms using an agglomerative clustering algorithm for the same dataset?
A. Proximity function used
B. No. of data points used
C. No. of variables used
D. B and C only
E. All of the above
Solution: (E)
A change in any of the proximity function, the no. of data points, or the no. of variables will lead to different clustering results and hence different dendrograms.
Q14. In the figure below, if you draw a horizontal line on the y-axis at y = 2, what will be the number of clusters formed?
A. 1
B. 2
C. 3
D. 4
Solution: (B)
Since the number of vertical lines intersecting the red horizontal line at y = 2 in the dendrogram is 2, two clusters will be formed.
Q15. What is the most appropriate no. of clusters for the data points represented by the following dendrogram:
A. 2
B. 4
C. 6
D. 8
Solution: (B)
The no. of clusters that best describes the different groups can be chosen by observing the dendrogram. The best choice of the no. of clusters is the no. of vertical lines in the dendrogram cut by a horizontal line that can traverse the maximum vertical distance without intersecting a cluster.
In the above example, the best choice of no. of clusters is 4, as the red horizontal line in the dendrogram below covers the maximum vertical distance AB.
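A minimal sketch of reading the number of clusters off a dendrogram cut, assuming SciPy; the synthetic data and the cut height are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in (0, 3, 6, 9)])

Z = linkage(X, method='average')                    # hierarchical clustering
labels = fcluster(Z, t=2.0, criterion='distance')   # cut the dendrogram at height 2
print(len(np.unique(labels)))                       # number of clusters formed
```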
Q16. In which of the following cases will K-Means clustering fail to give good results?
1. Data points with outliers
2. Data points with different densities
3. Data points with round shapes
4. Data points with non-convex shapes
Options:
A. 1 and 2
B. 2 and 3
C. 2 and 4
D. 1, 2 and 4
E. 1, 2, 3 and 4
Solution: (D)
The K-Means clustering algorithm fails to give good results when the data contains outliers, when the density spread of data points across the data space is uneven, and when the data points follow non-convex shapes.
Q17. Which of the following metrics do we have for finding dissimilarity between two clusters in hierarchical clustering?
1. Single-link
2. Complete-link
3. Average-link
Options:
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
Solution: (D)
All three methods, i.e. single link, complete link and average link, can be used for finding dissimilarity between two clusters in hierarchical clustering.
Q18. Which of the following are true?
1. Clustering analysis is negatively affected by multicollinearity of features
2. Clustering analysis is negatively affected by heteroscedasticity
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of them
Solution: (A)
Clustering analysis is not negatively affected by heteroscedasticity, but the results are negatively impacted by multicollinearity of the features/variables used in clustering, as the correlated feature/variable will carry more weight in the distance calculation than desired.
Q19. Given 6 points with the following attributes:
Which of the following clustering representations and dendrograms depicts the use of the MIN or Single link proximity function in hierarchical clustering:
A.
B.
C.
D.
Solution: (A)
For the single link or MIN version of hierarchical clustering, the proximity of two clusters is defined to be the minimum of the distance between any two points in the different clusters. For instance, from the table, we see that the distance between points 3 and 6 is 0.11, and that is the height at which they are joined into one cluster in the dendrogram. As another example, the distance between clusters {3, 6} and {2, 5} is given by dist({3, 6}, {2, 5}) = min(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = min(0.1483, 0.2540, 0.2843, 0.3921) = 0.1483.
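A minimal sketch of the single-link (MIN) rule; the point coordinates below are hypothetical placeholders for the table in the question:

```python
import numpy as np
from scipy.spatial.distance import euclidean

def single_link(cluster_a, cluster_b):
    """MIN proximity: smallest distance over all cross-cluster point pairs."""
    return min(euclidean(p, q) for p in cluster_a for q in cluster_b)

c1 = [np.array([0.40, 0.53]), np.array([0.45, 0.30])]   # stand-in for points {3, 6}
c2 = [np.array([0.22, 0.38]), np.array([0.35, 0.32])]   # stand-in for points {2, 5}
print(single_link(c1, c2))
```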
Q20. Given six points with the following attributes:
Which of the following clustering representations and dendrograms depicts the use of the MAX or Complete link proximity function in hierarchical clustering:
A.
B.
C.
D.
Solution: (B)
For the complete link or MAX version of hierarchical clustering, the proximity of two clusters is defined to be the maximum of the distance between any two points in the different clusters. As before, points 3 and 6 are merged first. However, {3, 6} is then merged with {4} instead of {2, 5}, because dist({3, 6}, {4}) = max(dist(3, 4), dist(6, 4)) = max(0.1513, 0.2216) = 0.2216, which is smaller than dist({3, 6}, {2, 5}) = max(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = max(0.1483, 0.2540, 0.2843, 0.3921) = 0.3921 and dist({3, 6}, {1}) = max(dist(3, 1), dist(6, 1)) = max(0.2218, 0.2347) = 0.2347.
Q21. Given six points with the following attributes:
Which of the following clustering representations and dendrogram depicts the use of Group average proximity function in hierarchical clustering:
A.
B.
C.
D.
Solution: (C)
For the group average version of hierarchical clustering, the proximity of two clusters is defined to be the average of the pairwise proximities between all pairs of points in the different clusters. This is an intermediate approach between MIN and MAX, expressed by the equation proximity(Ci, Cj) = sum of dist(x, y) over all pairs with x in Ci and y in Cj, divided by mi * mj, where mi and mj are the sizes of clusters Ci and Cj.
Here, for example, dist({3, 6, 4}, {1}) = (0.2218 + 0.3688 + 0.2347)/(3 * 1) = 0.2751, dist({2, 5}, {1}) = (0.2357 + 0.3421)/(2 * 1) = 0.2889, and dist({3, 6, 4}, {2, 5}) = (0.1483 + 0.2843 + 0.2540 + 0.3921 + 0.2042 + 0.2932)/(3 * 2) = 0.2637. Because dist({3, 6, 4}, {2, 5}) is smaller than dist({3, 6, 4}, {1}) and dist({2, 5}, {1}), these two clusters are merged at the fourth stage.
Q22. Given six points with the following attributes:
Which of the following clustering representations and dendrogram depicts the use of Ward's method proximity function in hierarchical clustering:
A.
B.
C.
D.
Solution: (D)
Ward's method is a centroid method. A centroid method calculates the proximity between two clusters by computing the distance between the centroids of the clusters. For Ward's method, the proximity between two clusters is defined as the increase in squared error that results when the two clusters are merged. Applying Ward's method to the sample data set of 6 points gives a clustering that is somewhat different from those produced by MIN, MAX, and group average.
Q23. What should be the best choice of no. of clusters based on the following results:
A. 1
B. 2
C. 3
D. 4
Solution: (C)
The silhouette coefficient is a measure of how similar an object is to its own cluster compared to other clusters. The number of clusters for which the silhouette coefficient is highest represents the best choice of the number of clusters.
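A minimal sketch of choosing k by the silhouette coefficient, assuming scikit-learn; the data are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0).fit_predict(X)
    # The k with the highest average silhouette coefficient is the best choice.
    print(k, silhouette_score(X, labels))
```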
Q24. Which of the following is/are valid iterative strategies for treating missing values before clustering analysis?
A. Imputation with mean
B. Nearest neighbor assignment
C. Imputation with Expectation Maximization algorithm
D. All of the above
Solution: (C)
All of the mentioned techniques are valid for treating missing values before clustering analysis, but only imputation with the EM algorithm is iterative in its functioning.
Q25. The K-Means algorithm has some limitations. One of them is that it makes hard assignments of points to clusters (a point either completely belongs to a cluster or does not belong to it at all).
Note: A soft assignment can be considered as the probability of being assigned to each cluster (say, K = 3 and, for some point xn, p1 = 0.7, p2 = 0.2, p3 = 0.1).
Which of the following algorithm(s) allows soft assignments?
1. Gaussian mixture models
2. Fuzzy K-Means
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of these
Solution: (C)
Both Gaussian mixture models and Fuzzy K-Means allow soft assignments.
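A minimal sketch of a soft assignment with a Gaussian mixture model, assuming scikit-learn; the data are illustrative:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Each row is a probability distribution over the 3 clusters, e.g. [0.7, 0.2, 0.1],
# rather than the single hard label that K-Means would return.
print(gmm.predict_proba(X[:1]))
```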
Q26. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2 and C3 have the following observations:
C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
What will the cluster centroids be if you want to proceed to the second iteration?
A. C1: (4,4), C2: (2,2), C3: (7,7)
B. C1: (6,6), C2: (4,4), C3: (9,9)
C. C1: (2,2), C2: (0,0), C3: (5,5)
D. None of these
Solution: (A)
Finding the centroid for the data points in cluster C1 = ((2+4+6)/3, (2+4+6)/3) = (4, 4)
Finding the centroid for the data points in cluster C2 = ((0+4)/2, (4+0)/2) = (2, 2)
Finding the centroid for the data points in cluster C3 = ((5+9)/2, (5+9)/2) = (7, 7)
Hence, C1: (4,4), C2: (2,2), C3: (7,7)
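The same centroid arithmetic, checked with NumPy:

```python
import numpy as np

c1 = np.array([[2, 2], [4, 4], [6, 6]])
c2 = np.array([[0, 4], [4, 0]])
c3 = np.array([[5, 5], [9, 9]])

# The new centroid of each cluster is the mean of its points.
print(c1.mean(axis=0), c2.mean(axis=0), c3.mean(axis=0))   # [4. 4.] [2. 2.] [7. 7.]
```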
Q27. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2 and C3 have the following observations:
C1: {(2,2), (4,4), (6,6)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
What will be the Manhattan distance of observation (9, 9) from cluster centroid C1 in the second iteration?
A. 10
B. 5*sqrt(2)
C. 13*sqrt(2)
D. None of these
Solution: (A)
Manhattan distance between centroid C1, i.e. (4, 4), and (9, 9) = (9-4) + (9-4) = 10
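The same Manhattan distance, checked with NumPy:

```python
import numpy as np

centroid_c1 = np.array([4, 4])
point = np.array([9, 9])
print(np.abs(point - centroid_c1).sum())   # |9-4| + |9-4| = 10
```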
Q28. If two variables, V1 and V2, are used for clustering, which of the following are true for K-Means clustering with k = 3?
1. If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line
2. If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line
Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of the above
Solution: (A)
If the correlation between the variables V1 and V2 is 1, then all the data points will lie on a straight line. Hence, all three cluster centroids will form a straight line as well.
Q29. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?
A. In the distance calculation, it gives the same weight to all features
B. You always get the same clusters whether or not you use feature scaling
C. It is an important step for Manhattan distance but not for Euclidean distance
D. None of these
Solution: (A)
Feature scaling ensures that all features get the same weight in the clustering analysis. Consider a scenario of clustering people based on their weight (in kg), with range 55-110, and height (in feet), with range 5.6 to 6.4. In this case, the clusters produced without scaling can be very misleading, as the range of weight is much greater than that of height. Therefore, it is necessary to bring them to the same scale so that they have equal weight in the clustering result.
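A minimal sketch of the weight/height scenario, assuming scikit-learn; the generated values are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
weight = rng.uniform(55, 110, size=100)     # large numeric range
height = rng.uniform(5.6, 6.4, size=100)    # small numeric range
X = np.column_stack([weight, height])

# Without scaling, the distance (and hence the clustering) is dominated by weight.
labels_raw = KMeans(n_clusters=2, random_state=0).fit_predict(X)

# After standardization, both features contribute equally to the distance.
labels_scaled = KMeans(n_clusters=2, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))
```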
Q30. Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm?
A. Elbow method
B. Manhattan method
C. Euclidean method
D. All of the above
E. None of these
Solution: (A)
Out of the given options, only the elbow method is used for finding the optimal number of clusters. The elbow method looks at the percentage of variance explained as a function of the number of clusters: one should choose a number of clusters such that adding another cluster doesn't give much better modeling of the data.
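A minimal sketch of the elbow method, assuming scikit-learn and matplotlib; the data are illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

ks = range(1, 11)
inertias = [KMeans(n_clusters=k, random_state=0).fit(X).inertia_ for k in ks]

# Look for the "elbow": the k after which adding clusters barely reduces
# the within-cluster sum of squares.
plt.plot(ks, inertias, marker='o')
plt.xlabel('number of clusters k')
plt.ylabel('within-cluster sum of squares')
plt.show()
```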
Q31. What is true about K-Means clustering?
1. K-Means is extremely sensitive to cluster center initialization
2. Bad initialization can lead to poor convergence speed
3. Bad initialization can lead to bad overall clustering
Options:
A. 1 and 3
B. 1 and 2
C. 2 and 3
D. 1, 2 and 3
Solution: (D)
All three of the given statements are true. K-Means is extremely sensitive to cluster center initialization, and bad initialization can lead to poor convergence speed as well as bad overall clustering.
Q32. Which of the following can be applied to get results for the K-Means algorithm that correspond to the global minimum?
1. Try running the algorithm with different centroid initializations
2. Adjust the number of iterations
3. Find the optimal number of clusters
Options:
A. 2 and 3
B. 1 and 3
C. 1 and 2
D. All of the above
Solution: (D)
All of these are standard practices used in order to obtain good clustering results.
Q33. What should be the best choice for the number of clusters based on the following results:
A. 5
B. 6
C. 14
D. Greater than 14
Solution: (B)
Based on the above results, the best choice for the number of clusters using the elbow method is 6.
Q34. What should be the best choice for the number of clusters based on the following results:
A. 2
B. 4
C. 6
D. 8
Solution: (C)
Generally, a higher average silhouette coefficient indicates better clustering quality. In this plot, the optimal number of clusters would appear to be 2, at which the average silhouette coefficient is highest. However, the SSE of this clustering solution (k = 2) is too large. At k = 6, the SSE is much lower, and the average silhouette coefficient at k = 6 is also very high, only slightly lower than at k = 2. Thus, the best choice is k = 6.
Q35. Which of the following sequences is correct for the K-Means algorithm using the Forgy method of initialization?
1. Specify the number of clusters
2. Assign cluster centroids randomly
3. Assign each data point to the nearest cluster centroid
4. Re-assign each point to the nearest cluster centroid
5. Re-compute cluster centroids
Options:
A. 1, 2, 3, 5, 4
B. 1, 3, 2, 4, 5
C. 2, 1, 3, 4, 5
D. None of these
Solution: (A)
The methods used for initialization in K-Means are Forgy and Random Partition. The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, thus computing the initial mean of each cluster as the centroid of its randomly assigned points.
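A minimal sketch of the two initialization schemes in plain NumPy; the function names and data are illustrative:

```python
import numpy as np

def forgy_init(X, k, rng):
    # Forgy: pick k observations at random and use them as the initial means.
    return X[rng.choice(len(X), size=k, replace=False)]

def random_partition_init(X, k, rng):
    # Random Partition: randomly assign every point to a cluster, then use
    # each cluster's centroid as its initial mean.
    labels = rng.integers(0, k, size=len(X))
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
print(forgy_init(X, 3, rng))
print(random_partition_init(X, 3, rng))
```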
Q36. If you are using multinomial mixture models with the expectation-maximization algorithm for clustering a set of data points into 2 clusters, which of the following assumptions is important:
A. All the data points follow two Gaussian distributions
B. All the data points follow n Gaussian distributions (n > 2)
C. All the data points follow two multinomial distributions
D. All the data points follow n multinomial distributions (n > 2)
Solution: (C)
In the EM algorithm for clustering, it is essential to choose the same no. of clusters to classify the data points into as the no. of different distributions they are expected to be generated from, and the distributions must also be of the same type.
Q37. Which of the following is/are not true about the centroid-based K-Means clustering algorithm and the distribution-based expectation-maximization (EM) clustering algorithm:
1. Both start with random initializations
2. Both are iterative algorithms
3. Both have strong assumptions that the data points must fulfill
4. Both are sensitive to outliers
5. The expectation-maximization algorithm is a special case of K-Means
6. Both require prior knowledge of the no. of desired clusters
7. The results produced by both are non-reproducible.
Options:
A. 1 only
B. 5 only
C. 1 and 3
D. 6 and 7
E. 4, 6 and 7
F. None of the above
Solution: (B)
All of the above statements are true except the 5th; instead, K-Means is a special case of the EM algorithm in which only the centroids of the cluster distributions are calculated at each iteration.
Q38. Which of the following is/are not true about the DBSCAN clustering algorithm:
1. For data points to be in a cluster, they must be within a distance threshold of a core point
2. It has strong assumptions about the distribution of data points in the data space
3. It has a substantially high time complexity of order O(n³)
4. It does not require prior knowledge of the no. of desired clusters
5. It is robust to outliers
Options:
A. 1 only
B. 2 only
C. 4 only
D. 2 and 3
E. 1 and 5
F. 1, 3 and 5
Solution: (D)
- DBSCAN can form a cluster of any arbitrary shape and does not have strong assumptions about the distribution of data points in the data space (see the sketch below).
- DBSCAN has a low time complexity of order O(n log n) only.
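A minimal sketch of DBSCAN on a non-convex dataset, assuming scikit-learn; eps and min_samples are illustrative values:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the distance threshold to a core point; min_samples defines a core point.
# The number of clusters is not specified anywhere.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(np.unique(labels))   # cluster ids; -1 (if present) marks noise/outliers
```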
Q39. Which of the following are the lower and upper bounds for the F-score?
A. [0,1]
B. (0,1)
C. [-1,1]
D. None of the above
Solution: (A)
The lowest and highest possible values of the F-score are 0 and 1, with 1 representing that every data point is assigned to the correct cluster and 0 representing that the precision and/or recall of the clustering analysis are 0. In clustering analysis, a high F-score is desired.
Q40. Following are the results observed for clustering 6000 data points into 3 clusters: A, B and C:
What is the F1-Score with respect to cluster B?
A. 3
B. 4
C. 5
D. 6
Solution: (D)
Here,
True Positive, TP = 1200
True Negative, TN = 600 + 1600 = 2200
False Positive, FP = 1000 + 200 = 1200
False Negative, FN = 400 + 400 = 800
Therefore,
Precision = TP / (TP + FP) = 0.5
Recall = TP / (TP + FN) = 0.6
Hence,
F1 = 2 * (Precision * Recall) / (Precision + Recall) = 0.54 ~ 0.5
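The same precision/recall/F1 arithmetic, checked in Python:

```python
tp, fp, fn = 1200, 1200, 800

precision = tp / (tp + fp)                            # 0.5
recall = tp / (tp + fn)                               # 0.6
f1 = 2 * precision * recall / (precision + recall)    # ~0.545
print(precision, recall, round(f1, 2))
```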
End Notes
I hope you enjoyed taking the test and found the solutions helpful. The test focused on conceptual as well as practical knowledge of clustering fundamentals and various clustering techniques.
I tried to clear all your doubts through this article, but if we have missed out on something, let us know in the comments below. Also, if you have any suggestions or improvements you think we should make in the next skill test, you can let us know by dropping your feedback in the comments section.
Learn, compete, hack and get hired!
Source: https://www.analyticsvidhya.com/blog/2017/02/test-data-scientist-clustering/