Introduction to Data Mining - Badge
Name
Issuer
Università degli Studi di Milano-Bicocca.
Issued since 21 June 2017.
Description
This Badge is earned by learners participating in the course "Introduction to Data Mining" offered by EduOpen. The Badge is to all intents and purposes the course’s certificate of attendance.
Badge Criteria
This BADGE has been issued to the student who attended, on the EduOpen MOOC platform, all the courses of the pathway titled "Introduction to Data Mining" taught by Prof. Fabio Stella of the Department of Informatics, Systems and Communication of the University of Milano-Bicocca.
Skills
The student attended a pathway that consists of the following three courses: "Data Mining: CLASSIFICATION", "Data Mining: CLUSTERING and ASSOCIATION" and "Text Mining".
In the "Data Mining: CLASSIFICATION" course, the student watched methodology and hands-on video lectures about the following topics: data types, data exploration, missing data replacement and pre-processing, formulation and solution of binary and non-binary classification problems with and without cost matrix, classification performance measures and related estimation techniques, ROC, Lift and Cumulative gain curves, and feature selection algorithms. Furthermore, the student watched methodology and hands-on video lectures on how to train, test and validate the following classification models: decision trees, logistic regression, feed-forward neural networks, support vector machines, naive Bayes, tree augmented naive Bayes and Bayesian classifiers.
In the "Data Mining: CLUSTERING and ASSOCIATION" course, the student watched methodology and hands-on video lectures about the following topics: how to measure the proximity between attributes of different types, similarity and distance, formulation and solution of clustering problems when using different types of attributes, and how to apply partitioning, hierarchical, density-based and graph-based clustering methods. The student also watched methodology and hands-on video lectures on how to validate a clustering solution and how to select the “optimal number of clusters” (whatever it means). Furthermore, the student watched methodology and hands-on video lectures explaining how to extract association rules from transaction data and how to sort them according to different relevance measures.
In the "Text Mining" course, the student watched methodology and hands-on video lectures about the following topics: extraction, transformation and loading of natural language text from different sources (Web, RSS feeds, Twitter, Facebook, Reddit, YouTube etc.), preprocessing and quantitative representations of natural language text (binary, term frequency, term frequency inverse document frequency etc.), automatic classification of natural language text (sentiment analysis), clustering and topic extraction (topic models) for auto-organizing natural language text, and information extraction from natural language text to recognize named entities (person, organization, location etc.) and to discover their relationships.
The student used the KNIME open source software platform to perform practice sessions, and developed and uploaded more than 30 KNIME workflows to the EduOpen MOOC platform. All workflows have been manually checked by Prof. Fabio Stella.
The owner of this BADGE has the following competences:
- How to pre-process different data types.
- How to formulate binary and non-binary classification problems.
- How to develop classification models to solve binary and non-binary classification problems.
- How to compare different classification models, with and without cost matrix, to select which is the "optimal classifier".
- How to discover the relevant attributes/features to solve a classification problem.
- How to develop a KNIME workflow to formulate and solve binary and non-binary classification problems.
- How to measure similarity/distance between two records.
- How to formulate a clustering problem.
- How to develop clustering models from different approaches: partitioning, hierarchical, density-based and graph-based.
- How to validate a clustering model and choose how many clusters to use.
- How to find which extracted rules are relevant/interesting.
- How to develop a KNIME workflow to formulate and solve a clustering problem.
- How to develop a KNIME workflow to formulate and solve a rule extraction problem.
- How to extract, transform and load natural language text from different sources (Web, RSS feeds, Twitter, Facebook, Reddit, YouTube etc.).
- How to preprocess natural language text to obtain a quantitative representation.
- How to formulate and solve text categorization problems.
- How to formulate and solve text clustering problems.
- How to formulate and solve topic modeling problems.
- How to formulate and solve information extraction problems.
Tags
ComputerandDataSciences, Datamining, Textmining