ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ | ΜΕΤΑΠΤΥΧΙΑΚΟ ΠΡΟΓΡΑΜΜΑ ΣΠΟΥΔΩΝ | ΕΠΙΣΤΗΜΗ ΔΕΔΟΜΕΝΩΝ ΚΑΙ ΜΗΧΑΝΙΚΗ ΜΑΘΗΣΗ

Algorithmic Data Science

Description

The course studies algorithms for extracting, processing, and analyzing data from very large data sets, that is, data so large that it does not fit in main memory. Because of the emphasis on size, many of the examples involve the World Wide Web or data derived from the World Wide Web. The main goal is to study methods for data mining, processing, and analysis, in a systematic way, including proofs of correctness and computational complexity. The main topics covered are:

Frequent Itemsets Mining: mining association rules and sets of items that are frequently encountered in big data. A-priori algorithm, Use of hashing, PCY, SON, Toivonen algorithms; generation and evaluation of association rules.  FP-Growth algorithm. 

Clustering techniques: The curse of dimentionality, Hierarchical clustering, the K-means algorithm and the BFR algorithm. Clustering with representatives, CURE algorithm. Advantages and disadvantages of well known methods.

Link Analysis:Definition of Page Rank, Structure of the Web. Power Iteration method, Avoiding Dead Ends, Spider Traps and Taxation, Efficient Computation of PageRank, Efficient Approaches to PageRank Iteration. Random Teleports, Computing PageRank (Matrix Formulation, Rearranging the Equation, Sparse Matrix Formulation and Encoding, Block stripe Analysis). Topic Specific PageRank,  Measuring Proximity in Graphs, algorithms SimRank, WebSpam, Term Spam, Spam Farming, Link Spamming, TrustRank, Spam Mass, Hubs and Authorities (HITS Algorithm).

Κατακερματισμός (Hashing): Closed and open addressing. Universal hash families. Chaining. Load factor and runtime analysis of basic operations. Open addressing. Linear and quadratic exploration. Double hashing. Load factor and runtime analysis of basic operations. Perfect hashing. Cuckoo hashing.

Recommendation Systems: The long Tail, Utility Matrix, Content Based Recommendation Systems, Item Profile, User Profile. Collaborative Filtering, Finding Similar Users, Rating prediction, User-User, Item-Item. Hybrid Methods, Combining Global Baseline estimate and Collaborative Filtering.

Community Detection: Introduction to community detection. Edge Betweeness and Modularity. Girvan-Newman algorithm. Louvain algorithm. Community detection via graphs eigenvalue, spectral clustering.

Advertising on the web  On-line Algorithms, Greedy Algorithms, Competitive Ratio, Matching problem, Adwords problem, the Balance Algorithm.      

Semester
Spring Semester
Category
Obligatory
Lecture Hours
3 hours
Credits
5