Journées de l'optimisation 2019 (JOPT2019)

HEC Montréal, May 13-15, 2019


TB3 Clustering

May 14, 2019, 10:30 - 12:10

Room: Demers Beaulne

Chaired by John Chinneck

3 presentations

  • 10:30 - 10:55

    Application of maximum diameter and minimum split clustering

    • Marcus Poggi, presenter, Pontifícia Universidade Católica do Rio de Janeiro
    • Helio Lopes, PUC-Rio
    • Sonia Fiol González, PUC-Rio
    • Toni Pacheco, PUC-Rio
    • Bruno Tassara, PUC-Rio

    Given a set of objects and an associated dissimilarity matrix, find the minimum number of clusters respecting a given maximum diameter and minimum split. We develop an algorithm combining simple constructive heuristics, a set-partitioning formulation, 3-SRC cuts, and column enumeration. The approach characterizes subgroups of large data sets well, giving each a very individual description, and can ultimately prove optimality.
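
    As a rough illustration, here is a minimal sketch (hypothetical code, not the authors' algorithm) of the two quantities the problem constrains: a cluster's diameter, its largest intra-cluster dissimilarity, and the split between two clusters, their smallest inter-cluster dissimilarity, used here to check a candidate partition against given thresholds.

        import itertools
        import numpy as np

        def diameter(D, cluster):
            """Largest dissimilarity between two objects of one cluster."""
            if len(cluster) < 2:
                return 0.0
            return max(D[i, j] for i, j in itertools.combinations(cluster, 2))

        def split(D, c1, c2):
            """Smallest dissimilarity between objects of two distinct clusters."""
            return min(D[i, j] for i in c1 for j in c2)

        def is_feasible(D, partition, max_diam, min_split):
            """A partition is feasible when every cluster's diameter is at most
            max_diam and every pair of clusters is at least min_split apart;
            the presented problem minimizes the number of such clusters."""
            if any(diameter(D, c) > max_diam for c in partition):
                return False
            return all(split(D, a, b) >= min_split
                       for a, b in itertools.combinations(partition, 2))

        # Example: two tight, well-separated clusters.
        D = np.array([[0., 1., 5.], [1., 0., 6.], [5., 6., 0.]])
        print(is_feasible(D, [[0, 1], [2]], max_diam=2.0, min_split=4.0))  # True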

  • 10:55 - 11:20

    On semi-supervised ellipsoidal clustering

    • Daniel Gribel, presenter, CIRRELT
    • Michel Gendreau, Polytechnique Montréal
    • Thibaut Vidal, Pontifical Catholic University of Rio de Janeiro

    We study a minimum sum-of-squares clustering problem in the presence of semi-supervised information given as pairwise "must-link" and "cannot-link" constraints. Our clusters can be elliptical and are therefore represented by their centers and covariance matrices. We conduct computational experiments to measure the benefits of the semi-supervised information.
    Keywords: Clustering, Semi-supervised learning, Pairwise constraints
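
    As a rough illustration, here is a minimal sketch (hypothetical code, not the authors' model) of the two ingredients the abstract combines: scoring a point against an elliptical cluster through the squared Mahalanobis distance induced by its center and covariance matrix, and checking an assignment against must-link and cannot-link pairs.

        import numpy as np

        def mahalanobis_sq(x, center, cov):
            """Squared Mahalanobis distance of x to an elliptical cluster
            described by its center and covariance matrix."""
            d = x - center
            return float(d @ np.linalg.inv(cov) @ d)

        def constraints_satisfied(labels, must_link, cannot_link):
            """Must-link pairs must share a label; cannot-link pairs must not."""
            return (all(labels[i] == labels[j] for i, j in must_link) and
                    all(labels[i] != labels[j] for i, j in cannot_link))

        # Example: an elongated cluster scores the two equidistant points
        # very differently, which a spherical model could not express.
        center, cov = np.zeros(2), np.array([[4.0, 0.0], [0.0, 0.25]])
        print(mahalanobis_sq(np.array([2.0, 0.0]), center, cov))     # 1.0
        print(mahalanobis_sq(np.array([0.0, 2.0]), center, cov))     # 16.0
        print(constraints_satisfied([0, 0, 1], [(0, 1)], [(1, 2)]))  # True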

  • 11:20 - 11:45

    Post-separation classifier feature reduction

    • John Chinneck, presenter, Carleton University

    Classifier feature reduction tries to find the smallest set of features that allows acceptable separation of the data. This is time-consuming; wrapper methods, for example, require multiple solutions using different subsets of the features. We describe a new approach: (i) find a separating hyperplane by any method, using all features, then (ii) find a different hyperplane that provides the same separation while using fewer features. The novelty is in the second step, which is based on new heuristics for finding sparse solutions to linear programs.
    Finding a separating hyperplane can be cast as an instance of the Maximum Feasible Subset problem (maxFS): given an infeasible set of linear constraints, find a feasible subset of largest cardinality. There are good heuristics for this NP-hard problem. To convert, transform each data point into a linear inequality in the feature weights, then apply a maxFS heuristic to find a solution that satisfies as many of the inequalities as possible (i.e., correctly classifies as many of the data points as possible). A sketch of this conversion, paired with one simple maxFS heuristic, follows.
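
    A minimal sketch (hypothetical code) of that conversion together with one simple elastic-style maxFS heuristic; the abstract does not specify which heuristic the authors use. Each labeled point (x_i, y_i), with y_i in {-1, +1}, becomes the inequality y_i (w . x_i + b) >= 1; an elastic LP minimizes the total violation, and the most-violated inequality is dropped until the remainder is satisfiable.

        import numpy as np
        from scipy.optimize import linprog

        def max_feasible_subset(X, y, tol=1e-8):
            """Greedy elastic heuristic for maxFS on y_i (w . x_i + b) >= 1.
            Variables z = [w (n, free), b (free), e (m, >= 0)]; minimize
            sum(e) subject to y_i (w . x_i + b) + e_i >= 1, then drop the
            point with the largest elastic value and repeat."""
            keep = list(range(len(y)))
            n = X.shape[1]
            while keep:
                Xk, yk = X[keep], y[keep]
                m = len(keep)
                c = np.concatenate([np.zeros(n + 1), np.ones(m)])
                # -(y_i x_i).w - y_i b - e_i <= -1
                A_ub = np.hstack([-yk[:, None] * Xk, -yk[:, None], -np.eye(m)])
                bounds = [(None, None)] * (n + 1) + [(0, None)] * m
                res = linprog(c, A_ub=A_ub, b_ub=-np.ones(m), bounds=bounds)
                e = res.x[n + 1:]
                if e.max() <= tol:               # all kept inequalities hold
                    return keep, res.x[:n], res.x[n]
                keep.pop(int(np.argmax(e)))      # drop the worst violator
            return [], np.zeros(n), 0.0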
    We adapt this formulation for use with any hyperplane placement method: (i) find the hyperplane, (ii) convert only the correctly classified points to inequalities, and (iii) find a sparse solution (i.e., one in which few variables are nonzero) to this feasible system. Few nonzero variables means few features used. The sparse-solution algorithm is itself another variant of a maxFS heuristic. Experimental results are given.
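
    A minimal sketch (hypothetical code) of step (iii), using an L1-norm LP as a stand-in for the sparse-solution heuristic, which the abstract describes only as another maxFS variant. The correctly classified points become the inequalities y_i (w . x_i + b) >= 1, and minimizing ||w||_1 subject to them favors hyperplanes that leave many feature weights at zero.

        import numpy as np
        from scipy.optimize import linprog

        def sparse_separator(X, y):
            """X: (m, n) correctly classified points; y: labels in {-1, +1}.
            Split w = u - v with u, v >= 0 and a free bias b, then minimize
            sum(u + v) = ||w||_1 subject to y_i (w . x_i + b) >= 1."""
            m, n = X.shape
            c = np.concatenate([np.ones(2 * n), [0.0]])    # objective on [u, v, b]
            # y_i ((u - v).x_i + b) >= 1  <=>  [-y_i x_i, y_i x_i, -y_i] z <= -1
            A_ub = np.hstack([-y[:, None] * X, y[:, None] * X, -y[:, None]])
            bounds = [(0, None)] * (2 * n) + [(None, None)]
            res = linprog(c, A_ub=A_ub, b_ub=-np.ones(m), bounds=bounds)
            w = res.x[:n] - res.x[n:2 * n]
            return w, res.x[-1]

        # Example: feature 0 alone separates the classes, so the LP can
        # leave the weight of feature 1 at (numerically) zero.
        X = np.array([[1.0, 0.2], [2.0, -0.1], [-1.0, 0.3], [-2.0, 0.0]])
        y = np.array([1.0, 1.0, -1.0, -1.0])
        w, b = sparse_separator(X, y)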
