CORS / Optimization Days (CORS-JOPT2023)

HEC Montréal, May 29-31, 2023

MLAOII: ML Accelerated Optimization II

May 29, 2023 03:30 PM – 05:10 PM

Location: Société canadienne des postes (yellow)

Chaired by Yossiri Adulyasak

4 Presentations

  • 03:30 PM - 03:55 PM

    Context-aware Neural Branching & Diving Strategies for Optimizing Industrial Mining Complexes

    • Yassine Yaakoubi, presenter, COSMO Stochastic Mine Planning Laboratory, McGill University
    • Roussos Dimitrakopoulos, COSMO Stochastic Mine Planning Laboratory, McGill University

    Industrial mining complexes pose significant optimization challenges due to their large scale and their nonlinear, stochastic nature. Efficiently managing all decisions within the mining complex is vital for maximizing net present value and minimizing the risks associated with unmet targets. Furthermore, addressing both supply and demand uncertainty is essential, as it enables the development of robust mine plans that adapt to fluctuating market conditions and geological uncertainties.

    In this presentation, we introduce context-aware neural branching and diving strategies for decision-making under uncertainty. Our approach combines: a neural branching policy that maps variable features to their effect on the objective function value and dynamically fixes/unfixes extraction-sequence variables; a diving policy that navigates heuristic selection trees; and a hyper-heuristic that optimizes mineral value/supply chain dynamics.

    The proposed solution methodology is applied to optimize mining complexes under joint supply and demand uncertainty and is compared to state-of-the-art solvers. Results on multiple large-scale mining complexes demonstrate up to a 40% increase in net present value and reductions of up to three orders of magnitude in primal suboptimality and execution time.

    Finally, we discuss extending our approach to other industries' supply/value chains, emphasizing the methodology's implications for operations research and its potential to address complex stochastic optimization problems.
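
    To make the branching idea concrete, the following is a minimal sketch of a learned branching step, assuming an already-trained scoring network and an illustrative fix-the-top-k rule; all names and interfaces are placeholders, not the authors' implementation.

        # Illustrative only: a learned scorer ranks free extraction-sequence
        # variables, and the k most promising are tentatively fixed to guide
        # the search; the solver may unfix them in later iterations.
        import numpy as np

        def neural_branching_step(features, score_net, variables, k=10):
            """Fix the k variables with the largest predicted objective impact.

            features : (n_vars, n_feats) array of per-variable context features
            score_net: callable mapping features to predicted objective impact
            variables: indices of currently free binary extraction variables
            """
            scores = score_net(features)            # predicted impact per variable
            top = np.argsort(-scores)[:k]           # most promising variables first
            return {variables[i]: 1 for i in top}   # tentatively fix to "extract"

        # Toy usage with a random linear model standing in for the trained network
        rng = np.random.default_rng(0)
        W = rng.normal(size=8)
        fixed = neural_branching_step(rng.normal(size=(100, 8)),
                                      lambda F: F @ W, list(range(100)), k=5)
        print(fixed)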

  • 03:55 PM - 04:20 PM

    Neural Approximate Dynamic Programming for Ultra-Fast Order Dispatching

    • Arash Dehghan, presenter, Toronto Metropolitan University
    • Merve Bodur, University of Toronto
    • Mucahit Cevik, Toronto Metropolitan University

    Order dispatching is an important practical problem as it relates to the daily operations of various delivery companies. In our work, we consider a specific order dispatching problem inspired by the specifications of an ultra-fast delivery company, which aims to fulfil as many orders as possible throughout the operation horizon. A centralised warehouse must decide which orders to assign to which couriers at every epoch, taking into account each courier's designated shift and workload. Our proposed NeurADP approach utilises batching of orders and queueing of future orders for each courier, which improves upon prior work in this area. The proposed approach has several advantages over existing methods. First, it allows for flexible and efficient allocation of orders to couriers, maximising the number of fulfilled orders. Second, it is not as restrictive as previous approximate dynamic programming (ADP) works in its setup and implementation. We conduct empirical analyses on real-life delivery data from several cities and show that the NeurADP model generates policies which outperform previous rule-based heuristics and deep reinforcement learning policies.
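
    As a rough illustration of value-function-guided dispatching in the spirit of NeurADP, the sketch below scores each order-courier pairing by immediate reward plus an estimated future value and assigns greedily; the feature encoding, value network, and greedy rule are simplifying assumptions, not the authors' model.

        # Schematic only: per-epoch dispatching that picks, for each order, the
        # courier maximising immediate reward plus a neural estimate of the
        # post-decision state's value. Names and features are placeholders.
        import numpy as np

        def dispatch_epoch(orders, couriers, value_net, encode):
            assignment = {}
            for order in orders:
                best, best_score = None, -np.inf
                for courier in couriers:
                    if len(courier["queue"]) >= courier["capacity"]:
                        continue                        # courier's queue is full
                    reward = 1.0                        # one more fulfilled order
                    score = reward + value_net(encode(order, courier))
                    if score > best_score:
                        best, best_score = courier, score
                if best is not None:
                    best["queue"].append(order)         # queue order for courier
                    assignment[order] = best["id"]
            return assignment

        # Toy usage with random placeholder features and a linear value net
        rng = np.random.default_rng(1)
        W = rng.normal(size=4)
        couriers = [{"id": i, "queue": [], "capacity": 2} for i in range(3)]
        print(dispatch_epoch(range(5), couriers, lambda s: s @ W,
                             lambda o, c: rng.normal(size=4)))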

  • 04:20 PM - 04:45 PM

    Solving Multi-Echelon Inventory Problems using Deep Reinforcement Learning with Centralized Control

    • Qihua Zhong, presenter, HEC Montréal
    • Yossiri Adulyasak, HEC Montréal
    • Martin Cousineau, HEC Montréal
    • Raf Jans, HEC Montréal

    Multi-echelon inventory models aim to minimize the system-wide total cost of a multi-stage supply chain by applying a proper ordering policy at each stage. The optimal policy is known only under several strict assumptions on the cost structure. To handle settings where those assumptions are relaxed, we apply and compare three deep reinforcement learning (DRL) algorithms, namely Deep Q-Network, Advantage Actor-Critic, and Twin Delayed Deep Deterministic Policy Gradient, to efficiently determine the inventory policy. We consider a serial supply chain as in the beer game, a classic multi-echelon inventory problem, and extend the application of DRL to the centralized decision-making setting, which is more complex due to its significantly larger state and action spaces. The experiments show that, in both the decentralized and centralized settings, the DRL agents learn policies that achieve significant cost savings compared to benchmark heuristics.
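
    The sketch below illustrates why the centralized setting is harder: a single agent observes every echelon and chooses a joint order vector, so the action space is the Cartesian product of the per-echelon order levels (e.g., 10 levels across 4 echelons already gives 10^4 joint actions). The zero-lead-time dynamics and cost parameters are simplifying assumptions, not the authors' exact environment.

        # Simplified one-period transition of a serial (beer-game-style) chain
        # under centralized control; negative inventory represents backlog.
        import numpy as np

        def step(inv, orders, demand, h=1.0, p=2.0):
            """inv: on-hand inventory per echelon (0 = retailer, -1 = factory);
            orders: the centralized joint action, one quantity per echelon."""
            n = len(inv)
            request = np.empty(n)
            request[0] = demand                  # retailer faces customer demand
            request[1:] = orders[:-1]            # each echelon orders upstream
            ship = np.minimum(request, np.maximum(inv, 0))  # ship what's on hand
            inv = inv - request                  # unmet requests become backlog
            inv[:-1] += ship[1:]                 # shipments arrive from upstream
            inv[-1] += orders[-1]                # factory draws from raw supply
            cost = h * np.maximum(inv, 0).sum() + p * np.maximum(-inv, 0).sum()
            return inv, cost

        inv = np.array([10.0, 10.0, 10.0, 10.0])
        inv, cost = step(inv, orders=np.array([4.0, 4.0, 4.0, 4.0]), demand=6.0)
        print(inv, cost)   # next joint state and one-period holding/backlog cost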

  • 04:45 PM - 05:10 PM

    Deep Reinforcement Learning for Risk-Averse Decision-Making Problems

    • Erick Delage, presenter, GERAD, HEC Montréal
    • Saeed Marzban, HEC Montréal
    • Jonathan Yumeng Li, University of Ottawa

    Motivated by the application of equal-risk pricing and hedging of a financial derivative, where two operationally meaningful hedging portfolio policies must be found that minimize coherent risk measures, we propose a novel deep reinforcement learning algorithm for solving risk-averse dynamic decision-making problems. Prior to our work, such hedging problems could only be solved either with static risk measures, leading to time-inconsistent policies, or with dynamic programming solution schemes that are impracticable in realistic settings. Our work extends, for the first time, the deep deterministic policy gradient algorithm, an off-policy actor-critic reinforcement learning (ACRL) algorithm, to solving dynamic problems formulated with time-consistent dynamic expectile risk measures. Our numerical experiments confirm that the new ACRL algorithm produces high-quality solutions to equal-risk pricing and hedging problems, and that its hedging strategy outperforms the one produced with a static risk measure when risk is evaluated at later points in time.
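
    A small sketch of the expectile regression loss that underlies a dynamic expectile risk measure: with tau > 0.5, errors where the target exceeds the prediction are weighted more heavily, so the minimizer lies above the mean and yields a risk-averse value estimate (tau = 0.5 recovers least squares). The surrounding actor-critic training loop is omitted, and the code is illustrative, not the authors' implementation.

        # Expectile loss: an asymmetric squared error whose minimizer is the
        # tau-expectile of the target distribution.
        import numpy as np

        def expectile_loss(targets, prediction, tau=0.8):
            u = targets - prediction
            weight = np.where(u >= 0, tau, 1.0 - tau)   # asymmetric weighting
            return np.mean(weight * u**2)

        # Locate the 0.8-expectile of a standard normal sample by grid search;
        # it sits above the mean, reflecting the risk-averse tilt.
        x = np.random.default_rng(2).normal(size=10000)
        grid = np.linspace(-1.0, 1.0, 2001)
        best = grid[int(np.argmin([expectile_loss(x, g) for g in grid]))]
        print("0.8-expectile ~", round(best, 3))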
