Cost-bounded learning



Cost is often a concern for statistical and machine learning algorithms. In many applications, such as e-commerce or manufacturing systems, deploying data-driven algorithms can be expensive because they need a continuous supply of large volumes of data: collecting, purchasing, storing, and maintaining data all incur a cost. Such costs quickly shrink the profit margin of large-scale data-driven applications. We propose an efficient algorithm that generates a cost schedule: for any given budget, it suggests which variables to use so that model performance is near optimal while the total cost of data stays within the budget.

This is a new line of research that extends the current scope of cost-sensitive learning.
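To make the idea of a cost schedule concrete, here is a minimal sketch in Python. It is not the algorithm from the papers cited below. It swaps in a simple greedy heuristic that ranks variables by estimated gain per unit cost, and it assumes that each variable has a known positive cost and a precomputed usefulness score. The function names, variable names, costs, and gains are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def select_within_budget(costs: Dict[str, float],
                         gains: Dict[str, float],
                         budget: float) -> List[str]:
    """Greedily pick variables by gain per unit cost without exceeding the budget.

    Illustrative heuristic only; assumes every cost is strictly positive.
    """
    # Rank variables by estimated gain per unit cost, highest first.
    ranked = sorted(costs, key=lambda v: gains[v] / costs[v], reverse=True)
    chosen: List[str] = []
    spent = 0.0
    for v in ranked:
        # Take a variable only if it still fits in the remaining budget.
        if spent + costs[v] <= budget:
            chosen.append(v)
            spent += costs[v]
    return chosen


def cost_schedule(costs: Dict[str, float],
                  gains: Dict[str, float],
                  budgets: Sequence[float]) -> Dict[float, List[str]]:
    """Map each candidate budget to the variables suggested for it."""
    return {b: select_within_budget(costs, gains, b) for b in budgets}


# Hypothetical per-variable data costs and usefulness scores.
costs = {"age": 1.0, "income": 4.0, "clicks": 2.0, "lab_test": 10.0}
gains = {"age": 0.5, "income": 1.0, "clicks": 1.5, "lab_test": 2.0}

schedule = cost_schedule(costs, gains, budgets=[1, 3, 7, 20])
for budget, variables in schedule.items():
    total = sum(costs[v] for v in variables)
    print(f"budget={budget}: use {variables} (total cost {total})")
```

In this sketch the gains are treated as additive and independent, which real variables rarely are. A practical method has to account for interactions between variables when it estimates model performance.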



Citations

[1] R. Ming, H. Xu, S. Gibbs, D. Yan and M. Shao. A deep neural network based approach to building budget-constrained models for big data analysis. The 17th International Conference on Data Science (ICDATA), Las Vegas, Nevada, USA, July 26-29, 2021.

[2] D. Yan, Z. Qin, S. Gu, H. Xu and M. Shao. Cost-sensitive selection of variables by ensemble of model sequences. Knowledge and Information Systems, Vol 63, 1069-1092, 2021. arXiv:1901.00456

[3] V. Nagaraju, D. Yan and L. Fiondella. A framework for selecting subset of metrics considering cost. 24th International Conference on Reliability and Quality in Design, 2018.