Learning over inherently distributed data
Increasingly, the data used for learning and inference reside on different compute nodes or are owned by different organizations. This is especially common in medical and e-commerce applications. To distinguish this setting from existing work in distributed computing, we term it inherently distributed data (IDD). Learning over IDD is also known as federated learning in the literature.
Our framework enables learning and inference over inherently distributed data as if all the data were pooled, but with small communication overhead and a negligible loss in accuracy. Under our framework, the major computations are performed in parallel, locally where the data are stored, so data privacy may also be preserved. The framework readily applies to a broad class of learning and inference algorithms, including spectral clustering, classification, and regression.
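To make the "compute locally, communicate little" pattern concrete, below is a minimal Python sketch of one standard way to realize it for linear regression: each node computes a small summary (its d-by-d Gram matrix and a d-vector) in parallel where its data live, and only these summaries, never the raw rows, are communicated. For ordinary least squares this aggregation recovers exactly the model fit on the pooled data. The function and variable names (local_statistics, fit_distributed_ols, local_shards) are illustrative assumptions, not the API of the cited framework.

    import numpy as np

    def local_statistics(X, y):
        # Runs in parallel at the node storing (X, y). Only a d x d
        # matrix and a d-vector are communicated, so the cost is
        # independent of the number of local samples.
        return X.T @ X, X.T @ y

    def fit_distributed_ols(local_shards):
        # Coordinator aggregates the local summaries and solves once.
        # For ordinary least squares this is exactly the pooled-data
        # solution, i.e., no loss in accuracy.
        d = local_shards[0][0].shape[1]
        gram = np.zeros((d, d))
        moment = np.zeros(d)
        for X, y in local_shards:
            g, m = local_statistics(X, y)  # raw rows never leave the node
            gram += g
            moment += m
        return np.linalg.solve(gram, moment)

    # Example: three nodes, each holding its own private data shard.
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])
    shards = []
    for _ in range(3):
        X = rng.normal(size=(100, 3))
        shards.append((X, X @ true_w + 0.01 * rng.normal(size=100)))

    print(fit_distributed_ols(shards))  # close to [2.0, -1.0, 0.5]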
Citation
[1] D. Yan, Y. Wang, J. Wang, G. Wu, and H. Wang. Fast communication-efficient spectral clustering over distributed data. IEEE Transactions on Big Data, vol. 7, no. 1, pp. 158-168, 2021 (online since March 2019). arXiv:1905.01596.
[2] D. Yan and Y. Xu. Learning over inherently distributed data. 2019. arXiv:1907.13208.