PhD Defense: Effective Neural Network Architecture for Few-shot Learning
Speaker: Tianyu Kang
Committee Members: Prof. Wei Ding (Chair), Prof. Dan Simovici, Prof. Ping Chen, Prof. Tiago Cogumbreiro
GDP: Prof. Dan Simovici
Date and Time: July 18th, 2024 (Thursday) at 11:00 AM ET
In Person Location: M-3-0732
Zoom Link: https://umassboston.zoom.us/j/99881426716
Passcode: 869106
ABSTRACT
Despite the current technological era’s ease of data collection, obtaining sufficient data for neural network training remains challenging due to factors like subject rarity, high costs, time constraints, incomplete historical data, and limited data in early research phases. Few-shot learning addresses these issues by enabling effective learning from small datasets, reducing data collection costs, and increasing accessibility to advanced machine-learning techniques. It also enhances training efficiency, improves generalization, and facilitates rapid adaptation to new tasks, proving invaluable in dynamic environments.
Few-shot learning faces significant challenges due to data scarcity and the need for robust generalization. Limited training examples hinder effective learning and increase the risk of overfitting, leading to poor performance on new data. This dissertation explores various methodologies to overcome these challenges, aiming to improve model robustness and performance in real-world scenarios.
First, we address the data scarcity problem, which calls for a multifaceted approach. A primary culprit is the overwhelming number of parameters in neural networks: this surplus intensifies the scarcity issue, leading to suboptimal performance and hindering effective learning. We address this issue with a set of complementary methods: leveraging domain knowledge to incorporate prior information, employing specialized algorithms tailored to the dataset, integrating Conditional Restricted Boltzmann Machines (CRBMs) with Deep Neural Networks (DNNs) to enhance learning efficiency, and optimizing efficiency through modular sparsification techniques.
Second, generalization and robustness are essential for addressing real-world challenges. Because labeled data is typically limited in such scenarios, a model's ability to generalize effectively from sparse examples becomes pivotal. We tackle this issue with several methods: network-based regularization, prioritizing techniques that leverage domain knowledge, and the strategic use of hybrid models.
Finally, we will merge the previously outlined methods into a cohesive model to tackle real-world problems. We will perform experiments comparing our approach with state-of-the-art (SOTA) methods to demonstrate its effectiveness.