Abstract
Active and semi-supervised learning are important techniques when
labeled data are scarce. Recently a method was suggested for combining
active learning with a semi-supervised learning algorithm that uses
Gaussian fields and harmonic functions. This classifier is relational
in nature: it relies on having the data presented as a partially
labeled graph (also known as a within-network learning problem). This
work showed yet again that empirical risk minimization (ERM) was the
best method to find the next instance to label and provided an
efficient way to compute ERM with the semisupervised classifier. The
computational problem with ERM is that it relies on computing the risk
for all possible instances. If we could limit the candidates that
should be investigated, then we can speed up active learning
considerably. In the case where the data is graphical in nature, we
can leverage the graph structure to rapidly identify instances that
are likely to be good candidates for labeling. This paper describes a
novel hybrid approach of using of community finding and social network
analytic centrality measures to identify good candidates for labeling
and then using ERM to find the best instance in this candidate set. We
show on real-world data that we can limit the ERM computations to a
fraction of instances with comparable performance.