We deal with the problem of learning sets of rules from datasets, where the data instances are represented by graphs. We focus on three particular subjects. First of all, we present a new, statistical approach to rule learning. Doing so, we address two of the problems inherent in traditional rule learning: The computational hardness of finding rule sets with low training error and the need for capacity control to avoid overfitting. The chosen representation involves weights attached to rules. Instead of optimizing the error rate directly, we optimize for rule sets that have large margin and low variance. To avoid overfitting, we propose a model selection strategy that utilizes a novel concentration inequality. Second, generating a suitable set of rules for structured data is a non-trivial task due to the vast number of potential candidates. Ideally, one would like to pick a small, but informative set of structural features, each providing complementary information about the instances. We frame the search for a suitable feature set as a combinatorial optimization problem and propose a stochastic local search algorithm for solving it. Third, we take a kernel-based approach to inductive transfer on structured data, that is, we aim at finding a kernel that works well on a new structured dataset given a range of different, but related datasets. To find such a kernel, we apply convex optimization on two levels. On the base level, we propose an iterative procedure to generate kernels that generalize well on the known datasets. On the meta level, we combine those kernels in a minimization criterion to predict a suitable kernel for the new data.