This work describes and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal functions that can be chosen in hindsight.Expand

This work develops and analyze distributed algorithms based on dual subgradient averaging and provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of thenetwork.Expand

Efficient algorithms for projecting a vector onto the l1-ball are described and variants of stochastic gradient projection methods augmented with these efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques.Expand

The two phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as l1, l2, l22, and l∞ regularization, and is extended and given efficient implementations for very high-dimensional data with sparsity.Expand

Borders on information-theoretic quantities that influence estimation rates as a function of the amount of privacy preserved can be viewed as quantitative data-processing inequalities that allow for precise characterization of statistical rates under local privacy constraints.Expand

It is proved that unlabeled data bridges the complexity gap between standard and robust classification: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy.Expand

IEEE 51st IEEE Conference on Decision and Control…

28 April 2011

TLDR

This work shows n-node architectures whose optimization error in stochastic problems-in spite of asynchronous delays-scales asymptotically as O(1/√nT) after T iterations, known to be optimal for a distributed system with n nodes even in the absence of delays.Expand

This work provides a training procedure that augments model parameter updates with worst-case perturbations of training data and efficiently certify robustness for the population loss by considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasserstein ball.Expand

This work presents a new method for regularized convex optimization that unifies previously known firstorder algorithms, such as the projected gradient method, mirror descent, and forwardbackward splitting, and derives specific instantiations of this method for commonly used regularization functions,such as l1, mixed norm, and trace-norm.Expand

It is established that despite the computational speed-up, statistical optimality is retained: as long as m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples.Expand