
Wednesday, September 1, 2010

Constrained Cost-Sensitive Multiclass Classification: Regression Reduction Analysis

In a previous post I introduced two versions of constrained cost-sensitive multiclass classification (CSMC): average, with regret
$$r_{av}(h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ E_{c \sim D_{c|\omega,x}} \left[ c(h(x, \omega)) \right] - \min_{k \in K}\; E_{c \sim D_{c|\omega,x}} \left[ c(k) \right] \right],$$
and minimax, with regret
$$r_{mm}(h) = E_{x \sim D_x} \left[ E_{\tilde c \sim D_{\tilde c|x}} \left[ \max_{\omega \in \mathcal{P}(K)} \left\{ c(h(x, \omega)) - \min_{k \in K}\; E_{\tilde c \sim D_{\tilde c|x}} \left[ c(k) \right] \right\} \right] \right].$$
Now I'm going to state two results that are not that surprising.
Result: Average CSMC Regression Regret
For all average constrained CSMC distributions $D$, and all cost-sensitive classifiers $h_g$ derived from a regressor $g$, $$r_{av}(h_g) \leq \sqrt{2 |K| \epsilon_{L^2}(g)},$$ where $\epsilon_{L^2}(g)$ is the $L^2$ regret on the underlying regression problem.

Proof: See Average Analysis below.
Result: Minimax CSMC Regression Regret
For all minimax constrained CSMC distributions $D$, and all cost-sensitive classifiers $h_g$ derived from a regressor $g$, $$r_{mm}(h_g) \leq \sqrt{2 |K| \epsilon_{L^2}(g)},$$ where $\epsilon_{L^2}(g)$ is the $L^2$ regret on the underlying regression problem.

Proof: See Minimax Analysis below.
Those bounds should look familiar: they are the same as in the unconstrained CSMC case. Together these results indicate that a regression reduction is truly indifferent to constraints, even constraints that are adversarially imposed at application time. It will be interesting to see whether other reductions of unconstrained CSMC have different properties for constrained CSMC.

Average Analysis

In this case, there is a distribution D=Dx×Dω|x×Dc|ω,x, where c:KR takes values in the extended reals R=R{}, and the components of c which are -valued for a particular instance are revealed as part of the problem instance via ωP(K) (i.e., ω is a subset of K). The regret of a particular classifier h:X×P(K)K is given by rav(h)=E(x,ω)Dx×Dω|x[EcDc|ω,x[c(h(x,ω))minkKEcDc|ω,x[c(k)]]]. An argmax regression strategy to solve cost-sensitive multiclass classifier is a function g:X×P(K)×KR which defines an associated cost-sensitive multiclass classifier hg:X×P(K)K according to hg(x,ω)=argminkKg(x,ω,k). I would like to bound rav(hg) in terms of the regret of g on the regression problem, ϵL(g)=qL(g)mingqL(g), where q is the error on the regression problem qL(g)=E(x,ω,c)D[1|K|kKL(g(x,ω,k),c(k))], and L is a loss function for the regression problem (defined on the extended reals). I'll focus on L2 loss for the regressor defined on the extended reals via L2(,)=0, L2(,)=, and L2(,)=.

Consider a single instance $(x, \omega)$ with associated conditional per-instance cost-vector distribution $D_{c|\omega,x}$, and suppose our regressor has cost estimates which differ from a minimum error regressor's estimate $c^*(x, \omega, k)$ by $\delta(x, \omega, k)$,
$$g(x, \omega, k) = c^*(x, \omega, k) + \delta(x, \omega, k).$$
For $k \in \omega$, $\delta(x, \omega, k) = 0$, since both $c^*(x, \omega, k)$ and our regressor $g(x, \omega, k)$ will be $\infty$ (any $g$ with finite regression regret must predict $\infty$ on those components, because $L^2(y, \infty) = \infty$ for finite $y$). The associated classifier $h_g$ is
$$h_g(x, \omega) = \operatorname*{argmin}_{k \in K}\; \bigl( c^*(x, \omega, k) + \delta(x, \omega, k) \bigr).$$
Imagine an adversary which attempts to create a certain amount of CSMC regret on this instance while minimizing the amount of regression regret on this instance. This adversary is faced with the following family of problems indexed by $k^\dagger \in K \setminus \omega$:
$$\begin{aligned}
\min_{\delta}\;& \sum_{k \in K} \delta(x, \omega, k)^2 \\
\text{s.t.}\;& c^*(x, \omega, k^\dagger) + \delta(x, \omega, k^\dagger) \leq c^*(x, \omega, k) + \delta(x, \omega, k) \quad \forall\, k \neq k^\dagger.
\end{aligned}$$
This is the same as the unconstrained CSMC reduction to regression, but with $k^\dagger$ restricted to the set $K \setminus \omega$. When $|K \setminus \omega| \leq 1$, the CSMC regret is zero; otherwise the adversary's strategy is unchanged: perturb the $k^\dagger$ and $k^*$ estimates (where $k^*$ is the optimal feasible class) and leave the others alone. Thus leveraging the previous analysis yields $$r_{av}(h_g) \leq \sqrt{2 |K| \epsilon_{L^2}(g)}.$$ It should also be noted that the regression regret will be at most the regression regret in the unconstrained case, since the additional information contained in $\omega$ allows the regressor to have a perfect estimate for some values of $k$.
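As a sanity check on the perturbation argument, here is a small hypothetical numeric example (the cost values and the choice of $k^\dagger$ are made up): the adversary moves the $k^*$ and $k^\dagger$ estimates toward each other by $r/2$ each, which costs $r^2 / (2|K|)$ of regression regret under the $1/|K|$ averaging in $q_L$, so the induced CSMC regret $r$ exactly matches $\sqrt{2 |K| \epsilon}$.

```python
# Hypothetical numeric check of the adversary's cheapest perturbation.
import numpy as np

c_star = np.array([0.2, 0.9, 1.3, np.inf])  # class 3 is in omega (infeasible)
k_star = int(np.argmin(c_star))             # optimal feasible class: 0
k_dagger = 1                                # class the adversary wants chosen
r = c_star[k_dagger] - c_star[k_star]       # CSMC regret the adversary induces: 0.7

# Cheapest perturbation: move only the k_star and k_dagger estimates toward
# each other by r/2 (ties broken in the adversary's favor).
delta = np.zeros_like(c_star)
delta[k_star] = r / 2.0
delta[k_dagger] = -r / 2.0
g = c_star + delta
assert all(g[k_dagger] <= g[k] for k in range(len(c_star)))  # k_dagger is an argmin

# Per-instance regression regret under the 1/|K| averaging in q_L:
K = len(c_star)
eps = float(np.sum(delta[np.isfinite(c_star)] ** 2)) / K     # = r^2 / (2|K|)
print(eps, np.sqrt(2 * K * eps), r)  # sqrt(2|K| eps) == r, matching the bound
```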

Minimax Analysis

In this case, there is a distribution $D = D_x \times D_{\tilde c|x}$, where $\tilde c: K \to \mathbf{R}$ takes values in the regular reals $\mathbf{R}$. Then, an adversary comes in and manufactures a cost vector $c$ in the extended reals $\overline{\mathbf{R}}$ by setting some of the components to $\infty$; these choices are revealed via $\omega$ prior to a decision being elicited. In this case the regret of a particular classifier is given by
$$r_{mm}(h) = E_{x \sim D_x} \left[ E_{\tilde c \sim D_{\tilde c|x}} \left[ \max_{\omega \in \mathcal{P}(K)} \left\{ c(h(x, \omega)) - \min_{k \in K}\; E_{\tilde c \sim D_{\tilde c|x}} \left[ c(k) \right] \right\} \right] \right].$$
Consider a single instance $x$ with associated conditional per-instance cost-vector distribution $D_{\tilde c|x}$; in addition, the adversary can pick $\omega$ to construct the complete problem instance $(x, \omega)$. As above, the regressor has cost estimates which differ from a minimum error regressor's estimate $c^*(x, \omega, k)$ by $\delta(x, \omega, k)$, and when $k \in \omega$ the estimates are perfect, i.e., $\delta(x, \omega, k) = 0$.
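For intuition about the worst case over $\omega$, here is an illustrative sketch (again my own; the function and variable names are assumptions, not part of the analysis) that evaluates the minimax regret of a derived classifier at a single $x$ by enumerating the adversary's choices of $\omega$, restricted here to proper subsets of $K$ so that at least one class remains feasible.

```python
# Illustrative sketch: minimax regret of a derived classifier at one x,
# computed by brute-force enumeration of the adversary's omega.
from itertools import combinations

def minimax_regret(g, c_tilde_mean, x, K):
    """Worst case over omega of (expected cost of h_g's pick) - (best feasible expected cost).

    g            -- cost-estimate function of (x, omega, k)
    c_tilde_mean -- dict of conditional expected costs E[c_tilde(k) | x]
    K            -- list of class labels
    """
    worst = 0.0
    # The adversary may forbid any proper subset of K (at least one class stays feasible).
    for size in range(len(K)):
        for omega in combinations(K, size):
            omega = set(omega)
            feasible = [k for k in K if k not in omega]
            # Derived classifier: argmin of the estimates over feasible classes.
            pick = min(feasible, key=lambda k: g(x, omega, k))
            regret = c_tilde_mean[pick] - min(c_tilde_mean[k] for k in feasible)
            worst = max(worst, regret)
    return worst

# Toy usage with a regressor that swaps the ranking of the two cheapest classes.
if __name__ == "__main__":
    true_costs = {0: 0.1, 1: 0.2, 2: 0.9}
    estimates = {0: 0.25, 1: 0.2, 2: 0.9}
    g = lambda x, omega, k: estimates[k]
    print(minimax_regret(g, true_costs, x=None, K=[0, 1, 2]))  # 0.1, attained at omega = {} or {2}
```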

Imagine an adversary which attempts to create a certain amount of CSMC regret on this instance while minimizing the amount of regression regret on this instance. This adversary is faced with the following family of problems indexed by $\omega$ and $k^\dagger \in K \setminus \omega$:
$$\begin{aligned}
\min_{\delta}\;& \sum_{k \in K} \delta(x, \omega, k)^2 \\
\text{s.t.}\;& c^*(x, \omega, k^\dagger) + \delta(x, \omega, k^\dagger) \leq c^*(x, \omega, k) + \delta(x, \omega, k) \quad \forall\, k \neq k^\dagger.
\end{aligned}$$
Again, when $|K \setminus \omega| \leq 1$, the CSMC regret is zero; otherwise the adversary's strategy is the same for each $\omega$, and leveraging the previous analysis yields $$r_{mm}(h_g) \leq \sqrt{2 |K| \epsilon_{L^2}(g)}.$$
