
Sunday, August 22, 2010

Stochastic Shortest Path Reduction to Cost-Sensitive Multiclass: Towards Better Regret Bounds

The regret bound for the path grid reduction of stochastic shortest path (SSP) without recourse to cost-sensitive multiclass classification (CSMC) was useful for showing consistency of the reduction, i.e., that a minimum regret path choosing policy results from a minimum regret cost-sensitive multiclass classifier. However, the bound felt too loose: in particular, the $O(n^2)$ scaling factor on the regret did not seem consistent with the direct reduction to regression, which had an $n^{3/2}$ scaling factor. Intuitively, it feels like I should be able to reduce to regression via CSMC and end up with the same bound, which suggests an $O(n)$ scaling factor for the CSMC reduction.
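To make that intuition concrete, here's the back-of-the-envelope composition I have in mind, taking as given the $2\sqrt{n}\,\sqrt{\epsilon}$ form of the per-subproblem CSMC-to-regression bound used further below: if the SSP-to-CSMC reduction contributed a scaling factor $c(n)$ on the worst per-subproblem CSMC regret $q$, composing the two would give $$r_{spath}(\Psi) \leq c(n)\, q \leq c(n)\, 2\sqrt{n}\, \sqrt{\epsilon},$$ and matching the $O(n^{3/2})\,\sqrt{\epsilon}$ scaling of the direct reduction to regression forces $c(n) = O(n)$ rather than the $O(n^2)$ of the existing bound.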

So I've been trying to come up with something along those lines. I don't have anything I'm super happy with, but here are some thoughts. First, here's a theorem that says that the conditional SSP regret is bounded by the conditional cost-sensitive regret summed across subproblems encountered on the SSP regret minimizing path.
Theorem: Optimal Path Regret Bound
For fixed $x$, let $p(x)$ denote the set of $(k, v)$ pairs encountered on the SSP regret minimizing path of length $n$ from source vertex $s$ to target vertex $t$. Then for all SSP distributions and all cost-sensitive classifiers $\Psi$, $$r_{spath}(\Psi) \leq E_{x \sim D_x} \left[ \sum_{(k, v) \in p(x)} q_{kv}(\Psi|x) \right],$$ with the definition $\forall v,\; q_{n-1,v}(\Psi|x) = q_{nv}(\Psi|x) = 0$.

Proof: Let $\Upsilon_{kv}(x)$ be the set of $(k', v')$ pairs encountered on the SSP regret minimizing path of length $n - k$ from source vertex $v$ to target vertex $t$. The proof is inductive on the property $r_{kv}(\Psi|x) \leq \sum_{(k', v') \in \Upsilon_{kv}(x)} q_{k'v'}(\Psi|x)$.

When $k = n - 2$, it is easily verified directly that $r_{n-2,v}(\Psi|x) = q_{n-2,v}(\Psi|x)$.
For $k \in [1, n - 2)$, $$r_{kv}(\Psi|x) \leq q_{kv}(\Psi|x) + r_{k+1,w}(\Psi|x) \leq q_{kv}(\Psi|x) + \sum_{(k', v') \in \Upsilon_{k+1,w}(x)} q_{k'v'}(\Psi|x) = \sum_{(k', v') \in \Upsilon_{kv}(x)} q_{k'v'}(\Psi|x),$$ where $w$ is the next vertex after $v$ on the SSP regret minimizing path, the first step is from the next step bound lemma, the second step leverages the induction hypothesis, and the third step is from the definition of $\Upsilon_{kv}(x)$. Noting that $\Upsilon_{1s}(x) = p(x)$ yields
$$r_{spath}(\Psi|x) = r_{1s}(\Psi|x) \leq \sum_{(k,v) \in p(x)} q_{kv}(\Psi|x).$$
Taking expectation with respect to $D_x$ completes the proof.
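As a quick sanity check of the telescoping, take $n = 4$ and suppose that for this $x$ the SSP regret minimizing path visits $s$, $v$, $w$, and $t$ at positions $1$ through $4$ (the vertex names are just for illustration), so $p(x) = \{(1, s), (2, v), (3, w), (4, t)\}$. The base case gives $r_{2,v}(\Psi|x) = q_{2,v}(\Psi|x)$, and one application of the next step bound lemma gives $$r_{spath}(\Psi|x) = r_{1s}(\Psi|x) \leq q_{1s}(\Psi|x) + r_{2,v}(\Psi|x) = q_{1s}(\Psi|x) + q_{2,v}(\Psi|x) = \sum_{(k,u) \in p(x)} q_{ku}(\Psi|x),$$ with the last equality using the boundary definition $q_{3,w}(\Psi|x) = q_{4,t}(\Psi|x) = 0$.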
This theorem is interesting to me for two reasons. First, it involves a sum over $n$ regrets, so it seems closer to a better bound; more about that further below. Second, it indicates that really only the classifiers along the optimal path matter; the important ones change as $x$ varies, and I don't know a priori which ones they are (that would be the solution to SSP I seek!), but nonetheless many of the subproblems being solved are irrelevant. Maybe there is a way to iteratively adjust the cost vectors to take this into account.
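As a side note on that second point, here's a minimal sketch of how the per-node classifiers get composed into a path at test time; the psi[(k, v)](x) interface returning the next vertex is just illustrative, not the actual reduction machinery. The point it makes is that for any particular $x$, only a single path's worth of classifiers is ever consulted, even though one is trained for every node of the path grid.

# Minimal sketch: composing per-position cost-sensitive classifiers into a
# path-choosing policy on the length-n path grid. The classifier interface
# (psi[(k, v)](x) -> next vertex) is hypothetical.
def choose_path(psi, x, source, target, n):
    path = [source]
    v = source
    for k in range(1, n - 1):      # one CSMC decision at positions 1, ..., n-2
        v = psi[(k, v)](x)         # consult only the classifier at (k, v)
        path.append(v)
    path.append(target)            # the final hop to the target is forced
    return path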

Returning to the question of a better bound, it's a short hop from the above theorem to the statement $$r_{spath}(\Psi) \leq (n - 2)\, E_{x \sim D_x}\left[\max_{k,v}\; q_{kv}(\Psi|x)\right],$$ which has the right flavor. For instance, if I reduce the individual CSMC subproblems to regression, I'd pick up an extra $2\sqrt{n}$ leading factor and a $\sqrt{\epsilon_{kv}}$ dependence on the regression regret for the subproblem. That would give $$r_{spath}(\Psi) \leq (n - 2)\, E_{x \sim D_x}\left[\max_{k,v}\; q_{kv}(\Psi|x)\right] \leq (n - 2)\, 2\sqrt{n}\, E_{x \sim D_x}\left[\max_{k,v}\; \sqrt{\epsilon_{kv}(\Psi|x)}\right] \leq (n - 2)\, 2\sqrt{n}\, \sqrt{E_{x \sim D_x}\left[\max_{k,v}\; \epsilon_{kv}(\Psi|x)\right]},$$ where $\max_a \sqrt{f_a} = \sqrt{\max_a f_a}$ and Jensen's inequality combine in the last step. This looks somewhat like the $O(n^{3/2})\,\sqrt{\epsilon}$ I get from a direct reduction to regression. However I'm not satisfied, because I don't have a good intuition about how the expectation of a single regression subproblem relates to an ``$L^{\infty}$ style'' expectation over the maximum of a set of regression subproblems.
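Spelling out that last combination of steps: monotonicity of the square root gives $\max_a \sqrt{f_a} = \sqrt{\max_a f_a}$, and concavity plus Jensen's inequality then move the expectation inside the square root, $$E_{x \sim D_x}\left[\max_{k,v}\; \sqrt{\epsilon_{kv}(\Psi|x)}\right] = E_{x \sim D_x}\left[\sqrt{\max_{k,v}\; \epsilon_{kv}(\Psi|x)}\right] \leq \sqrt{E_{x \sim D_x}\left[\max_{k,v}\; \epsilon_{kv}(\Psi|x)\right]}.$$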
