
Monday, September 13, 2010

A Forfeit Filter-Offset Tree

They say the value of an internet company goes up as the number of adjectives in its description goes down. I hope the same is not true in machine learning!

So the first thing I ran into when trying to reduce average constrained cost-sensitive best-$m$ with partial feedback to average constrained cost-sensitive multiclass classification with partial feedback (CSMC-PF) is that, given the way I'm setting up the subproblems, there is generally more than one element of the reward vector revealed per historical instance. What's needed is a mash-up of the forfeit filter tree and the forfeit offset tree: use the forfeit filter tree update when both inputs to an internal node have revealed values, and fall back to the forfeit offset tree update when only one input to an internal node has been revealed.

The average constrained CSMC-PF setup is as follows. There is a distribution $D = D_x \times D_{\omega|x} \times D_{r|\omega,x}$, where $r: A \to [0, 1] \cup \{ -\infty \}$ takes values in the unit interval augmented with $-\infty$, and the components of $r$ that are $-\infty$-valued for a particular instance are revealed as part of the problem instance via $\omega \in \mathcal{P}(A)$ (i.e., $\omega$ is a subset of $A$). The regret of a particular deterministic policy $h: X \times \mathcal{P}(A) \to A$ is \[ v(h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ \max_{k \in A} E_{r \sim D_{r|\omega,x}} [ r(k) ] - E_{r \sim D_{r|\omega,x}} [ r(h(x, \omega)) ] \right]. \] In the training data only partial feedback is available, but unlike the previous post I'll assume that potentially multiple elements of the reward vector are revealed. I'll assume that the historical policy is using a known conditional distribution over the power set of actions given an instance, $p(\mathcal{A} | x, \omega)$. I'll use the shorthand $\mathcal{A}$ to refer to realizations from $\mathcal{P}(A)$.
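To make the notation concrete with an invented instance: suppose $A = \{ 1, 2, 3, 4 \}$ and a particular draw has $\omega = \{ 4 \}$, so action 4 is infeasible ($r(4) = -\infty$) and will forfeit wherever it appears; meanwhile the historical policy revealed $\mathcal{A} = \{ 1, 3 \}$ with $r(1) = 0.7$ and $r(3) = 0.2$, and nothing is observed about $r(2)$ for this instance. An internal node whose two inputs are 1 and 3 can then use a filter-style update, a node whose inputs are 1 and 2 can only use an offset-style update, and a node with input 4 produces no update because of forfeiture.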
Algorithm: Forfeit Filter-Offset Tree Train
Data: Constrained CSMC-PF training data set $S$.
Input: Importance-weighted binary classification routine $\mbox{Learn}$.
Input: A binary tree $T$ over the labels with internal nodes $\Lambda(T)$.
Result: Trained classifiers $\{ \Psi_n | n \in \Lambda(T) \}$.
  1. For each $n \in \Lambda(T)$ from leaves to roots:
    1. $S_n = \emptyset$.
    2. For each example $(x, \omega, \mathcal{A}, \{ r(a) | a \in \mathcal{A} \}, p(\cdot | x, \omega)) \in S$:
      1. Let $\lambda$ and $\phi$ be the two classes input to $n$ (the predictions of the left and right subtrees on input $(x, \omega)$ respectively).
      2. If $\lambda \in \omega$, predict $\phi$ for the purposes of constructing training input for parent node (``$\lambda$ forfeits'');
      3. else if $\phi \in \omega$, predict $\lambda$ for the purposes of constructing training input for parent node (``$\phi$ forfeits'');
      4. else if $\lambda \in \mathcal{A}$ and $\phi \in \mathcal{A}$:
        1. $S_n \leftarrow S_n \cup \{ (x, 1_{r(\lambda) > r(\phi)}, |r(\lambda) - r(\phi)|) \}$.
      5. else if $\lambda \in \mathcal{A}$ and $\phi \not\in \mathcal{A}$:
        1. If $r(\lambda) < \frac{1}{2}$, $S_n \leftarrow S_n \cup \left\{ \left( x, 0, \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} ]} \left( \frac{1}{2} - r(\lambda) \right) \right) \right\}$;
        2. else $S_n \leftarrow S_n \cup \left\{ \left( x, 1, \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} ]} \left( r(\lambda) - \frac{1}{2} \right) \right) \right\}$.
      6. else if $\lambda \not\in \mathcal{A}$ and $\phi \in \mathcal{A}$:
        1. If $r(\phi) < \frac{1}{2}$, $S_n \leftarrow S_n \cup \left\{ \left( x, 1, \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]} \left( \frac{1}{2} - r(\phi) \right) \right) \right\}$;
        2. else $S_n \leftarrow S_n \cup \left\{ \left( x, 0, \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]} \left( r(\phi) - \frac{1}{2} \right) \right) \right\}$.
      7. else (when $\lambda \not\in \mathcal{A}$ and $\phi \not\in \mathcal{A}$), no example is added to $S_n$ for this historical instance.
    3. Let $\Psi_n = \mbox{Learn}(S_n)$.
  2. Return $\{ \Psi_n | n \in \Lambda(T) \}$.
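To make the per-node update concrete, here is a minimal Python sketch of step 2 above for a single internal node. The function and argument names are mine, and I'm assuming the two marginals $E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} ]$ and $E_{\mathcal{A} \sim p} [ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]$ have been precomputed from the known historical policy $p(\cdot | x, \omega)$; this is a sketch, not a reference implementation from the post.

def node_example(x, lam, phi, omega, observed, r, p_only_lam, p_only_phi):
    """Importance-weighted binary example for one internal node, or None.

    lam, phi   -- predictions of the left and right subtrees
    omega      -- set of infeasible actions for this instance
    observed   -- set A of actions whose rewards were revealed
    r          -- dict from observed actions to rewards in [0, 1]
    p_only_lam -- E_{A ~ p}[1_{lam in A} 1_{phi not in A}] (precomputed)
    p_only_phi -- E_{A ~ p}[1_{lam not in A} 1_{phi in A}] (precomputed)
    """
    if lam in omega or phi in omega:
        return None                          # forfeit: nothing added at this node
    lam_seen, phi_seen = lam in observed, phi in observed
    if lam_seen and phi_seen:                # filter tree style update
        return (x, 1 if r[lam] > r[phi] else 0, abs(r[lam] - r[phi]))
    if lam_seen:                             # offset tree style update on lambda
        scale = (p_only_lam + p_only_phi) / p_only_lam
        return (x, 0 if r[lam] < 0.5 else 1, scale * abs(r[lam] - 0.5))
    if phi_seen:                             # offset tree style update on phi
        scale = (p_only_lam + p_only_phi) / p_only_phi
        return (x, 1 if r[phi] < 0.5 else 0, scale * abs(r[phi] - 0.5))
    return None                              # neither input observed: skip

Each $S_n$ is then just the non-None outputs of this function over the historical data, handed to the importance-weighted binary learner for that node.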
Algorithm: Forfeit Filter-Offset Tree Test
Input: A binary tree $T$ over the labels with internal nodes $\Lambda(T)$.
Input: Trained classifiers $\{ \Psi_n | n \in \Lambda(T) \}$.
Input: Instance realization $(x, \omega)$.
Result: Predicted label $k$.
  1. Let $n$ be the root node.
  2. Repeat until $n$ is a leaf node:
    1. If all the labels of the leaves in the left subtree of $n$ are in $\omega$, traverse to the right child;
    2. else if all the labels of the leaves in the right subtree of $n$ are in $\omega$, traverse to the left child;
    3. else if $\Psi_n(x) = 1$, traverse to the left child;
    4. else (when $\Psi_n(x) = 0$ and at least one label in each subtree is not in $\omega$), traverse to the right child.
  3. Return leaf label $k$.
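A matching sketch of the test-time routing, under an assumed tree interface (node.is_leaf(), node.left, node.right, node.leaves(), node.classifier, node.label) that is not from the post:

def predict(root, x, omega):
    """Route (x, omega) from the root to a leaf and return its label."""
    n = root
    while not n.is_leaf():
        if n.left.leaves() <= omega:      # every left leaf is infeasible:
            n = n.right                   #   the left subtree forfeits
        elif n.right.leaves() <= omega:   # every right leaf is infeasible:
            n = n.left                    #   the right subtree forfeits
        elif n.classifier(x) == 1:        # Psi_n prefers the left input
            n = n.left
        else:
            n = n.right
    return n.label

Here leaves() returns the set of labels under a node and <= is the subset test.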

Motivating the Update

The key to leveraging the filter tree style regret bound proof strategy is to ensure that the expected importance weight difference at an internal node is equal to the policy regret with respect to the two inputs to that node. When both reward values are known, the filter tree update gets the job done directly by differencing the inputs. When only one reward value is known, the offset tree update produces the correct result by weighting according to the relative probabilities of observation. Imagining a combination of the two, the expected importance weight of the left input conditioned on $(x, \omega, r)$ and $\lambda \not\in \omega$ and $\phi \not\in \omega$ is \[ w_{\lambda|r} = E_{\mathcal{A} \sim p} \left[ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} 1_{r(\lambda) \geq \frac{1}{2}} \alpha_{\lambda,\neg\phi} \left( r(\lambda) - \tfrac{1}{2} \right) + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r(\phi) < \frac{1}{2}} \alpha_{\neg\lambda,\phi} \left( \tfrac{1}{2} - r(\phi) \right) + 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} 1_{r(\lambda) > r(\phi)} \alpha_{\lambda,\phi} \left( r(\lambda) - r(\phi) \right) \right] \Big/ E_{\mathcal{A} \sim p} \left[ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} \right], \] where $\alpha_{\lambda,\neg\phi}$ is a (to be determined) scaling factor applied when only $\lambda$ is observed, $\alpha_{\neg\lambda,\phi}$ when only $\phi$ is observed, and $\alpha_{\lambda,\phi}$ when both $\lambda$ and $\phi$ are observed; whether the resulting weight counts toward $\lambda$ or toward $\phi$ depends upon whether the observed reward exceeds $\frac{1}{2}$ (or, when both are observed, upon which reward is larger). Inspection suggests \[ \begin{aligned} \alpha_{\lambda,\neg\phi} &= (1 - \gamma) \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} ]}, \\ \alpha_{\neg\lambda,\phi} &= (1 - \gamma) \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}, \\ \alpha_{\lambda,\phi} &= \gamma \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}, \end{aligned} \] for any $\gamma \in [0, 1]$, which would lead to \[ \begin{aligned} w_{\lambda|r} &= (1 - \gamma) \left( r(\lambda) - \tfrac{1}{2} \right)_+ + (1 - \gamma) \left( \tfrac{1}{2} - r(\phi) \right)_+ + \gamma \left( r(\lambda) - r(\phi) \right)_+, \\ w_{\phi|r} &= (1 - \gamma) \left( r(\phi) - \tfrac{1}{2} \right)_+ + (1 - \gamma) \left( \tfrac{1}{2} - r(\lambda) \right)_+ + \gamma \left( r(\phi) - r(\lambda) \right)_+, \\ w_{\lambda|r} - w_{\phi|r} &= r(\lambda) - r(\phi). \end{aligned} \] What about $\gamma$? I don't have any theoretical reason for my particular choice; I just figured setting $\gamma$ to the relative probability of the filter update would give the right limiting behaviour (i.e., exactly reproduce the offset or filter tree given the corresponding $p(\mathcal{A} | x, \omega)$). That implies \[ \gamma = \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} + 1_{\phi \in \mathcal{A}} - 1_{\lambda \in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}, \] and \[ \begin{aligned} \alpha_{\lambda,\neg\phi} &= \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} ]}, \\ \alpha_{\neg\lambda,\phi} &= \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}, \\ \alpha_{\lambda,\phi} &= 1. \end{aligned} \] Another idea is to choose $\gamma$ to minimize the expected importance weight, but I don't have any results along those lines.
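As a sanity check on the algebra, the following sketch (function names are mine) draws random rewards and a random $\gamma$, computes $w_{\lambda|r}$ and $w_{\phi|r}$ from the closed forms above, and verifies that the difference is exactly $r(\lambda) - r(\phi)$:

import random

def positive_part(z):
    return max(z, 0.0)

def expected_weights(r_lam, r_phi, gamma):
    """Conditional expected importance weights under the combined update."""
    w_lam = ((1 - gamma) * positive_part(r_lam - 0.5)
             + (1 - gamma) * positive_part(0.5 - r_phi)
             + gamma * positive_part(r_lam - r_phi))
    w_phi = ((1 - gamma) * positive_part(r_phi - 0.5)
             + (1 - gamma) * positive_part(0.5 - r_lam)
             + gamma * positive_part(r_phi - r_lam))
    return w_lam, w_phi

for _ in range(1000):
    r_lam, r_phi, gamma = random.random(), random.random(), random.random()
    w_lam, w_phi = expected_weights(r_lam, r_phi, gamma)
    assert abs((w_lam - w_phi) - (r_lam - r_phi)) < 1e-12

Note that the identity holds for every $\gamma$; the particular choice of $\gamma$ above is about matching the offset and filter tree limits, not about making the expected difference come out right.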

Regret Analysis

The regret analysis for the forfeit filter-offset tree is almost identical to the regret analysis for the forfeit offset tree.

Let $\Psi = (T, \{ \Psi_n | n \in \Lambda(T) \})$ denote a particular forfeit filter-offset tree (i.e., a choice of a binary tree and a particular set of node classifiers), and let $h^\Psi$ denote the policy that results from the forfeit filter-offset tree. The regret analysis leverages an induced importance-weighted binary distribution $D'(\Psi)$ over triples $(x', y, w)$ defined as follows:
  1. Draw $(x, \omega, r)$ from $D$.
  2. Draw $n$ uniform over the internal nodes $\Lambda(T)$ of the binary tree.
  3. Let $x' = (x, n)$.
  4. Let $\lambda$ and $\phi$ be the two classes input to $n$ (the predictions of the left and right subtrees on input $(x, \omega)$ respectively).
  5. If $\lambda \in \omega$, create importance-weighted binary example $(x', 0, 0)$;
  6. else if $\phi \in \omega$, create importance-weighted binary example $(x', 1, 0)$;
  7. else (when $\lambda \not\in \omega$ and $\phi \not\in \omega$):
    1. Draw $\mathcal{A}$ from $p(\mathcal{A} | x, \omega)$.
    2. If $\lambda \not\in \mathcal{A}$ and $\phi \not\in \mathcal{A}$, reject sample;
    3. else if $\lambda \in \mathcal{A}$ and $\phi \in \mathcal{A}$, create importance-weighted binary example $(x', 1_{r(\lambda) > r(\phi)}, |r(\lambda) - r(\phi)|)$;
    4. else if $\lambda \in \mathcal{A}$ and $\phi \not\in \mathcal{A}$:
      1. If $r(\lambda) < \frac{1}{2}$, create importance-weighted binary example $\left( x', 0, \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} ]} \left( \frac{1}{2} - r(\lambda) \right) \right)$;
      2. else (when $r(\lambda) \geq \frac{1}{2}$), create importance-weighted binary example $\left( x', 1, \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} ]} \left( r(\lambda) - \frac{1}{2} \right) \right)$;
    5. else (when $\lambda \not\in \mathcal{A}$ and $\phi \in \mathcal{A}$):
      1. If $r(\phi) < \frac{1}{2}$, create importance-weighted binary example $\left( x', 1, \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]} \left( \frac{1}{2} - r(\phi) \right) \right)$;
      2. else (when $r(\phi) \geq \frac{1}{2}$), create importance-weighted binary example $\left( x', 0, \frac{E_{\mathcal{A} \sim p} [ 1_{\lambda \in \mathcal{A}} 1_{\phi \not\in \mathcal{A}} + 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]}{E_{\mathcal{A} \sim p} [ 1_{\lambda \not\in \mathcal{A}} 1_{\phi \in \mathcal{A}} ]} \left( r(\phi) - \frac{1}{2} \right) \right)$.
The induced distribution $D'(\Psi)$ depends upon the particular forfeit filter-offset tree, but for any fixed forfeit filter-offset tree it is well defined. Now I'd like to relate the policy regret of $h^\Psi$ to the importance-weighted binary regret of $\Psi$, \[ q(\Psi) = E_{(x', y, w) \sim D'(\Psi)} \left[ w 1_{y \neq \Psi(x')} \right] = \frac{1}{|\Lambda(T)|} \sum_{n \in \Lambda(T)} E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ q_n(\Psi | x, \omega) \right], \] where \[ q_n(\Psi | x, \omega) = \begin{cases} 0 & \mbox{if } \Gamma(n_\lambda) \setminus \omega = \emptyset \mbox{ or } \Gamma(n_\phi) \setminus \omega = \emptyset; \\ 0 & \mbox{if } \Psi_n(x) = 1_{w_\lambda > w_\phi}; \\ |w_\lambda - w_\phi| & \mbox{otherwise}, \end{cases} \] is the importance-weighted regret at internal node $n$, $\Gamma(n)$ refers to the set of labels (leaves) in the subtree rooted at $n$, $n_\lambda$ refers to the left child of $n$, $n_\phi$ refers to the right child of $n$, $w_\lambda$ is the expected importance weight for the left child conditioned on $(x, \omega)$, and $w_\phi$ is the expected importance weight for the right child conditioned on $(x, \omega)$.

Theorem: Regret Bound
For all partially labelled CSMC distributions $D$; all historical policies $p$ such that $E_{\mathcal{A} \sim p} [ 1_{a \in \mathcal{A}} | x, \omega ] > 0$ whenever $a \not\in \omega$; and all forfeit filter-offset trees $\Psi$, \[ v(h^\Psi) \leq (|A| - 1) q(\Psi), \] where $q(\Psi)$ is the importance-weighted binary regret on the induced subproblem.
Proof: See Appendix.

For the offset tree setting, where only one reward component is revealed per instance, the $(|A| - 1)$ dependence is tight. On the other hand, when all reward components are revealed per instance, there are reductions that have a regret independent of the number of actions. I suspect the lower bound from the offset tree paper can be generalized to be a function of the distribution of $|\mathcal{A}|$. What this means in practice is that the forfeit filter-offset tree, unlike the forfeit offset tree, is not ``as good as it gets'' when more than one reward is being revealed per historical instance.

OK, now I'm ready to look at cost-sensitive best-$m$ with partial feedback.

Appendix

This is the proof of the regret bound.

Consider a fixed $(x, \omega)$. It is useful to talk about the conditional policy regret experienced at an internal node $n$, \[ v(h^\Psi | x, \omega, n) = \max_{k \in \Gamma(n)} E_{r \sim D_{r|\omega,x}} [ r(k) ] - E_{r \sim D_{r|\omega,x}} [ r(h^\Psi_n(x, \omega)) ], \] where $h^\Psi_n$ is the prediction at internal node $n$. When $n$ is the root of the tree, $v(h^\Psi | x, \omega, n)$ is the forfeit filter-offset tree policy regret conditional on $(x, \omega)$.

The proof strategy is to bound $v(h^\Psi | x, \omega, n) \leq \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega)$ via induction. The base case is trivially satisfied for trees with only one leaf (no internal nodes) since it evaluates to $0 \leq 0$. To show the recursion at a particular internal node $n$, let $\lambda$ and $\phi$ be the predictions of the left subtree ($n_\lambda$) and right subtree ($n_\phi$).

Case 1: $\Gamma(n_\lambda) \setminus \omega = \emptyset$. In this case $\lambda \in \omega$ and forfeits, so $\phi$ is chosen. There must be a maximizer in the right subtree, since all values in the left subtree are $-\infty$. Furthermore $q_m(\Psi | x, \omega) = 0$ for both $m = n$ and for $m \in \Lambda(n_\lambda)$ by definition. Therefore \[ \begin{aligned} v(h^\Psi | x, \omega, n) &= \max_{k \in \Gamma(n)} E_{r \sim D_{r|\omega,x}} [ r(k) ] - E_{r \sim D_{r|\omega,x}} [ r(\phi) ] \\ &= \max_{k \in \Gamma(n_\phi)} E_{r \sim D_{r|\omega,x}} [ r(k) ] - E_{r \sim D_{r|\omega,x}} [ r(\phi) ] \\ &= v(h^\Psi | x, \omega, n_\phi) \leq \sum_{m \in \Lambda(n_\phi)} q_m(\Psi | x, \omega) \leq \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega). \end{aligned} \]
Case 2: $\Gamma(n_\lambda) \setminus \omega \neq \emptyset$ and $\Gamma(n_\phi) \setminus \omega = \emptyset$. In this case $\phi \in \omega$ and $\lambda \not\in \omega$, so $\phi$ forfeits and $\lambda$ is chosen. There must be a maximizer in the left subtree, since all values in the right subtree are $-\infty$. Furthermore $q_m(\Psi | x, \omega) = 0$ for both $m = n$ and for $m \in \Lambda(n_\phi)$ by definition. Therefore \[ \begin{aligned} v(h^\Psi | x, \omega, n) &= \max_{k \in \Gamma(n)} E_{r \sim D_{r|\omega,x}} [ r(k) ] - E_{r \sim D_{r|\omega,x}} [ r(\lambda) ] \\ &= \max_{k \in \Gamma(n_\lambda)} E_{r \sim D_{r|\omega,x}} [ r(k) ] - E_{r \sim D_{r|\omega,x}} [ r(\lambda) ] \\ &= v(h^\Psi | x, \omega, n_\lambda) \leq \sum_{m \in \Lambda(n_\lambda)} q_m(\Psi | x, \omega) \leq \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega). \end{aligned} \]
Case 3: $\Gamma(n_\lambda) \setminus \omega \neq \emptyset$ and $\Gamma(n_\phi) \setminus \omega \neq \emptyset$. This is the ``normal'' offset tree case, where both $\lambda \not\in \omega$ and $\phi \not\in \omega$, so no forfeiture happens. As shown above, the expected importance weights conditioned on $(x, \omega, r)$ and $\lambda \not\in \omega$ and $\phi \not\in \omega$ satisfy \[ |w_\lambda - w_\phi| = \left| E_{r \sim D_{r|\omega,x}} \left[ w_{\lambda|r} - w_{\phi|r} \right] \right| = \left| E_{r \sim D_{r|\omega,x}} \left[ r(\lambda) - r(\phi) \right] \right|, \] i.e., the importance-weighted regret at an internal node is equal to the policy regret with respect to the two actions input to that node.

Assume without loss of generality that the classifier chooses $\phi$. If the maximizer comes from the right subtree, then \[ \begin{aligned} v(h^\Psi | x, \omega, n) &= \max_{k \in \Gamma(n_\phi)} E_{r \sim D_{r|\omega,x}} [ r(k) ] - E_{r \sim D_{r|\omega,x}} [ r(\phi) ] \\ &= v(h^\Psi | x, \omega, n_\phi) \leq \sum_{m \in \Lambda(n_\phi)} q_m(\Psi | x, \omega) \leq \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega). \end{aligned} \] If the maximizer comes from the left subtree, then \[ \begin{aligned} v(h^\Psi | x, \omega, n) &= \max_{k \in \Gamma(n_\lambda)} E_{r \sim D_{r|\omega,x}} [ r(k) ] - E_{r \sim D_{r|\omega,x}} [ r(\phi) ] \\ &= E_{r \sim D_{r|\omega,x}} [ r(\lambda) - r(\phi) ] + v(h^\Psi | x, \omega, n_\lambda) \\ &\leq q_n(\Psi | x, \omega) + v(h^\Psi | x, \omega, n_\lambda) \\ &\leq q_n(\Psi | x, \omega) + \sum_{m \in \Lambda(n_\lambda)} q_m(\Psi | x, \omega) \leq \sum_{m \in \Lambda(n)} q_m(\Psi | x, \omega). \end{aligned} \] Terminating the induction at the root yields \[ v(h^\Psi | x, \omega) \leq \sum_{n \in \Lambda(T)} q_n(\Psi | x, \omega) = |\Lambda(T)| \, q(\Psi | x, \omega). \] Taking the expectation of both sides with respect to $D_x \times D_{\omega|x}$ and noting $|\Lambda(T)| = (|A| - 1)$ completes the proof.
