The constrained CSBM definition is as follows. There is a distribution D=Dx×Dω|x×Dr|ω,x, where r:A→[0,1]∪{−∞} takes values in the unit interval augmented with −∞, and the components of r which are −∞-valued for a particular instance are revealed as part of the problem instance via ω∈P(A) (i.e., ω is a subset of A). Allowed outputs in response to a problem instance are subsets of A of size m, denoted Sm={S|S⊆A,|S|=m}. The regret of a particular deterministic policy h:X×P(A)→Sm is given by v(h)=E(x,ω)∼Dx×Dω|x[maxs∈SmEr∼Dr|ω,x[∑a∈sr(a)]−Er∼Dr|ω,x[∑a∈h(x,ω)r(a)]]. Note when |A∖ω|<m, any strategy achieves zero regret (via −∞ reward); therefore the "interesting" parts of the problem space are when |A∖ω|≥m.
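To make the objective concrete, here is a minimal sketch of the constrained CSBM regret for a single instance: forbidden actions (those in ω) carry reward −∞, and a chosen size-m set is compared against the best feasible size-m set. All names here are illustrative, and the brute-force maximization is only viable for small |A|.

```python
from itertools import combinations

def _value(expected_reward, omega, subset):
    # Conditional expected reward of a set; -inf if it touches omega.
    return sum(float('-inf') if a in omega else expected_reward[a]
               for a in subset)

def best_set_value(expected_reward, omega, m):
    """max over S_m of the conditional expected reward."""
    return max(_value(expected_reward, omega, s)
               for s in combinations(expected_reward, m))

def regret(expected_reward, omega, chosen, m):
    """Conditional regret of a particular size-m choice."""
    assert len(chosen) == m
    return best_set_value(expected_reward, omega, m) - _value(expected_reward, omega, chosen)
```

With three actions, forbidden action `c`, and m=2, the best feasible set is {a, b}, so choosing it incurs zero regret; any set touching `c` has value −∞ and hence infinite regret.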
There are two plausible scenarios for CSBM with partial feedback.
- Only the total reward associated with the set of actions chosen is observed. There is actually a version of this problem at my current gig, since there is a page on the site whose elements are designed to act in concert to elicit a single response.
- The reward associated with each action chosen is observed. For instance advertisements are generally chosen in a set, but provide individualized feedback.
The reduction works as follows: the highest-reward choice is selected first, its reward is then adjusted to −∞, and the process repeats until a set of size m has been constructed. The individual steps are posed as constrained CSMC with partial feedback (CSMC-PF) problems, which is essentially CSBM with m=1. The forfeit filter-offset tree was designed for constrained CSMC-PF, and in particular can serve as the Learn oracle below. The forfeit filter-offset tree always achieves finite regret, i.e., it chooses a feasible class whenever possible; in this context, that means the subproblems will never create duplicates.
Algorithm: Partial Feedback Set Select Train
Input: Action labels A, (maximum) size of set to select m≤|A|/2.
Input: Constrained CSMC-PF classifier Learn.
Data: Training data set S.
Result: Trained classifiers {Ψn|n∈[1,m]}.
- Define γ0(⋅,⋅)=∅.
- For each n from 1 to m:
- Sn=∅.
- For each example (x,ω,A,{r(a)|a∈A},p(⋅|x,ω))∈S such that |A∖ω|≥m:
- Let γn−1(x,ω) be the predicted best set from the previous iteration.
- For each action a:
- If a∈γn−1(x,ω), r(n,a)=−∞;
- else r(n,a)=r(a).
- Sn←Sn∪{(x,ω∪γn−1(x,ω),A,{r(n,a)|a∈A},p(⋅|x,ω))}.
- Let Ψn=Learn(Sn).
- Let γn(x,ω)=γn−1(x,ω)∪{Ψn(x,ω∪γn−1(x,ω))}.
- Return {Ψn|n∈[1,m]}.
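The training loop above can be sketched in Python as follows. The `learn` oracle interface, the example tuple layout, and all helper names are assumptions for illustration; the post's Learn is, e.g., a forfeit filter-offset tree.

```python
NEG_INF = float('-inf')

def set_select_train(examples, m, learn):
    """examples: iterable of (x, omega, rewards, p) where omega is the set
    of infeasible actions and rewards maps actions to observed rewards.
    Returns the per-position classifiers [Psi_1, ..., Psi_m]."""
    classifiers = []

    def gamma(n, x, omega):
        # Set selected by the first n trained classifiers (empty for n=0).
        chosen = set()
        for psi in classifiers[:n]:
            chosen.add(psi(x, omega | chosen))
        return chosen

    for n in range(1, m + 1):
        subproblem = []
        for (x, omega, rewards, p) in examples:
            if len(set(rewards) - omega) < m:  # skip when |A \ omega| < m
                continue
            prev = gamma(n - 1, x, omega)
            # Already-selected actions get reward -inf, which the
            # constrained CSMC-PF learner treats as infeasible.
            adj = {a: (NEG_INF if a in prev else r) for a, r in rewards.items()}
            subproblem.append((x, omega | prev, adj, p))
        classifiers.append(learn(subproblem))
    return classifiers
```

Note the nth subproblem's constraint set is ω∪γn−1(x,ω), which is how "no duplicates" is enforced when Learn achieves finite regret.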
Algorithm: Set Select Test
Data: Action labels A, number of positions to populate l≤m≤|A|/2.
Data: Instance feature realization (x,ω).
Data: Trained classifiers {Ψn|n∈[1,m]}.
Result: Set hΨ:X×P(A)→Sl.
- γΨ0(x,ω)=∅.
- For n from 1 to l:
- γΨn(x,ω)=γΨn−1(x,ω)∪{Ψn(x,ω∪γΨn−1(x,ω))}.
- If |γΨl(x,ω)|=l, hΨ(x,ω)=γΨl(x,ω);
- else set hΨ(x,ω) to an arbitrary element of Sl.
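A minimal sketch of Set Select Test: greedily apply the first l trained classifiers, feeding each one the actions already chosen as extra constraints. The classifier interface `psi(x, omega) -> action` and the explicit `actions` parameter are assumptions for illustration.

```python
def set_select_test(classifiers, x, omega, l, actions):
    chosen = set()
    for psi in classifiers[:l]:
        chosen.add(psi(x, omega | chosen))
    if len(chosen) < l:
        # Duplicates only arise from an infinite-regret sub-classifier;
        # fall back to an arbitrary completion of size l (the regret
        # bound does not cover this case).
        for a in actions:
            if len(chosen) == l:
                break
            chosen.add(a)
    return chosen
```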
My goal is to bound the average constrained CSBM regret v(h)=E(x,ω)∼Dx×Dω|x[maxs∈SmEr∼Dr|ω,x[∑a∈sr(a)]−Er∼Dr|ω,x[∑a∈h(x,ω)r(a)]] in terms of the average constrained CSMC-PF regret on the induced subproblems. Once again I'll leverage a trick from the filter tree derivation and collapse the multiple subproblems into a single subproblem by defining an induced distribution. Let D be the distribution of average constrained CSBM instances (x,ω,r). Define the induced distribution D′(Ψ,l), where l≤m, of constrained CSMC-PF instances (x′,ω′,A,{r′(a)|a∈A},p′(⋅|x′,ω′)) as follows.
- Draw (x,ω,r) from D.
- Draw n uniform on [1,l].
- Let x′=(x,n).
- Let ω′=ω∪γn−1(x,ω).
- For each action a:
- If a∈γn−1(x,ω), r′(a)=−∞;
- else r′(a)=r(a).
- Let p′(⋅|x′,ω′)=p(⋅|x,ω).
- Create constrained CSMC-PF example (x′,ω′,A,{r′(a)|a∈A},p′(⋅|x′,ω′)).
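The sampling procedure above can be sketched directly. `draw_csbm` (a sampler for D) and the trained classifier list are assumed inputs; the tuple layouts are illustrative.

```python
import random

NEG_INF = float('-inf')

def draw_induced(draw_csbm, classifiers, l):
    x, omega, rewards, p = draw_csbm()   # (x, omega, r, p) drawn from D
    n = random.randint(1, l)             # n uniform on [1, l]
    chosen = set()                       # replay gamma_{n-1}(x, omega)
    for psi in classifiers[:n - 1]:
        chosen.add(psi(x, omega | chosen))
    x_prime = (x, n)                     # the step index becomes a feature
    omega_prime = omega | chosen
    r_prime = {a: (NEG_INF if a in chosen else r) for a, r in rewards.items()}
    return (x_prime, omega_prime, r_prime, p)
```

Folding n into x′ is what collapses the l subproblems into a single constrained CSMC-PF problem.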
Theorem: Regret Bound
For all average constrained CSBM distributions D and all constrained CSMC-PF classifiers Ψ, v(hΨ)≤l q(Ψ,l), where q(Ψ,l) is the constrained CSMC-PF regret of Ψ on the induced distribution D′(Ψ,l). Proof: See Appendix.
The remarks from the previous version of this reduction still apply.
The reduction still seems inefficient when comparing reduction to regression directly (√m√|A|√ϵL2) versus reduction to regression via CSMC (m√|A|√ϵL2). This suggests there is a way to reduce this problem which only leverages √m CSMC subproblems. One possible source of inefficiency: the reduction is retrieving the elements in order, whereas the objective function is indifferent to order.
The regret bound indicates the following property: once I have trained to select sets of size m, I can get a regret bound for selecting sets of size l for any l≤m. This suggests a variant with m=|A| could be used to reduce minimax constrained CSMC-PF to average constrained CSMC-PF. I'll explore that in a future blog post.
Appendix
This is the proof of the regret bound.
If Ψ achieves infinite regret on the induced subproblem, the bound holds trivially. Thus consider a Ψ that achieves finite regret.
If |A∖ω|<l, then the conditional regret is zero for any choice in Sl, and the bound holds trivially. Thus consider |A∖ω|≥l: since Ψ achieves finite regret, no duplicates are generated by any sub-classifier, and therefore hΨ(x,ω)=γΨl(x,ω).
Consider a fixed (x,ω) with |A∖ω|≥l. It is convenient to talk about v(hΨ|x,ω,n)=maxs∈SnEr∼Dr|ω,x[∑a∈sr(a)]−Er∼Dr|ω,x[∑a∈γΨn(x,ω)r(a)], the conditional regret on this instance after the nth step of Set Select Test. Let s∗(x,ω,n)=argmaxs∈SnEr∼Dr|ω,x[∑a∈sr(a)] be any maximizer of the first term (ties broken arbitrarily); note any s∗(x,ω,n) selects the n classes with the largest conditional expected reward.
The proof is by demonstrating the property v(hΨ|x,ω,n)≤∑_{r=1}^{n} qr(Ψ,l|x,ω), where qr(Ψ,l|x,ω) is the conditional regret of Ψ on the induced subproblem at step r, and r̃n is the adjusted reward at step n (r̃n(a)=−∞ if a∈γΨn−1(x,ω), r̃n(a)=r(a) otherwise). The property holds with equality for n=1. For n>1 note
v(hΨ|x,ω,n)−v(hΨ|x,ω,n−1)
=maxa∈A∖s∗(x,ω,n−1)Er∼Dr|ω,x[r(a)]−Er∼Dr|ω,x[r(Ψn(x,ω∪γΨn−1(x,ω)))]
≤maxa∈A∖γΨn−1(x,ω)Er∼Dr|ω,x[r(a)]−Er∼Dr|ω,x[r(Ψn(x,ω∪γΨn−1(x,ω)))]
≤maxa∈A∖γΨn−1(x,ω)Er∼Dr|ω,x[r̃n(a)]−Er∼Dr|ω,x[r̃n(Ψn(x,ω∪γΨn−1(x,ω)))]
=qn(Ψ,l|x,ω).
The first inequality follows from the optimality of s∗(x,ω,n−1): removing the n−1 elements of γΨn−1(x,ω) from A leaves at least one of the n largest-reward actions, so the maximum over A∖γΨn−1(x,ω) is at least the nth-largest conditional expected reward, which is exactly the maximum over A∖s∗(x,ω,n−1). The second inequality holds because r̃n(a)≤r(a), with equality when a∉γΨn−1(x,ω). Summing the telescoping series establishes v(hΨ|x,ω)=v(hΨ|x,ω,l)≤∑_{r=1}^{l} qr(Ψ,l|x,ω)=l q(Ψ,l|x,ω). Taking the expectation with respect to Dx×Dω|x completes the proof.