The constrained CSBM definition is as follows. There is a distribution D = D_x \times D_{\omega|x} \times D_{r|\omega,x}, where r: A \to [0, 1] \cup \{ -\infty \} takes values in the unit interval augmented with -\infty, and the components of r which are -\infty-valued for a particular instance are revealed as part of the problem instance via \omega \in \mathcal{P} (A) (i.e., \omega is a subset of A). Allowed outputs in response to a problem instance are subsets of A of size m, denoted S_m = \{ S | S \subseteq A, |S| = m \}. The regret of a particular deterministic policy h: X \times \mathcal{P} (A) \to S_m is given by v (h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ \max_{s \in S_m}\; E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in s} r (a) \right] - E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in h (x, \omega)} r (a) \right] \right]. Note that when |A \setminus \omega| < m, every size-m set necessarily includes an action with -\infty reward, so any strategy achieves zero regret; therefore the ``interesting'' parts of the problem space are when |A \setminus \omega| \geq m.
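To make the objective concrete, here is a minimal Python sketch of the conditional regret for a fixed (x, \omega). The function name and dictionary-based interface are illustrative only, not taken from anywhere in particular; the key point is that, because the objective is additive over the chosen set, the best feasible set of size m is just the top m feasible actions by conditional expected reward.

def conditional_regret(expected_reward, omega, chosen, m):
    """Conditional CSBM regret for a fixed (x, omega).

    expected_reward: dict mapping each action in A to E[r(a) | x, omega]
    omega: set of infeasible actions (their reward is -infinity)
    chosen: the size-m set produced by the policy h(x, omega)
    """
    feasible = {a: v for a, v in expected_reward.items() if a not in omega}
    if len(feasible) < m:
        return 0.0  # every size-m set hits a -infinity reward: zero regret
    best = sum(sorted(feasible.values(), reverse=True)[:m])  # top-m feasible actions
    achieved = sum(feasible.get(a, float("-inf")) for a in chosen)
    return best - achieved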
There are two plausible scenarios for CSBM with partial feedback.
- Only the total reward associated with the set of actions chosen is observed. There is actually a version of this problem at my current gig, since there is a page on the site whose elements are designed to act in concert to elicit a single response.
- The reward associated with each action chosen is observed. For instance, advertisements are generally chosen as a set but provide individualized feedback.
The reduction works as follows: first the highest-reward action is chosen, then its reward is adjusted to -\infty, and the process is repeated until a set of size m has been constructed. The individual steps are posed as constrained CSMC with partial feedback (CSMC-PF) problems, which is essentially CSBM with m = 1. The forfeit filter-offset tree was designed for constrained CSMC-PF, and in particular can be used as the \mbox{Learn} oracle below. The forfeit filter-offset tree has the property that it always achieves finite regret, i.e., it chooses a feasible class whenever possible; in this context, that means the subproblems will never produce duplicate actions.
Algorithm:Partial Feedback Set Select Train
Input: Action labels A, (maximum) size of set to select m \leq |A| / 2.
Input: Constrained CSMC-PF classifier \mbox{Learn}.
Data: Training data set S.
Result: Trained classifiers \{\Psi_n | n \in [1, m] \}.
- Define \gamma_0 (\cdot, \cdot) = \emptyset.
- For each n from 1 to m:
- S_n = \emptyset.
- For each example \bigl(x, \omega, \mathcal{A}, \{ r (a) | a \in \mathcal{A} \}, p (\cdot | x, \omega) \bigr) \in S such that |A \setminus \omega| \geq m:
- Let \gamma_{n-1} (x, \omega) be the predicted best set from the previous iteration.
- For each action a:
- If a \in \gamma_{n-1} (x, \omega), r (n, a) = -\infty;
- else r (n, a) = r (a).
- S_n \leftarrow S_n \cup \left\{\bigl( x, \omega \cup \gamma_{n-1} (x, \omega), \mathcal{A}, \{ r (n, a) | a \in \mathcal{A} \}, p (\cdot | x, \omega) \bigr) \right\}.
- Let \Psi_n = \mbox{Learn} (S_n).
- Let \gamma_n (x, \omega) = \gamma_{n-1} (x, \omega) \cup \Psi_n (x, \omega \cup \gamma_{n-1} (x, \omega)).
- Return \{ \Psi_n | n \in [1, m] \}.
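Here is a rough Python sketch of Partial Feedback Set Select Train. It assumes a CSMC-PF Learn oracle with the hypothetical interface learn(dataset) -> psi, where psi(x, omega) returns a single action not in omega, and training examples are tuples (x, omega, explored, rewards, propensity) mirroring the notation above; a sketch under those interface assumptions, not a reference implementation.

NEG_INF = float("-inf")

def gamma(psis, x, omega):
    """Greedily apply the already-trained classifiers to (x, omega),
    accumulating their choices; with no classifiers this is gamma_0 = {}."""
    chosen = set()
    for psi in psis:
        chosen.add(psi(x, omega | chosen))
    return chosen

def set_select_train(actions, m, learn, data):
    """actions: the full action set A (a set); m: (maximum) set size;
    learn: CSMC-PF oracle; data: iterable of
    (x, omega, explored, rewards, propensity) training examples,
    with rewards a dict over the explored actions."""
    psis = []
    for n in range(1, m + 1):
        s_n = []
        for (x, omega, explored, rewards, propensity) in data:
            if len(actions - omega) < m:
                continue  # only train on instances with |A \ omega| >= m
            prev = gamma(psis, x, omega)  # gamma_{n-1}(x, omega)
            # adjust observed rewards: already-selected actions get -infinity
            adjusted = {a: (NEG_INF if a in prev else r) for a, r in rewards.items()}
            s_n.append((x, omega | prev, explored, adjusted, propensity))
        psis.append(learn(s_n))  # Psi_n
    return psis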
Algorithm:Set Select Test
Data: Action labels A, number of positions to populate l \leq m \leq |A|/2.
Data: Instance feature realization (x, \omega).
Data: Trained classifiers \{\Psi_n | n \in [1, m] \}.
Result: Set selector h^\Psi: X \times \mathcal{P} (A) \to S_l.
- \gamma_0^\Psi (x, \omega) = \emptyset.
- For n from 1 to l:
- \gamma_n^\Psi (x, \omega) = \gamma_{n-1}^\Psi (x, \omega) \cup \Psi_n (x, \omega \cup \gamma_{n-1}^\Psi (x, \omega)).
- If |\gamma_l^\Psi (x, \omega)| = l, h^\Psi (x, \omega) = \gamma_l^\Psi (x, \omega);
- else set h^\Psi (x, \omega) to an arbitrary element of S_l.
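A matching sketch of Set Select Test, reusing the same hypothetical psi(x, omega) interface; the fallback branch corresponds to the arbitrary size-l set in the last step above and is never exercised when every sub-classifier achieves finite regret.

def set_select_test(actions, psis, l, x, omega):
    """Greedily populate l <= m positions, feeding previously chosen
    actions back in as additional constraints."""
    chosen = set()
    for psi in psis[:l]:
        chosen.add(psi(x, omega | chosen))
    if len(chosen) == l:
        return chosen
    # a duplicate was produced: fall back to an arbitrary element of S_l
    # (assumes action labels are sortable; any size-l subset would do)
    return set(sorted(actions)[:l])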
My goal is to bound the average constrained CSBM regret v (h) = E_{(x, \omega) \sim D_x \times D_{\omega|x}} \left[ \max_{s \in S_m}\; E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in s} r (a) \right] - E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in h (x, \omega)} r (a) \right] \right] in terms of the average constrained CSMC-PF regret on the induced subproblems. Once again I'll leverage a trick from the filter tree derivation and collapse the multiple subproblems into a single subproblem by defining an induced distribution. Let D be the distribution of average constrained CSBM instances (x, \omega, r). Define the induced distribution D^\prime (\Psi, l), for l \leq m, over constrained CSMC-PF instances (x^\prime, \omega^\prime, \mathcal{A}, \{ r^\prime (a) | a \in \mathcal{A} \}, p^\prime (\cdot | x^\prime, \omega^\prime)) as follows.
- Draw (x, \omega, r) from D.
- Draw n uniform on [1, l].
- Let x^\prime = (x, n).
- Let \omega^\prime = \omega \cup \gamma_{n-1} (x, \omega).
- For each action a:
- If a \in \gamma_{n-1} (x, \omega), r^\prime (a) = -\infty;
- else r^\prime (a) = r (a).
- Let p^\prime (\cdot | x^\prime, \omega^\prime) = p (\cdot | x, \omega).
- Create constrained CSMC-PF example (x^\prime, \omega^\prime, \mathcal{A}, \{ r^\prime (a) | a \in \mathcal{A} \}, p^\prime (\cdot | x^\prime, \omega^\prime)).
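In code, drawing a single example from D^\prime (\Psi, l) looks like the following sketch, reusing the gamma helper and NEG_INF from the training sketch above and assuming draw_from_D() yields (x, omega, explored, rewards, propensity) from the underlying CSBM distribution.

import random

def draw_induced_example(draw_from_D, psis, l):
    """Sample one constrained CSMC-PF example from D'(Psi, l)."""
    x, omega, explored, rewards, propensity = draw_from_D()
    n = random.randint(1, l)              # n uniform on {1, ..., l}
    prev = gamma(psis[:n - 1], x, omega)  # gamma_{n-1}(x, omega)
    x_prime = (x, n)
    omega_prime = omega | prev
    r_prime = {a: (NEG_INF if a in prev else r) for a, r in rewards.items()}
    return (x_prime, omega_prime, explored, r_prime, propensity)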
Theorem:Regret Bound
For all average constrained CSBM distributions D and all constrained CSMC-PF classifiers \Psi, v (h^\Psi) \leq l\, q (\Psi, l), where q (\Psi, l) is the average constrained CSMC-PF regret of \Psi on the induced distribution D^\prime (\Psi, l). Proof: See Appendix.
The remarks from the previous version of this reduction still apply.
The reduction still seems inefficient when comparing reduction to regression directly (\sqrt{m} \sqrt{|A|} \sqrt{\epsilon_{L^2}}) versus reduction to regression via CSMC (m \sqrt{|A|} \sqrt{\epsilon_{L^2}}). This suggests there is a way to reduce this problem which only leverages \sqrt{m} CSMC subproblems. One possible source of inefficiency: the reduction is retrieving the elements in order, whereas the objective function is indifferent to order.
The regret bound indicates the following property: once I have trained to select sets of size m, I can get a regret bound for selecting sets of size l for any l \leq m. This suggests a variant with m = |A| could be used to reduce minimax constrained CSMC-PF to average constrained CSMC-PF. I'll explore that in a future blog post.
Appendix
This is the proof of the regret bound.
If \Psi achieves infinite regret on the induced subproblem, the bound holds trivially. Thus consider a \Psi that achieves finite regret.
If |A \setminus \omega| < l, then v = 0 for any choice in S_l, and the bound holds trivially for this (x, \omega). Thus consider |A \setminus \omega| \geq l: since \Psi achieves finite regret, no duplicates are generated by any sub-classifier, and h^\Psi (x, \omega) = \gamma^\Psi_l (x, \omega).
Consider a fixed (x, \omega) with |A \setminus \omega| \geq l. It is convenient to talk about v (h | x, \omega, n) = \max_{s \in S_n}\; E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in s} r (a) \right] - E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in \gamma^\Psi_n (x, \omega)} r (a) \right], the conditional regret on this instance at the n^\mathrm{th} step of Set Select Test. Let s^* (x, \omega, n) = \underset{s \in S_n}{\operatorname{arg\,max\,}} E_{r \sim D_{r|\omega,x}} \left[ \sum_{a \in s} r (a) \right] be any maximizer of the first term (ties broken arbitrarily); note any s^* (x, \omega, n) selects the n actions with the largest conditional expected reward.
The proof proceeds by demonstrating the property v (h^\Psi | x, \omega, n) \leq \sum_{r=1}^n q_r (\Psi, l | x, \omega). The property holds with equality for n = 1. For n > 1 note \begin{aligned} v (h^\Psi | x, \omega, n) - v (h^\Psi | x, \omega, n - 1) &= \max_{a \in A \setminus s^* (x, \omega, n - 1)} E_{r \sim D_{r|\omega,x}} \left[ r (a) \right] \\ &\quad - E_{r \sim D_{r|\omega,x}} \left[ r \left(\Psi_n \left(x, \omega \cup \gamma^\Psi_{n-1} (x, \omega) \right) \right) \right], \\ &\leq \max_{a \in A \setminus \gamma^\Psi_{n-1} (x, \omega)} E_{r \sim D_{r|\omega,x}} \left[ r (a) \right] \\ &\quad - E_{r \sim D_{r|\omega,x}} \left[ r \left(\Psi_n \left(x, \omega \cup \gamma^\Psi_{n-1} (x, \omega) \right) \right) \right], \\ &\leq \max_{a \in A \setminus \gamma^\Psi_{n-1} (x, \omega)} E_{r \sim D_{r|\omega,x}} \left[ \tilde r_n (a) \right] \\ &\quad - E_{r \sim D_{r|\omega,x}} \left[ \tilde r_n \left(\Psi_n \left(x, \omega \cup \gamma^\Psi_{n-1} (x, \omega) \right) \right) \right], \\ &= q_n (\Psi, l | x, \omega), \end{aligned} where \tilde r_n is the adjusted reward from the induced distribution (\tilde r_n (a) = -\infty if a \in \gamma^\Psi_{n-1} (x, \omega), otherwise \tilde r_n (a) = r (a)), the first inequality is due to the optimality of s^* (x, \omega, n - 1), and the second inequality is because \tilde r_n (a) \leq r (a) with equality if a \not \in \gamma^\Psi_{n-1} (x, \omega). Summing the telescoping series establishes v (h^\Psi | x, \omega) = v (h^\Psi | x, \omega, l) \leq \sum_{r=1}^l q_r (\Psi, l | x, \omega) = l\, q (\Psi, l | x, \omega). Taking the expectation with respect to D_x \times D_{\omega|x} completes the proof.
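Spelled out, the telescoping step is v (h^\Psi | x, \omega, l) = v (h^\Psi | x, \omega, 1) + \sum_{n=2}^l \left( v (h^\Psi | x, \omega, n) - v (h^\Psi | x, \omega, n - 1) \right) \leq \sum_{n=1}^l q_n (\Psi, l | x, \omega), using the per-step bound above for each summand.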