The inequality constraints can be treated as equality constraints with the introduction of “slack” parameters. Let
\(\varvec{\xi }=\left( \xi _{1}^{+},\ldots ,\xi _{I}^{+},\xi _{1}^{-},\ldots ,\xi _{I}^{-}\right) ^T\) be a conformable vector of slack parameters, and
\(\varvec{\eta }=\left( \varvec{\theta }^T,\varvec{\xi }^T\right) ^T\in \mathbb {R}^{1+p+q+2I}\). Denote
$$\begin{aligned}&F^+_{r}\left( \varvec{\eta }\right) \\&\ \ \equiv \frac{\int _{a_{r}< \varphi \left( {\textbf{x}}\right) \le b_{r}} \int _{{\textbf{z}}^{\prime }} P_{\varvec{\beta }}(Y=1|\varvec{X}={\textbf{x}},\varvec{Z}={\textbf{z}}')\delta \left( {\textbf{x}}\right) f_{\varvec{\tau }}\left( {\textbf{z}}^{\prime }\mid {\textbf{x}}\right) d{\textbf{z}}^\prime d{\textbf{x}}}{\int _{a_{r}<\varphi \left( {\textbf{x}}\right) \le b_{r}}\delta \left( {\textbf{x}}\right) d{\textbf{x}}}-(1+d_r) P^e_{r} +{\xi _{r}^+}^2, \end{aligned}$$
and
$$\begin{aligned}&F^-_{r}\left( \varvec{\eta }\right) \\&\ \ \equiv (1-d_r) P^e_{r} -\frac{\int _{a_{r}<\varphi \left( {\textbf{x}}\right) \le b_{r}} \int _{{\textbf{z}}^{\prime }} P_{\varvec{\beta }}(Y=1|\varvec{X}={\textbf{x}},\varvec{Z}={\textbf{z}}')\delta \left( {\textbf{x}}\right) f_{\varvec{\tau }}\left( {\textbf{z}}^{\prime }\mid {\textbf{x}}\right) d{\textbf{z}}^\prime d{\textbf{x}}}{\int _{a_{r}<\varphi \left( {\textbf{x}}\right) \le b_{r}}\delta \left( {\textbf{x}}\right) d{\textbf{x}}} +{\xi _{r}^-}^2, \end{aligned}$$
\(r=1,\ldots ,I\). We can replace the inequality constraints (4) with equality constraints (Luenberger et al. 1984; Boyd et al. 2004) and consider the equivalent optimization problem of maximizing
$$\begin{aligned} \begin{aligned}&l(\varvec{\beta },\varvec{\tau })=\sum _{i=1}^{N}\log \left[ \frac{\exp \left\{ Y_{i}\left( \beta _0+\varvec{\beta }_{{\textbf{x}}}^{T}\textbf{X}_{i}+\varvec{\beta }_{{\textbf{z}}}^{T}\textbf{Z}_{i}\right) \right\} }{1+\exp \left( \beta _0+\varvec{\beta }_{{\textbf{x}}}^{T}\textbf{X}_{i}+\varvec{\beta }_{{\textbf{z}}}^{T}\textbf{Z}_{i}\right) }f_{\varvec{\tau }}\left( \textbf{Z}_{i}\mid \textbf{X}_{i}\right) \right] \\&\text {subject~to}~~~F^+_{r}\left( \varvec{\eta }\right) =0;~~~F^-_{r}\left( \varvec{\eta }\right) =0,~~~ r=1,...,I. \end{aligned} \end{aligned}$$
(9)
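As an illustration of why the squared slack terms reproduce the original inequalities, write \(\pi _{r}\left( \varvec{\theta }\right)\) for the ratio of integrals that appears in \(F^{+}_{r}\) and \(F^{-}_{r}\); this shorthand is introduced here only for exposition and is not part of the original notation. Since the square of a real number is nonnegative,
$$\begin{aligned} F^{+}_{r}\left( \varvec{\eta }\right) =0&\Longleftrightarrow \pi _{r}\left( \varvec{\theta }\right) -(1+d_r)P^e_{r}=-{\xi _{r}^{+}}^2\le 0,\\ F^{-}_{r}\left( \varvec{\eta }\right) =0&\Longleftrightarrow (1-d_r)P^e_{r}-\pi _{r}\left( \varvec{\theta }\right) =-{\xi _{r}^{-}}^2\le 0, \end{aligned}$$
so real slacks \(\xi _{r}^{+}\) and \(\xi _{r}^{-}\) solving both equations exist exactly when \((1-d_r)P^e_{r}\le \pi _{r}\left( \varvec{\theta }\right) \le (1+d_r)P^e_{r}\), and a slack equals zero exactly when the corresponding bound is active.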
Define
\(l\left( \varvec{\eta }\right) :=l\left( \varvec{\theta }\right)\), and let
\(\varvec{F}\left( \varvec{\eta }\right) =\left( F_{1}^+\left( \varvec{\eta }\right) ,\ldots ,F_{I}^+\left( \varvec{\eta }\right) , F_{1}^-\left( \varvec{\eta }\right) ,\ldots ,F_{I}^-\left( \varvec{\eta }\right) \right) ^{T}\) be the vector of equality constraints. The constrained maximization problem above can be written concisely and equivalently as maximizing, with respect to
\(\varvec{\eta }\),
$$\begin{aligned} l\left( \varvec{\eta }\right) \ \text {subject to}\ \varvec{F} \left( \varvec{\eta }\right) =\varvec{0}. \end{aligned}$$
By the existence and uniqueness of
\(\varvec{\theta }^{*}\), there exists a unique
\(\varvec{\eta }^{*} = \left( \varvec{\theta }^{*T},\varvec{\xi }^{*T}\right) ^{T}\) such that
\(\varvec{F} \left( \varvec{\eta }^{*}\right) =\varvec{0}\), and
$$\begin{aligned} E\left\{ N^{-1}l\left( \varvec{\eta }^{*}\right) \right\} =\max \limits _{\varvec{\eta }:\varvec{F}\left( \varvec{\eta }\right) =\varvec{0}}E\left\{ N^{-1}l\left( \varvec{\eta }\right) \right\} . \end{aligned}$$
The corresponding Lagrangian function is
$$\begin{aligned} l\left( \varvec{\eta }\right) +\varvec{F}\left( \varvec{\eta }\right) ^{T}\varvec{\lambda }, \end{aligned}$$
where
\(\varvec{\lambda }\) is the vector of Lagrange multipliers. Based on the KKT conditions, we obtain
\(\widehat{\varvec{\eta }}\) by solving the equation
$$\begin{aligned} \frac{\partial l\left( \widehat{\varvec{\eta }}\right) }{\partial \varvec{\eta }}+\left\{ \frac{\partial \varvec{F} \left( \widehat{\varvec{\eta }}\right) }{\partial \varvec{\eta }}\right\} ^{T}\varvec{\lambda }=\varvec{0}, \end{aligned}$$
(10)
where
\(\partial l\left( \widehat{\varvec{\eta }}\right) /\partial \varvec{\eta }\in \mathbb {R}^{\left| \varvec{\eta }\right| }\),
\(\partial \varvec{F}\left( \widehat{\varvec{\eta }}\right) ^{T}/ \partial \varvec{\eta }\in \mathbb {R}^{\left| \varvec{\eta }\right| \times \left| \varvec{F}\right| }\),
\(\left| \varvec{\eta }\right|\) is the length of
\(\varvec{\eta }\) and
\(\left| \varvec{F}\right|\) is the number of equality constraints. Indeed, (
10) implies that
\(\partial l\left( \hat{\varvec{\eta }}\right) /\partial \varvec{\eta }\) is in the column space of
\(\partial \varvec{F}\left( \hat{\varvec{\eta }}\right) ^{T}/\partial \varvec{\eta }\). Thus, as long as
\(\left| \varvec{\eta }\right| >\left| \varvec{F}\right|\), there exists a differentiable function
\(\textbf{U}\left( \varvec{\eta }\right) :\mathbb {R}^{\left| \varvec{\eta }\right| }\rightarrow \mathbb {R}^{\left| \varvec{\eta }\right| \times \left( \left| \varvec{\eta }\right| -\left| \varvec{F}\right| \right) }\) such that
\(\textbf{U}^{T}\left( \varvec{\eta }\right) \partial \varvec{F}\left( \varvec{\eta }\right) ^{T}/\partial \varvec{\eta }=\varvec{0}\) and
\(\textbf{U}^{T}\left( \varvec{\eta }\right) \textbf{U}\left( \varvec{\eta }\right) =\varvec{I}\) for any
\(\varvec{\eta }\). Premultiplying (10) by \(\textbf{U}^{T}\left( \hat{\varvec{\eta }}\right)\) eliminates the multiplier term and gives
$$\begin{aligned} \textbf{U}\left( \hat{\varvec{\eta }}\right) ^T\frac{\partial l\left( \hat{\varvec{\eta }}\right) }{\partial \varvec{\eta }}=\varvec{0}. \end{aligned}$$
(11)
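The role of \(\textbf{U}\) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the objective \(l(\varvec{\eta })=\varvec{c}^{T}\varvec{\eta }\), the single constraint \(F(\varvec{\eta })=\Vert \varvec{\eta }\Vert ^{2}-1\), and the helper name `null_space_basis` are hypothetical and are not taken from the model above. It builds an orthonormal null-space basis from a singular value decomposition and verifies the two defining properties of \(\textbf{U}\) together with the projected score equation (11).

```python
import numpy as np


def null_space_basis(jacobian, tol=1e-10):
    """Return U whose orthonormal columns span the null space of `jacobian`.

    `jacobian` plays the role of dF/d eta^T with shape (|F|, |eta|);
    the result has shape (|eta|, |eta| - rank).
    """
    _, s, vt = np.linalg.svd(jacobian, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    return vt[rank:].T


# Hypothetical toy problem: maximize l(eta) = c^T eta subject to
# F(eta) = eta_1^2 + eta_2^2 + eta_3^2 - 1 = 0 (the unit sphere).
c = np.array([3.0, 0.0, 4.0])
eta_hat = c / np.linalg.norm(c)      # constrained maximizer on the sphere

score = c                            # dl/d eta evaluated at eta_hat
jac = 2.0 * eta_hat.reshape(1, -1)   # dF/d eta^T at eta_hat, shape (1, 3)
U = null_space_basis(jac)            # shape (3, 2)

print(np.allclose(U.T @ jac.T, 0.0))     # U^T (dF^T / d eta) = 0
print(np.allclose(U.T @ U, np.eye(2)))   # U^T U = I
print(np.allclose(U.T @ score, 0.0))     # projected score vanishes, as in (11)
```

In this toy case the score at the maximizer is proportional to the constraint gradient, which is exactly the column-space statement implied by (10), so its projection onto the columns of \(\textbf{U}\) is zero.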
For sufficiently large
\(N\), a Taylor expansion of (11) in
\(\hat{\varvec{\eta }}\) around
\(\varvec{\eta }^{*}\) gives
$$\begin{aligned} \varvec{0}=&\textbf{U}^T\left( \hat{\varvec{\eta }}\right) \left[ \frac{1}{\sqrt{N}}\frac{\partial l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }^{T}}+\frac{1}{N}\frac{\partial ^{2}l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }\partial \varvec{\eta }^{T}}\sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) + O_{p}\left\{ \sqrt{N}\left\Vert \hat{\varvec{\eta }} -\varvec{\eta }^{*}\right\Vert ^{2} \right\} \right] \nonumber \\ =&\textbf{U}^T \left( \hat{\varvec{\eta }}\right) \frac{1}{\sqrt{N}}\frac{\partial l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }^{T}} + \textbf{U}\left( \hat{\varvec{\eta }}\right) ^T \frac{1}{N}\frac{\partial ^{2}l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }\partial \varvec{\eta }^{T}}\sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) + o_{p}\left\{ \sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) \right\} \nonumber \\
=&\textbf{U}^T\left( \hat{\varvec{\eta }}\right) \frac{1}{\sqrt{N}}\frac{\partial l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }^{T}}+\left\{ \textbf{U}^T\left( \varvec{\eta }^{*}\right) +\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) ^{T}\frac{\partial \textbf{U}^T\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }}\right\} \frac{1}{N} \frac{\partial ^{2}l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }\partial \varvec{\eta }^{T}}\sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) \nonumber \\&+o_{p}\left\{ \sqrt{N}\left( \hat{\varvec{\eta }} -\varvec{\eta }^{*}\right) \right\} \nonumber \\ =&\textbf{U}^{T}\left( \hat{\varvec{\eta }}\right) \frac{1}{\sqrt{N}}\frac{\partial l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }^{T}}+\textbf{U}^{T}\left( \varvec{\eta }^{*}\right) \frac{1}{N}\frac{\partial ^{2}l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }\partial \varvec{\eta }^{T}}\sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) + o_{p}\left\{ \sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) \right\} . \end{aligned}$$
(12)
Following Crowder (1984), Stoica and Ng (1998), and Moore et al. (2008), let
\(\psi \left( t\right) :\mathbb {R}\rightarrow \mathbb {R}^{\left| \varvec{\eta }\right| }\) be a continuously differentiable map representing the feasible arc and
\(\psi \left( 0\right) =\varvec{\eta }^{*}\),
\(\psi \left( 1/N\right) =\hat{\varvec{\eta }}\) for any
N. Then we have
$$\begin{aligned} \hat{\varvec{\eta }}-\varvec{\eta }^{*}=\psi \left( 1/N\right) -\psi \left( 0\right) =\frac{1}{N}\left. \frac{d\psi \left( t\right) }{dt}\right| _{t=1/n^{\prime }}, \end{aligned}$$
for some
\(0<1/n^{\prime }<1/N\). Note that
\(\varvec{F}\left\{ \psi \left( t\right) \right\} =\varvec{0}\) for all
\(t\), because the arc is feasible. Therefore
\(\varvec{F}\left\{ \psi \left( 1/n^{\prime }\right) \right\} =\varvec{0}\) and
$$\begin{aligned} \varvec{0}=\left. \frac{\partial \varvec{F}\left\{ \psi \left( t\right) \right\} }{\partial t}\right| _{t=1/n^{\prime }}=\left. \frac{\partial \varvec{F}\left\{ \psi \left( t\right) \right\} }{\partial \psi }\frac{d\psi \left( t\right) }{dt}\right| _{t=1/n^{\prime }}. \end{aligned}$$
This implies that
\(d\psi \left( 1/n^{\prime }\right) /dt\) is in the column space of
\(\textbf{U}\left( \varvec{\eta }^{\prime }\right)\), where
\(\varvec{\eta }^{\prime } = \psi \left( 1/n^{\prime }\right)\), i.e.,
\(d\psi \left( 1/n^{\prime }\right) /dt = \textbf{U}\left( \varvec{\eta }^{\prime }\right) \varvec{Q}_N\) for some
\(\varvec{Q}_N\in \mathbb {R}^{\left| \varvec{\eta }\right| -\left| \varvec{F}\right| }\). Hence
$$\begin{aligned} \hat{\varvec{\eta }}-\varvec{\eta }^{*}=\frac{1}{N} \textbf{U}\left( \varvec{\eta }^{\prime }\right) \varvec{Q}_N. \end{aligned}$$
(13)
Inserting (
13) into (
12), we have
$$\begin{aligned} \varvec{0}=&\textbf{U}^{T}\left( \hat{\varvec{\eta }}\right) \frac{1}{\sqrt{N}}\frac{\partial l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }^{T}}+\frac{1}{N}\textbf{U}^{T}\left( \varvec{\eta }^{*}\right) \frac{1}{N} \frac{\partial ^{2}l \left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }\partial \varvec{\eta }^{T}} \sqrt{N}\textbf{U}\left( \varvec{\eta }^{\prime }\right) \varvec{Q}_{N}\\&+o_{p}\left\{ \sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) \right\} . \end{aligned}$$
Thus
$$\begin{aligned} \varvec{Q}_{N}=&\left\{ -\frac{1}{\sqrt{N}} \textbf{U}^{T} \left( \varvec{\eta }^{*}\right) \frac{\partial ^{2}\frac{1}{N}l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }\partial \varvec{\eta }^{T}}\textbf{U}\left( \varvec{\eta }^{\prime }\right) \right\} ^{-}\textbf{U}^{T}\left( \hat{\varvec{\eta }}\right) \frac{1}{\sqrt{N}}\frac{\partial l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }^{T}} \nonumber \\&+o_{p}\left\{ N\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) \right\} , \end{aligned}$$
(14)
where
\(A^{-}\) denotes the Moore-Penrose inverse of matrix
A. Combining (
13) and (
14), we have
$$\begin{aligned} \sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) =&\textbf{U}\left( \varvec{\eta }^{\prime }\right) \left\{ - \textbf{U}^{T}\left( \varvec{\eta }^{*}\right) \frac{\partial ^{2}\frac{1}{N} l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }\partial \varvec{\eta }^{T}} \textbf{U}\left( \varvec{\eta }^{\prime }\right) \right\} ^{-}\textbf{U}^{T}\left( \hat{\varvec{\eta }}\right) \frac{1}{\sqrt{N}}\frac{\partial l\left( \varvec{\eta }^{*}\right) }{\partial \varvec{\eta }^{T}}\\&+o_{p}\left\{ \sqrt{N}\left( \hat{\varvec{\eta }}-\varvec{\eta }^{*}\right) \right\} . \end{aligned}$$
When
\(N\rightarrow \infty\), we have
\(\textbf{U}\left( \hat{\varvec{\eta }}\right) \rightarrow \textbf{U}\left( \varvec{\eta }^{*}\right)\) and
\(\textbf{U}\left( \varvec{\eta }^{\prime }\right) \rightarrow \textbf{U}\left\{ \psi \left( 0\right) \right\} =\textbf{U}\left( \varvec{\eta }^{*}\right)\) by consistency of
\(\hat{\varvec{\eta }}\) and continuity of
\(\textbf{U}\). Further,
\(N^{-1/2}\partial l\left( \varvec{\eta }^{*}\right) /\partial \varvec{\eta }\rightarrow \mathcal {N}\left\{ \varvec{0},\tilde{\mathcal {I}}\left( \varvec{\eta }^{*}\right) \right\}\) in distribution by the central limit theorem, where
\(\tilde{\mathcal {I}}\left( \varvec{\eta }^{*}\right) \equiv E_{S}\left[ \left\{ \partial l_{1}\left( \varvec{\eta }^{*}\right) /\partial \varvec{\eta }\right\} ^{\otimes 2}\right]\), and we use
\(l_{1}\left( \varvec{\eta }^{*}\right)\) to denote the first summand in
\(l\left( \varvec{\eta }^{*}\right)\). Thus,
\(\sqrt{N}\left( \hat{\varvec{\eta }} -\varvec{\eta }^{*}\right)\) converges to a normal distribution with mean zero and variance
\({\tilde{V}}\left( \varvec{\eta }^*\right) \tilde{\mathcal {I}}\left( \varvec{\eta }^*\right) {\tilde{V}}\left( \varvec{\eta }^*\right) ^{T}\), where
$$\begin{aligned} {\tilde{V}}\left( \varvec{\eta }^*\right) =\textbf{U}\left( \varvec{\eta }^*\right) \left[ \textbf{U}^T\left( \varvec{\eta }^*\right) E\left\{ \frac{\partial ^2 l_{1}\left( \varvec{\eta }^*\right) }{\partial \varvec{\eta }\partial \varvec{\eta }^T}\right\} \textbf{U}\left( \varvec{\eta }^*\right) \right] ^{-}\textbf{U}^T\left( \varvec{\eta }^*\right) . \end{aligned}$$
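The sandwich form above can be evaluated directly once \(\textbf{U}\), the expected Hessian, and \(\tilde{\mathcal {I}}\) are available. The following is a minimal numerical sketch, not the authors' implementation: the matrices `hessian` and `info`, the point `eta_star`, the sphere constraint, and the split into two "parameter" coordinates and one "slack" coordinate are all hypothetical values chosen so the answer can be checked by hand. The Moore-Penrose inverse is computed with `np.linalg.pinv`.

```python
import numpy as np


def null_space_basis(jacobian, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of `jacobian`."""
    _, s, vt = np.linalg.svd(jacobian, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    return vt[rank:].T


def constrained_sandwich(U, hessian, info):
    """Return V I V^T with V = U (U^T H U)^- U^T (Moore-Penrose inverse)."""
    V = U @ np.linalg.pinv(U.T @ hessian @ U) @ U.T
    return V @ info @ V.T


# Hypothetical inputs: a point on the unit sphere, an expected Hessian of
# minus the identity, and an outer-product information matrix equal to I.
eta_star = np.array([0.6, 0.0, 0.8])
jac = 2.0 * eta_star.reshape(1, -1)   # Jacobian of F(eta) = ||eta||^2 - 1
U = null_space_basis(jac)
hessian = -np.eye(3)
info = np.eye(3)

avar = constrained_sandwich(U, hessian, info)

# Selection matrix A = [I; 0] keeps the first n_theta coordinates,
# treating the last coordinate as a (hypothetical) slack parameter.
n_theta = 2
A = np.vstack([np.eye(n_theta), np.zeros((3 - n_theta, n_theta))])
avar_theta = A.T @ avar @ A

print(avar)
print(avar_theta)
```

With these inputs the result reduces to the orthogonal projector onto the tangent space of the constraint, \(\varvec{I}-\varvec{\eta }^{*}\varvec{\eta }^{*T}\), so the asymptotic variance is zero in the direction normal to the constraint surface, and `avar_theta` is simply the upper-left block of `avar`.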
Therefore, we know that
\(\sqrt{N}\left( \hat{\varvec{\theta }}-\varvec{\theta }^{*}\right)\) converges to a normal distribution with mean zero and variance
\(A_{\left| \varvec{\theta }\right| }^T\left[ {\tilde{V}}\left( \varvec{\eta }^*\right) \tilde{\mathcal {I}}\left( \varvec{\eta }^*\right) {\tilde{V}}\left( \varvec{\eta }^*\right) ^{T}\right] A_{\left| \varvec{\theta }\right| }\), where
$$\begin{aligned} A_{\left| \varvec{\theta }\right| }=\left[ \begin{array}{c} \varvec{I}_{\left| \varvec{\theta }\right| \times \left| \varvec{\theta }\right| } \\ \varvec{0}_{2I\times \left| \varvec{\theta }\right| } \end{array}\right] . \end{aligned}$$
A straightforward algebraic calculation yields
$$\begin{aligned} A_{\left| \varvec{\theta }\right| }^T\left\{ {\tilde{V}}\left( \varvec{\eta }^*\right) \tilde{\mathcal {I}}\left( \varvec{\eta }^*\right) {\tilde{V}}\left( \varvec{\eta }^*\right) ^{T}\right\} A_{\left| \varvec{\theta }\right| }=\textbf{V}\left( \varvec{\theta }^*\right) \mathcal {I}\left( \varvec{\theta }^*\right) \textbf{V}^{T}\left( \varvec{\theta }^*\right) . \end{aligned}$$
This completes the proof.
\(\square\)