2.1 Decision aiding
Before we pass to Multiple Criteria Decision Aiding (MCDA), we should stress the Decision Aiding (DA) aspect of the methodology. The aim of DA is to analyze the decision-making context by identifying the problem with its actors, possible alternatives and their consequences, as well as the stakes (Roy
2000). Among the actors, there are stakeholders (actors concerned by the decision), single or multiple decision makers (users), and an analyst (facilitator). For DA, it is important how the decision-making process unfolds, so as to increase both the consistency between the values underlying the objectives and the quality of recommendations based on results from models and computational procedures designed under some working hypotheses. Decision problems considered within DA involve a set
\({\mathcal {A}}\) of alternatives (actions, solutions, objects, etc.) evaluated from multiple points of view. The following three main problem statements are considered in DA:
- Choice of the best alternative, with multiobjective optimization as a particular case.
- Classification (also called sorting) of alternatives into pre-defined and preference-ordered classes.
- Ranking of alternatives from the best to the worst.
The above decision problems involve, in general, multiple evaluation criteria (multiple objectives). A decision-maker (DM) seldom has a single clear criterion in mind. Seldom is there a common unit for all scales of criteria, which are rather heterogeneous. That is why it may be very difficult to define
a priori a unique criterion able to take into account all relevant points of view. Formally, the
criterion is a real-valued function
\(g_i\) defined on
\({\mathcal {A}}\), reflecting a value of each alternative from a particular point of view, such that in order to compare any two alternatives
\(a,b \in \) \({\mathcal {A}}\) from this point of view, it is sufficient to compare two values:
\(g_i(a)\) and
\(g_i(b)\), called
performances. It is worth stressing that criteria may have
ordinal evaluation scales, where a difference of performance does not have the meaning of intensity, or
cardinal evaluation scales, where a difference of performance has the meaning of intensity, of either interval or ratio type. The type of criterion scale determines the allowed arithmetic operations on the performances. For example, additions of (weighted) performances on ordinal scales have no meaning. In the rest of this section, we assume by default that criteria are of the
gain-type, i.e., the greater the performance, the better, unless explicitly specified that they are of the
cost-type.
Building a family of criteria
\({\mathcal {G}}=\{g_1,\dots ,g_n\}\) for a decision problem at hand is the preliminary step of MCDA. According to Roy (
1996), the family
\({\mathcal {G}}\) should be exhaustive, monotonic, and non-redundant. With the exception of trivial cases, the evaluation criteria are in conflict, in the sense that improving an alternative on one criterion causes deterioration on other criteria; thus, in set
\({\mathcal {A}}\), there is no alternative being the best on all criteria. The only objective information that stems from a multiple criteria evaluation of alternatives is the dominance relation in set
\({\mathcal {A}}\). The meaning of the dominance relation is the following: alternative
a dominates alternative
b if
a is not worse than
b on all criteria and on at least one criterion, it is strictly better. It is clear that such a binary relation is a weak partial order. As such, the dominance relation leaves many alternatives incomparable. Incomparability prevents the formulation of an unambiguous recommendation in the best choice, classification, or ranking. This is why the main preoccupation of MCDA consists of aggregating the multiple criteria into a
preference model making the alternatives more comparable in light of users’ preferences.
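To make the dominance relation concrete, the following is a minimal Python sketch; the alternative names and performance vectors are illustrative assumptions, and all criteria are taken to be gain-type:

```python
def dominates(a, b):
    """True iff a is at least as good as b on every (gain-type) criterion
    and strictly better on at least one of them."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# hypothetical performance vectors on two gain-type criteria
car1, car2, car3 = (8.0, 3.0), (5.0, 3.0), (2.0, 9.0)

dominates(car1, car2)   # True: strictly better on criterion 1, equal on criterion 2
dominates(car1, car3)   # False: car1 and car3 are left incomparable by dominance
```

The last call illustrates the point made above: dominance alone leaves many pairs of alternatives incomparable.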
2.5 Elicitation of preference information
Preference elicitation can be either direct or indirect. In the case of direct elicitation, the DM is expected to provide parameters related to a supposed form of the preference model. Experience indicates that direct elicitation of numerical values of model parameters by DMs demands much of their cognitive effort. For example, defining weights
\(k_i, {i=1,\ldots ,n}\), of a weighted sum model requires specification of substitution rates
\(k_i/ k_j\) between all pairs of criteria
\(\{g_i,g_j\}\subset {\mathcal {G}}\). These substitution rates must be, moreover, constant in the whole range of criteria scales. In the case of heterogeneous criteria, it is not easy to say how much gain in performance on one criterion compensates for a loss of performance on another criterion. The same difficulty concerns the determination of relative importance weights of criteria having different semantics than the weights of the weighted sum. Other preference model parameters, like indifference, preference, and veto thresholds on particular criteria are not easier to elicit directly. A long time ago, Fishburn (
1967) listed and classified twenty-four methods of estimating additive utilities. Such a large number of methods for estimating parameters of the additive value function proves the difficulty of direct elicitation.
For this reason, indirect preference elicitation through holistic judgments, i.e.,
decision examples, gained importance in MCDA. Decision examples are relatively “easy” preference information. Decisions can also be observed without the active participation of DMs. For example, the choices made by people on the Internet provide rich information about their preferences. Finally, psychologists confirm that DMs are more confident exercising their decisions than explaining them (Slovic et al.
1977; March
1978).
Decision examples may either be provided by the DM on a set of real or hypothetical alternatives, or may come from observation of the DM’s past decisions. Methods based on indirect preference information are considered more user-friendly than approaches based on direct elicitation of preferences. Moreover, the preference model constructed using decision examples aims to be
compatible with the preference information, i.e., to reproduce the holistic judgments observed in decision examples. For this, the DMs can verify if the preference model represents faithfully enough their preferences and what is the impact of the provided preference information on the recommendation produced by the method. This makes the whole decision-aiding process more transparent, increasing the DM’s confidence in the final recommendation. As noted by Stewart (
2005), MCDA includes a comprehensive process involving a rich interplay between human judgment, data analysis, and computational processes. Indeed, transparency and explainability of the decision-aiding process are necessary for an efficient interplay.
Decision examples elicited in the indirect approach may have different forms depending on the problem statement and the wishes of stakeholders. These may be: (i) pairwise comparisons of some real or fictitious alternatives, (ii) assignments of some alternatives to decision classes, as their typical examples, (iii) intensity of preference expressed on some quadruples of alternatives, e.g., a is preferred to b more than c is preferred to d, (iv) rank-related preferences, e.g., alternative f should be among 5% of the best ones.
The MCDA methods based on the indirect preference information having the form of decision examples are called
ordinal regression methods. They were first proposed for the additive value function preference model in the method called UTA (Jacquet-Lagrèze and Siskos
1982). In UTA, the preference information consists of a set of pairwise comparisons of so-called reference alternatives
\(A^R = \{ a^{*}, b^{*}, \ldots \}\). Usually,
\(A^R \subseteq {\mathcal {A}}\), and these alternatives are relatively well known to the DM, who is thus able to say, for some
\(a^*,b^*\in A^R\), that
\(a^*\succ b^*\) or
\(b^*\succ a^*\) or
\(a^*\sim b^*\). A
compatible additive value function, reproducing as far as possible the pairwise comparisons of reference alternatives, is found as a result of solving a linear programming (LP) problem where variables are marginal values in the breakpoints of piecewise-linear marginal value functions. While the UTA method and its many variants gained a high popularity witnessed by many applications (Siskos et al.
2005), it suffers from the problem of non-uniqueness of LP solutions. In other words, the LP solution identifies a single compatible value function, while there may exist many instances of compatible value functions that reproduce the pairwise comparisons of reference alternatives equally well. Different compatible instances may, however, compare non-reference alternatives differently, making the recommendation ambiguous.
The issue of ambiguity has been resolved by Greco et al. (
2008) using the approach called Robust Ordinal Regression (ROR) (Greco et al.
2010d). In ROR, the preference data are the same as in UTA, however, the ordinal regression finds the whole set of compatible instances of the value function. The reconstruction of the DM’s judgments by the ordinal regression in terms of compatible instances of the value function is a preference learning step. As ROR works in a loop with incremental elicitation of preferences, it is called
constructive learning (Corrente et al.
2013a). In constructive learning, the model aims to reconstruct as faithfully as possible the preference information of the DM, while the DM learns from the consequences of applying all compatible instances of the model on the whole set of alternatives. Constructive learning is ‘robust’ because it tells the truth about preferences based on partial, not exhaustive, preference information. For example, unlike the AHP method (Saaty
2005), it does not require information about all possible pairwise comparisons.
As ROR has the most in common with PL, we will characterize this approach in more detail in the next subsection.
2.7 ROR with value function preference model
Let us formulate the ordinal regression problem based on the above-mentioned preference information. To model the DM’s preference information, we use in this subsection a general additive value function:
$$\begin{aligned} U(a)= \sum _{i=1}^{n} u_i(g_i(a)) \end{aligned}$$
(1)
where indices of criteria,
\(i=1,\ldots n\), form set
\({{\mathcal {J}}}\), the marginal value functions
\(u_i\),
\({i \in {\mathcal {J}}}\), are monotone, non-decreasing and normalized so that the additive value (
1) is bounded within the interval
\({[0, 1]}\). Note that for simplicity of notation, we write
\({u_i(a)}\),
\({i \in {\mathcal {J}}}\), instead of
\({u_{i}(g_i(a))}\). Consequently, the basic set of constraints defining general additive value functions compatible with the preference information composed of pairwise comparisons and comparisons of intensities of preferences, has the following form:
$$\begin{aligned} \left. \begin{array}{l} \;U(a^{*}) \ge U(b^{*}) + \varepsilon , \; \hbox { if } \; a^{*} \succ b^{*}, \hbox { for } \{a^{*}, b^{*}\} \in A^R\\ \;U(a^{*}) = U(b^{*}), \; \hbox { if } \; a^{*} \sim b^{*}, \hbox { for } \{a^{*}, b^{*}\} \in A^R\\ \;U(a^*) - U(b^*) \!\ge \! U(c^*) - U(d^*) \!+\! \varepsilon , \; \hbox { if } \; (a^*, b^*) \succ ^{*} (c^*, d^*), \hbox { for } \{a^{*}, b^{*}, c^{*}, d^{*}\} \!\in \! A^R \\ \;U(a^*) - U(b^*) = U(c^*) - U(d^*), \; \hbox { if } \; (a^*, b^*) \sim ^{*} (c^*, d^*), \hbox { for } \{a^{*}, b^{*}, c^{*}, d^{*}\} \in A^R \\ \;u_i(x_i^{h})-u_i(x_i^{(h-1)})\ge 0, \;h=2,\ldots , m_i(A^R), \; i \in {\mathcal {J}},\\ \;u_i(x_i^{1})=0, \ \ \ \sum _{i=1}^n u_i(x_i^{m_i(A^R)})=1, \end{array} \right\} \; E^{A^R}_{BASE} \end{aligned}$$
where
\((a^*,b^*) \succ ^{*} (c^*, d^*)\) means
\(a^*\hbox { is preferred to } b^{*} \hbox { more than } c^{*} \hbox { is preferred to } d^{*}\),
\((a^{*},b^{*}) \sim ^{*} (c^*, d^*)\) means
\(a^*\hbox { is preferred to } b^{*} \hbox { as much as } c^*\hbox { is preferred to } d^{*}\),
\(m_i(A^R)\) is the number of different performances of reference alternatives on criterion
\(i \in {\mathcal {J}}\),
\(\ \ x_i^{1}, \ldots , x_i^{m_i(A^R)}\) are the ordered performances of reference alternatives on criterion
\(i \in {\mathcal {J}}\),
\(\ \ x_i^{h} < x_i^{h+1}\),
\(h=1, \ldots ,\) \(m_i(A^R)-1\). The first four constraints represent pairwise comparisons of reference alternatives, as well as comparisons of intensities of preference between pairs of reference alternatives, and the last two lines are monotonicity and normalization constraints.
General monotonic marginal value functions defined in this way do not involve any arbitrary or restrictive parametrization (Greco et al.
2010d). On the contrary, the majority of existing methods employ marginal value functions that are linear or piecewise linear (Siskos et al.
2005). Note that piecewise linear functions require specification of the number of characteristic points which is not easy for most DMs.
In order to verify that the set of value functions
\({{\mathcal {U}}_{A^R}}\) compatible with preference information provided by the DM is not empty, the following LP problem has to be solved:
$$\begin{aligned} Maximize: \varepsilon , \hbox { s.t. } \ E^{A^R}_{BASE}. \end{aligned}$$
(2)
Let us denote by
\(\varepsilon ^{*}\) the maximal value of
\(\varepsilon \) obtained from the solution of the above LP problem, i.e.,
\(\varepsilon ^{*} =\) max \(\varepsilon \), s.t.
\(E^{A^R}_{BASE}\). It corresponds to a margin of the representation error. One concludes that
\({{\mathcal {U}}_{A^R}}\) is not empty, if
\(E^{A^R}_{BASE}\) is feasible and
\(\varepsilon ^{*} > 0\). In such a case, all pieces of preference information can be properly reproduced by at least one instance of the value function. On the contrary, when
\(E^{A^R}_{BASE}\) is infeasible or the margin of the representation error is not greater than zero, some pieces of DM’s preference on the set of reference alternatives
\(A^R\) are conflicting, and thus, some pairwise comparisons or comparisons between pairs cannot be reproduced by any additive value function. In the latter case, the DM can choose one of three options.
The consequence of considering many instances of the preference model is a non-univocal recommendation. In the constructive learning perspective, where the aim is not to predict, but rather to construct the preferences from scratch, the user has an interest in investigating what are the consequences of using all instances from
\({{\mathcal {U}}_{A^R}}\) on the set of alternatives
\({{\mathcal {A}}}\). Obviously, the final recommendation in terms of ranking, best choice, or classification may vary substantially depending on which compatible instance is selected. Remark also that ROR does not consider a probability distribution on the set of all compatible instances of the value function, assigning to all of them the same credibility.
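The non-emptiness test (2) can be sketched in code. The following Python fragment is an illustrative implementation, not the authors' software: it introduces one variable \(u_i(x_i^h)\) per characteristic point of the general monotone marginal value functions, adds the pairwise-comparison, monotonicity, and normalization constraints of \(E^{A^R}_{BASE}\) for a tiny hypothetical data set (three reference alternatives on two gain-type criteria, with \(a \succ b \succ c\)), and maximizes \(\varepsilon \) with scipy's LP solver:

```python
import numpy as np
from scipy.optimize import linprog

# hypothetical performances g_i(.) of three reference alternatives, two criteria
perf = {"a": (8.0, 3.0), "b": (5.0, 5.0), "c": (2.0, 9.0)}
pairwise = [("a", "b"), ("b", "c")]      # DM: a strictly preferred to b, b to c

# characteristic points = distinct performances per criterion, sorted
pts = [sorted({p[i] for p in perf.values()}) for i in range(2)]
# one LP variable u_i(x_i^h) per characteristic point, plus epsilon (last)
idx, k = {}, 0
for i, xs in enumerate(pts):
    for x in xs:
        idx[(i, x)] = k
        k += 1
n_var = k + 1
EPS = n_var - 1

def U_row(alt, sign):
    """Row with +/-1 at the marginal values used by alternative `alt`."""
    row = np.zeros(n_var)
    for i in range(2):
        row[idx[(i, perf[alt][i])]] += sign
    return row

A_ub, b_ub = [], []
# U(a*) >= U(b*) + eps  rewritten as  -U(a*) + U(b*) + eps <= 0
for a, b in pairwise:
    row = U_row(a, -1.0) + U_row(b, 1.0)
    row[EPS] = 1.0
    A_ub.append(row); b_ub.append(0.0)
# monotonicity: u_i(x^{h-1}) - u_i(x^h) <= 0
for i, xs in enumerate(pts):
    for lo, hi in zip(xs, xs[1:]):
        row = np.zeros(n_var)
        row[idx[(i, lo)]], row[idx[(i, hi)]] = 1.0, -1.0
        A_ub.append(row); b_ub.append(0.0)

A_eq, b_eq = [], []
# normalization: u_i(x_i^1) = 0 and sum_i u_i(x_i^{m_i}) = 1
for i, xs in enumerate(pts):
    row = np.zeros(n_var); row[idx[(i, xs[0])]] = 1.0
    A_eq.append(row); b_eq.append(0.0)
row = np.zeros(n_var)
for i, xs in enumerate(pts):
    row[idx[(i, xs[-1])]] = 1.0
A_eq.append(row); b_eq.append(1.0)

c = np.zeros(n_var); c[EPS] = -1.0       # maximize eps == minimize -eps
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0, 1)] * n_var)
eps_star = res.x[EPS]                    # eps* > 0  =>  U_{A^R} is non-empty
```

For this toy data the LP is feasible with a strictly positive margin, so at least one compatible general additive value function exists.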
When comparing a pair of alternatives
\((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\) in terms of the recommendation that is provided by any compatible (or non-perfectly compatible) value function from set
\({{\mathcal {U}}_{A^R}}\), it is reasonable to verify whether
a is weakly preferred to
b or vice versa, for all or at least one compatible instance of the value function. Answering these questions, ROR methods (Greco et al.
2008) produce two preference relations in the set of alternatives
\({\mathcal {A}}\):
- a necessary weak preference relation \(\succsim ^N\) holds for a pair of alternatives \((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\), in case \(U(a) \ge U(b)\) for all compatible instances of the value function,
- a possible weak preference relation \(\succsim ^P\) holds for a pair of alternatives \((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\), in case \(U(a) \ge U(b)\) for at least one compatible instance of the value function.
Thus defined, the necessary relations specify the most certain recommendation worked out on the basis of all compatible instances of the value function, while the possible relations identify a recommendation provided by at least one compatible instance of the value function. Consequently, the necessary outcomes can be considered robust regarding the available preference information, as they guarantee that a definite relation is the same whichever compatible instance of the preference model would be used. To verify the truth of the necessary and possible weak preference relations the following linear programs need to be solved (Corrente et al.
2013a):
$$\begin{aligned} {\textit{Maximize}: } \varepsilon \end{aligned}$$
s.t.
$$\begin{aligned} \left. \begin{array}{l} \; U(b) - U(a) \ge \varepsilon \\ \; E^{A^R}_{BASE} \end{array} \right\} E^N(a,b), \end{aligned}$$
and
$$\begin{aligned} {\textit{Maximize}: } \varepsilon \end{aligned}$$
s.t.
$$\begin{aligned} \left. \begin{array}{l} \; U(a) - U(b) \ge 0 \\ \; E^{A^R}_{BASE} \end{array} \right\} E^P(a,b). \end{aligned}$$
We conclude that
\(a \succsim ^N b\), if
\(E^N(a,b)\) is not feasible or
\(\varepsilon _*= max \ \varepsilon \), s.t.
\(E^N(a,b)\), is not greater than 0, and that
\(a \succsim ^P b\), if
\(E^P(a,b)\) is feasible and
\(\varepsilon ^{*} = max \ \varepsilon \), s.t.
\(E^P(a,b)\), is greater than 0.
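The two tests above can be sketched as small LPs. For brevity of illustration, the sketch below restricts the model to the special case of linear marginal values \(u_i = w_i\, g_i\) with \(w_i \ge 0\), \(\sum _i w_i = 1\), so that a compatible instance is simply a weight vector; the data, function names, and the single preference statement are hypothetical assumptions:

```python
import numpy as np
from scipy.optimize import linprog

# hypothetical performances on two gain-type criteria
perf = {"a": (8.0, 3.0), "b": (5.0, 5.0), "c": (2.0, 9.0), "d": (6.0, 6.0)}
pairwise = [("a", "b")]          # certain preference information: a > b
n = 2

def _base():
    """E_BASE rows: U(x*) >= U(y*) + eps  ->  (g(y) - g(x)) w + eps <= 0."""
    A, b = [], []
    for x, y in pairwise:
        row = np.array(perf[y]) - np.array(perf[x])
        A.append(np.append(row, 1.0)); b.append(0.0)
    return A, b

def _solve(extra_row):
    """Maximize eps subject to E_BASE plus one extra <=0 constraint row."""
    A_ub, b_ub = _base()
    A_ub.append(extra_row); b_ub.append(0.0)
    A_eq = [np.append(np.ones(n), 0.0)]          # sum_i w_i = 1
    c = np.zeros(n + 1); c[-1] = -1.0            # maximize eps
    return linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
                   A_eq=np.array(A_eq), b_eq=[1.0],
                   bounds=[(0, 1)] * (n + 1))

def necessary(a, b):
    """a >=^N b: max eps s.t. U(b) - U(a) >= eps is infeasible or eps* <= 0."""
    row = np.append(np.array(perf[a]) - np.array(perf[b]), 1.0)
    res = _solve(row)
    return (res.status != 0) or (res.x[-1] <= 1e-6)

def possible(a, b):
    """a >=^P b: max eps s.t. U(a) - U(b) >= 0 is feasible with eps* > 0."""
    row = np.append(np.array(perf[b]) - np.array(perf[a]), 0.0)
    res = _solve(row)
    return (res.status == 0) and (res.x[-1] > 1e-6)
```

With this data, a and c are possibly preferred to each other in both directions, i.e., they stay incomparable with respect to the necessary relation, while d necessarily outperforms b because it dominates it.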
Let us remark that preference relations \(\succsim ^N\) and \(\succsim ^P\) are meaningful only if there exists at least one compatible instance of the value function. Observe also that in this case, for any \(a^{*},b^{*} \in A^R\), \(a^{*} \succsim b^{*} \Rightarrow a^{*} \succsim ^{N} b^{*}\) and \(a^{*} \succ b^{*} \Rightarrow \; \lnot (b^{*} \succsim ^P a^{*}).\) In fact, if the DM says for \(a^{*},b^*\in A^R\) that \(a^*\succsim b^*\), then for any compatible instance of the value function \(U(a^*) \ge U(b^*)\), and, therefore, \(a^*\succsim ^N b^*\). Moreover, if \(a^*\succ b^*\), then for any compatible instance of the value function \(U(a^*) > U(b^*)\), and, consequently, there is no compatible instance of the value function such that \(U(b^*) \ge U(a^*)\), which means \(\lnot (b^*\succsim ^P a^*)\).
The necessary weak preference relation \(\succsim ^N\) is a partial preorder, i.e., it is reflexive (\(a \succsim ^N a\), since for all \(a \in {\mathcal {A}}, \ U(a) = U(a)\)) and transitive (for all \(a, b, c \in {\mathcal {A}}\), if \(a \succsim ^N b\) and \(b \succsim ^N c\), then \(a \succsim ^N c\)). The possible weak preference relation \(\succsim ^P\) is a strongly complete binary relation (i.e., for all \(a, b \in {\mathcal {A}}, \ a \succsim ^P b\) or \(b \succsim ^P a\)) and negatively transitive (i.e., \(\forall a, b, c \in {\mathcal {A}},\) if \(\lnot (a \succsim ^P b)\) and \(\lnot (b \succsim ^P c)\), then \(\lnot (a \succsim ^P c)\)).
Moreover, for any pair \((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\), \( a \succsim ^N b \Rightarrow a \succsim ^P b\), i.e., \( \succsim ^N \subseteq \succsim ^P\), and for all \((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\), \(a \succsim ^N b\) or \(b \succsim ^P a\).
ROR with a general additive value function preference model has also been applied to multiple criteria classification problems (Greco et al.
2010c; Köksalan and Ozpeynirci
2009). Within ROR for multiple criteria classification one can also handle additional preference information about the intensity of preference, in a way similar to (Figueira et al.
2009).
Since different compatible instances of the value function may order a given alternative from set
\({\mathcal {A}}\) at different positions in the ranking, or classify this alternative to different classes, Kadziński et al. (
2012) proposed to identify the extreme ranks, or classes, for each alternative, as well as the interval of scores it can attain. This procedure, based on mixed-integer LP, is known in ROR as Extreme Ranking Analysis.
Although richer than the dominance relation, the necessary preference relation
\(\succsim ^N\) may still leave some pairs of alternatives incomparable, i.e.,
\(a \succsim ^P b\) and
\(b \succsim ^P a\). To get an estimate of the probability that a randomly selected compatible instance of the value function ranks alternative
a higher than
b, or vice versa, one can perform the Stochastic Multiobjective Acceptability Analysis (SMAA) (Kadziński and Tervonen
2013). It consists of using a
Hit & Run sampling of compatible instances of the value function within constraints
\(E^{A^R}_{BASE}\), and checking the frequency with which:
- \(a \succ b\), which gives the pairwise winning index \(p(a, b)\),
- \(a\) gets position \(h\) in the ranking, which gives the rank acceptability index \(r^h_a\).
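The two SMAA indices can be approximated by any sampler of the compatible region. The sketch below is illustrative only: it uses plain rejection sampling in place of Hit & Run, and the restrictive special case of linear marginal values (a weight vector per instance) with hypothetical data:

```python
import itertools, random
random.seed(0)

perf = {"a": (8.0, 3.0), "b": (5.0, 5.0), "c": (2.0, 9.0)}
pairwise = [("a", "b")]            # certain preference information: a > b
alts = sorted(perf)

def compatible(w):
    """E_BASE with a strictly positive margin for each stated preference."""
    return all(sum(wi * (perf[x][i] - perf[y][i]) for i, wi in enumerate(w)) > 0
               for x, y in pairwise)

samples = []                       # rejection sampling of compatible weights
while len(samples) < 2000:
    w1 = random.random()
    w = (w1, 1.0 - w1)             # uniform on the simplex w1 + w2 = 1
    if compatible(w):
        samples.append(w)

win = {p: 0 for p in itertools.permutations(alts, 2)}
rank = {(x, h): 0 for x in alts for h in range(1, len(alts) + 1)}
for w in samples:
    U = {x: sum(wi * perf[x][i] for i, wi in enumerate(w)) for x in alts}
    order = sorted(alts, key=U.get, reverse=True)
    for h, x in enumerate(order, start=1):
        rank[(x, h)] += 1
    for x, y in win:
        if U[x] > U[y]:
            win[(x, y)] += 1

m = len(samples)
p_ab = win[("a", "b")] / m         # pairwise winning index p(a, b)
r1_a = rank[("a", 1)] / m          # rank acceptability index r^1_a
```

Here \(p(a,b) = 1\), since a is preferred to b for every compatible instance, while a and c share the first rank with complementary acceptability indices.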
In SMAA, all compatible instances of the value function are equiprobable. In (Corrente et al.
2016a), it has been proposed to induce the probability distribution on compatible instances of the value function from additional
uncertain preference information of the type:
\(a \succ b\) is more credible than
\(b \succ a\). The procedure is composed of three steps:
- sampling a number (\(s_v\)) of instances of the value function \(U_t \in {\mathcal {U}}_{A^R},\ t=1,\ldots ,s_v,\) compatible with the certain part of preference information provided by the DM,
- inducing a probability distribution \(\left\{ w_t(U_t) \in [0, 1]: \sum _{t=1}^{s_v} w_t(U_t)=1 \right\} \) on the sample of instances generated in the previous step, taking into account the uncertain part of preference information via LP,
- sampling probability distributions compatible with the uncertain preference information, and taking a barycenter distribution as a representative one for the final ranking.
Yet another way of dealing with multiple compatible value functions in the UTA-like approach has been presented by Bous et al. (
2010) as ACUTA method. It is based on the computation of the analytic center of a polyhedron formed by compatible additive value functions for the selection of the representative value function.
2.9 ROR with outranking relation preference model
The value function model is a complete and transitive preference relation with a compensatory logic. In many real-life decision situations, when modeling users’ preferences it is reasonable to consider: (i) incomparability between alternatives, (ii) intransitive indifference and intransitive preference, and (iii) non-compensatory multiple criteria aggregation.
MCDA methods based on a preference model in the form of a valued binary relation called
outranking relation, such as ELECTRE and PROMETHEE, answer these needs (Greco et al.
2016a). Outranking relation
S groups three basic preference relations: indifference (
\(\sim \)), weak preference (
\(\succ ^w\)) and strong preference (
\(\succ ^s\)).
aSb reads: “alternative
a is at least as good as alternative
b”. As this relation is constructed for all ordered pairs of alternatives
\((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\), one can conclude the following relations between
a and
b:
- \(aSb\ \wedge \ bSa \ \ \Leftrightarrow \ \ a \sim b\ \) (indifference),
- \(aSb\ \wedge \ \lnot (bSa) \ \ \Leftrightarrow \ \ a \succ ^w b\ \vee \ a \succ ^s b\ \) (preference in the broad sense),
- \(\lnot (aSb)\ \wedge \ \lnot (bSa) \ \ \Leftrightarrow \ \ a \ ?\ b \ \ \) (incomparability).
S is, in general, neither a complete nor a transitive relation on set
\({\mathcal {A}}\), constructed via concordance and discordance tests (Figueira et al.
2013). Outranking methods assume that evaluation criteria from family
\({\mathcal {G}}\) are
pseudo-criteria equipped with comparison thresholds. A pseudo-criterion is a real-valued function
\(g_i \in {\mathcal {G}}\) associated with two threshold functions,
\(q_i(\cdot )\) and
\(p_i(\cdot )\), whose arguments are performances of the alternatives. By convention, the thresholds are functions of the worst performance of the two alternatives being compared. They are called indifference and preference thresholds, respectively. The thresholds satisfy the following condition: for all ordered pairs of alternatives
\((a, b) \in {\mathcal {A}} \times {\mathcal {A}}\), such that
\(g_i(a) \ge g_i(b)\),
\(g_i(a) + p_i(g_i(b))\) and
\(g_i(a) + q_i(g_i(b))\) are non-decreasing monotone functions of
\(g_i(b)\), such that
\(p_i(g_i(b)) \ge q_i(g_i(b)) \ge 0\) for all
\(a, b \in {\mathcal {A}}\). In this definition, the thresholds are functions of the performance, but they can also be defined as constant values
\(p_i \ge q_i \ge 0, i \in {\mathcal {J}}\). Their meaning is the following: indifference threshold
\(q_i\) is the maximum difference of performances of the compared alternatives allowing them to be considered equivalent with respect to criterion
\(g_i\); preference threshold
\(p_i\) is the minimum difference of performances of the compared alternatives switching the DM’s preference with respect to criterion
\(g_i\) from weak to strict.
The concordance test checks if the coalition of criteria concordant with the hypothesis
aSb is strong enough. This strength is measured by the concordance index:
$$\begin{aligned} C(a,b)= \frac{\sum _{i=1}^n w_i \times C_i(a,b)}{\sum _{i=1}^n w_i}, \end{aligned}$$
where
\(w_i, i \in {\mathcal {J}}\), are weights of criteria which do not have the meaning of substitution rates, like weights
\(k_i, i \in {\mathcal {J}}\), of the weighted sum, because the former weights are not multiplied by performances on the corresponding criteria, but by individual concordance coefficients, valued in [0, 1],
\(C_i(a,b)\) defined as:
$$\begin{aligned} C_i(a,b) = \left\{ \begin{array}{ll} 1 &{} \text {if}\, \; g_i(a) - g_i(b) \ge - q_i,\\ \frac{ g_i(a) - g_i(b) + p_i}{p_i \; - \; q_i} &{} \text {if}\, \; -p_i< g_i(a) - g_i(b) < - q_i,\\ 0 &{} \text {if}\, \; g_i(a) - g_i(b) \le - p_i. \end{array} \right. \end{aligned}$$
Thus, the aggregation is not compensatory and weights
\(w_i, i \in {\mathcal {J}}\), indicate a voting power of
\(g_i \in {\mathcal {G}}\). The concordance test is positive if
\(C(a,b) \ge \lambda \), where
\(\lambda \in [0.5, 1]\) is a cutting level (concordance threshold).
The discordance test checks if, among the criteria discordant with the hypothesis aSb, there is a strong opposition against aSb. Criterion \(g_i \in {\mathcal {G}}\) is strongly opposed to the assertion aSb if \(g_i(b) - g_i(a) \ge v_i\), where \(v_i \ge p_i\) is called the veto threshold of \(g_i\). In this situation, criterion \(g_i\) puts a veto on the assertion aSb.
In conclusion, outranking relation aSb is true if and only if \(C(a,b)\ge \lambda \) and there is no criterion strongly opposed (making veto) to the hypothesis. In this way, for each ordered pair \((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\) one obtains relation S being true (1) or false (0), presented as a directed outranking graph where the nodes correspond to alternatives and there is an arc from node a to node b if aSb. To choose the best alternative of \({\mathcal {A}}\), one can exploit the outranking relation by searching for a kernel of the outranking graph. When the graph has no cycles, its kernel is unique and includes alternatives that do not outrank each other, while each alternative from outside the kernel is outranked by at least one alternative from the kernel. Thus, the kernel is composed of the best incomparable alternatives.
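The whole construction of S (concordance test plus veto test) and its exploitation by kernel search can be sketched in a few lines of Python. The performances, weights, thresholds, and cutting level below are illustrative assumptions, and the kernel is found by brute force over subsets (practical only for small graphs):

```python
from itertools import combinations

# hypothetical data: two gain-type criteria with constant thresholds
perf = {"a": (8.0, 3.0), "b": (5.0, 5.0), "c": (2.0, 9.0)}
w, q, p, v = (0.6, 0.4), (1.0, 1.0), (4.0, 3.0), (6.0, 5.0)
lam = 0.7                             # cutting level (concordance threshold)

def C_i(d, qi, pi):
    """Individual concordance coefficient as a function of d = g_i(a) - g_i(b)."""
    return 1.0 if d >= -qi else 0.0 if d <= -pi else (d + pi) / (pi - qi)

def S(x, y):
    """Outranking test: concordance C(x,y) >= lambda AND no criterion vetoes."""
    gx, gy = perf[x], perf[y]
    C = sum(wi * C_i(gx[i] - gy[i], q[i], p[i]) for i, wi in enumerate(w))
    veto = any(gy[i] - gx[i] >= v[i] for i in range(len(w)))
    return C / sum(w) >= lam and not veto

alts = sorted(perf)

def kernel():
    """Smallest independent and dominating set of the outranking graph."""
    for r in range(1, len(alts) + 1):
        for K in combinations(alts, r):
            indep = not any(S(x, y) for x in K for y in K if x != y)
            domin = all(any(S(x, o) for x in K) for o in alts if o not in K)
            if indep and domin:
                return set(K)
```

For this data only aSb holds (criterion 2 vetoes aSc, and all other concordance tests fail), so the kernel consists of the mutually incomparable alternatives a and c.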
The many parameters to be fixed by the DM in the outranking method can be avoided in the ROR approach. Then, the DM provides preference information in terms of holistic pairwise comparisons of some reference alternatives:
\(a^*S b^*\) or
\(a^*S^c b^*\) for some
\(a^*,b^*\in A^R \subseteq {\mathcal {A}}\), where
\(S^c\) is a negation of the outranking relation. Moreover, according to Greco et al. (
2011), where ROR was adapted to ELECTRE methods, the DM can specify
\([q_{i*}, q_i^*]\) and
\([p_{i*}, p_i^*]\), being the ranges of indifference and preference thresholds, respectively, allowed by the DM.
Assuming
\(\sum _{i=1}^n w_i = 1\), one can substitute
\(C(a,b)= \sum _{i=1}^n w_i C_i(a,b) \ \) by
\( \ C(a,b)= \sum _{i=1}^n \Psi _i(a,b)\), where marginal concordance functions
\(\Psi _i(a,b), i \in {\mathcal {J}},\) are non-decreasing functions of
\(g_i(a)-g_i(b)\):
$$\begin{aligned} \Psi _i(a,b) = \left\{ \begin{array}{ll} \Psi _i(\beta _i, \alpha _i) &{} \text {if}\, \; g_i(a) - g_i(b) \ge - q_i,\\ \frac{ g_i(a) - g_i(b) + p_i}{p_i \; - \; q_i}\, \Psi _i(\beta _i, \alpha _i) &{} \text {if}\, \; -p_i< g_i(a) - g_i(b) < - q_i,\\ 0 &{} \text {if}\, \; g_i(a) - g_i(b) \le - p_i, \end{array} \right. \end{aligned}$$
and
\(\alpha _i,\beta _i\) are, respectively, the worst and the best-observed performances on criterion
\(g_i, i=1,\ldots , n\).
The outranking model compatible with preference information provided by the DM is a set of marginal concordance functions \(\Psi _i(a,b)\), cutting levels \(\lambda \), indifference \(q_i\), preference \(p_i\), and veto thresholds \(v_i, i=1,\ldots , n\), reproducing the DM’s preference information concerning pairs \((a^*,b^*) \in A^R \times A^R\).
In this case, the ordinal regression compatibility constraints
\(E^{A^R}\) are:
$$\begin{aligned} \left. \begin{array}{l} \;\hbox {If } \; a^*S b^*, \hbox { for } (a^*, b^*) \in A^R \times A^R \hbox {:}\\ \;\hbox {(i) }\; C(a^*,b^*) = \sum _{i=1}^n \Psi _i(a^*,b^*) \ge \lambda ,\\ \;\hbox {(ii) }\; g_i(b^*) - g_i(a^*) +\varepsilon \le v_i, \; i=1,\ldots ,n. \\ \;\hbox {If } \; a^*S^c b^*, \hbox { for } (a^*, b^*) \in A^R \times A^R \hbox {:}\\ \;\hbox {(iii) }\; C(a^*,b^*) = \sum _{i=1}^n \Psi _i(a^*,b^*) + \varepsilon \le \lambda +M_0(a^*,b^*), \\ \;\hbox {(iv) }\; g_i(b^*) - g_i(a^*) \ge v_i - \delta M_i(a^*, b^*), \; i=1,\ldots ,n, \\ \;\hbox {(v) }\; M_i(a^*, b^*) \in \{0, 1\}, \; i=0,1,\ldots ,n, \\ \;\hbox {(vi) }\; \sum _{i=0}^n M_i(a^*, b^*) \le n. \\ \;\hbox {(vii) }\; 0.5 \le \lambda \le 1, \\ \;\hbox {(viii) }\; v_i \ge p_i^*+ \varepsilon , \; i=1,\ldots ,n, \\ \;\hbox {(ix) }\; \varepsilon \ge 0. \\ \end{array} \right\} \; E^{A^R}_{BASE} \end{aligned}$$
where
\(\delta \) is a sufficiently large positive constant. Here, constraints (i) and (ii) ensure positive concordance and non-discordance tests, respectively. For
\(a^*S^c b^*\), either the concordance test or the non-discordance test should be negative, which corresponds to constraints (iii) or (iv)–(vi), respectively. Remark that for
\(M_i=0\), the corresponding constraint (iii) or (iv) becomes binding, which means that either concordance
\(C(a^*,b^*)\) is below cutting level
\(\lambda \) or difference
\(g_i(b^*) - g_i(a^*)\) is exceeding veto threshold
\(v_i\).
Given a pair of alternatives
\((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\), alternative
a possibly outranks alternative
b (written as
\(aS^Pb\)) if and only if
\(\varepsilon ^*> 0\), where (Greco et al.
2011):
$$\begin{aligned} \varepsilon ^*= \hbox {max } \varepsilon \end{aligned}$$
s.t.
$$\begin{aligned} \left. \begin{array}{l} \; C(a,b) = \sum _{i=1}^n \Psi _i(a,b) \ge \lambda , \\ \; g_i(b) - g_i(a) +\varepsilon \le v_i, \; i=1,\ldots ,n,\\ \; E^{A^R}_{BASE}. \end{array} \right\} E^P(a,b). \end{aligned}$$
This means that if
\(\varepsilon ^*>0\) and constraints
\(E^P(a,b)\) are feasible (for assertion
aSb, concordance test does not fail and no criterion makes veto), then
a outranks
b for at least one compatible instance of the outranking model.
Moreover, alternative
a necessarily outranks alternative
b (written as
\(aS^Nb\)) if and only if
\(\varepsilon ^*\le 0\), where:
$$\begin{aligned} \varepsilon ^*= \hbox {max } \varepsilon \end{aligned}$$
s.t.
$$\begin{aligned} \left. \begin{array}{l} \; C(a,b) = \sum _{i=1}^n \Psi _i(a,b) + \varepsilon \le \lambda + M_0(a,b), \\ \; g_i(b) - g_i(a) \ge v_i - \delta M_i(a,b), \; i=1,\ldots ,n,\\ \; M_i(a,b) \in \{0, 1\}, \; i=0,\ldots ,n, \; \; \sum _{i=0}^n M_i(a,b) \le n,\\ \; E^{A^R}_{BASE}. \end{array} \right\} E^N(a,b). \end{aligned}$$
This means, in turn, that if
\(\varepsilon ^*\le 0\) or constraints
\(E^N(a,b)\) are infeasible (for assertion
\(aS^cb\) either concordance test fails or one of the criteria makes veto), then
a outranks
b for all compatible instances of the outranking model. In other words,
\(aS^Nb\) is true because
\(aS^cb\) is not true for all compatible instances of the outranking model.
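The two quantifiers behind \(S^P\) and \(S^N\) ("for at least one" vs. "for all compatible instances") can be illustrated by a brute-force check over a finite sample of model instances — a didactic sketch only, since the method above decides them exactly via the programs \(E^P(a,b)\) and \(E^N(a,b)\); the toy threshold instances below are hypothetical:

```python
# Possible vs. necessary outranking, sketched by enumeration over a finite
# sample of compatible outranking-model instances. Each 'instance' is a
# hypothetical predicate instance(a, b) -> True iff a outranks b under it.

def possibly_outranks(a, b, instances):
    # a S^P b: a outranks b for at least one compatible instance
    return any(inst(a, b) for inst in instances)

def necessarily_outranks(a, b, instances):
    # a S^N b: a outranks b for all compatible instances
    return all(inst(a, b) for inst in instances)

# Toy instances on numeric "alternatives": outranking as thresholded comparison
instances = [lambda a, b: a >= b, lambda a, b: a >= b - 1]
```

In the actual method the sample of instances is never materialized; the LPs quantify over the full (infinite) set of compatible models.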
For any pair of alternatives
\((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\):
$$\begin{aligned} aS^Nb \Leftrightarrow \lnot (aS^{cP}b),\; \hbox { as well as, }\; aS^{cP}b \Leftrightarrow \lnot (aS^{N}b), \\ aS^Pb \Leftrightarrow \lnot (aS^{cN}b),\; \hbox { as well as, }\; aS^{cN}b \Leftrightarrow \lnot (aS^{P}b), \end{aligned}$$
thus, there are two sources of information
\((S^N, S^P)\) about four relations
\((S^N, S^{cN}, S^P, S^{cP})\) in set
\({\mathcal {A}}\). They have the following properties:
$$\begin{aligned} aS^Nb\Rightarrow & {} aS^Pb \\ aS^Nb\Rightarrow & {} \lnot (aS^{cN}b) \; \hbox { as well as }\; aS^{cN}b \Rightarrow \lnot (aS^{N}b). \end{aligned}$$
Moreover, for given preference information
\(a^*S b^*\):
$$\begin{aligned} a^*S b^*\Rightarrow a^*S^N b^*\; \hbox { and }\; a^*S b^*\Rightarrow \lnot (b^*S^P a^*). \end{aligned}$$
To work out a recommendation in the choice problem, it is advised to find a kernel of the necessary outranking graph
\(S^N\). In case of the ranking problem, one can exploit the necessary outranking graph including
\(S^N\) and
\(S^{cN}\) relations, using the Net Flow Score procedure. For each alternative
\(a \in {\mathcal {A}}\), the score
\(NFS(a) = strength(a) - weakness(a)\), where
strength(
a) counts all relations (arcs) with other alternatives
z, such that
\(aS^Nz\) or
\(zS^{cN}a\), while
weakness(
a) counts all relations (arcs) with other alternatives
v, such that
\(aS^{cN}v\) or
\(vS^{N}a\). The resulting ranking is a complete preorder determined by
\(NFS(a)\), \(a \in {\mathcal {A}}\).
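The Net Flow Score exploitation described above can be sketched as follows — a minimal illustration assuming the relations \(S^N\) and \(S^{cN}\) are already given as sets of ordered pairs; the toy relations are hypothetical:

```python
# Net Flow Score exploitation of the necessary outranking graph: for each
# alternative a, strength(a) counts pairs with a S^N z or z S^cN a, and
# weakness(a) counts pairs with a S^cN v or v S^N a.

def net_flow_scores(alternatives, SN, ScN):
    """SN and ScN are sets of ordered pairs (a, z) meaning a S^N z / a S^cN z."""
    scores = {}
    for a in alternatives:
        others = [z for z in alternatives if z != a]
        strength = sum(1 for z in others if (a, z) in SN or (z, a) in ScN)
        weakness = sum(1 for z in others if (a, z) in ScN or (z, a) in SN)
        scores[a] = strength - weakness
    return scores

# Toy example (hypothetical relations):
alts = ["a", "b", "c"]
SN = {("a", "b"), ("a", "c")}
ScN = {("c", "b")}
scores = net_flow_scores(alts, SN, ScN)
ranking = sorted(alts, key=scores.get, reverse=True)  # complete preorder
```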
2.11 ROR with rule preference model
As mentioned before, a preference model built of “\(if \ldots , then \ldots \)” decision rules induced from examples of decisions has a double advantage in MCDA: it combines the relatively easy elicitation of indirect preference information with the transparency and explainability of decision rules.
The decision examples come either from observations of DM’s past decisions made in the same decision context or from conscious elicitation by the DM on demand of the analyst. In the latter case, they concern some reference alternatives, as in other ROR approaches. In both cases, decision examples may, however, be
inconsistent with the dominance principle commonly accepted for multiple criteria decision problems. Decisions are inconsistent with the dominance principle if:
-
In case of classification: alternative a has been assigned to a worse decision class than alternative b, although a is at least as good as b on all the considered criteria, i.e., a dominates b.
-
In case of choice and ranking: a pair of alternatives (a, b) has been assigned a degree of intensity of preference smaller than pair (c, d), although differences in evaluations between a and b on all the considered criteria are at least as large as the respective differences of evaluations between c and d, i.e., pair (a, b) dominates pair (c, d).
The inconsistency may come from many sources.
Handling these inconsistencies is important for preference learning. They cannot be simply considered as noise or error to be eliminated from data, or amalgamated with consistent data by some averaging operators. They should be identified and presented as uncertain patterns.
The concept of
rough set introduced by Pawlak (
1982) appeared to be useful for handling inconsistency in data; originally, however, it was limited to inconsistency with respect to the indiscernibility principle. Objects (alternatives) that have the same description are said to be
indiscernible. The indiscernibility relation thus generated in the universe of discourse
U constitutes the mathematical basis of rough set theory. It induces a partition of the universe into blocks of indiscernible objects, called
granules, which can be used to build knowledge about a real or abstract world. Any subset
X of the universe may be expressed in terms of these blocks either precisely (as a union of granules) or approximately. In the latter case, the subset
X may be characterized by two ordinary sets, called
lower approximation and
upper approximation of set
X. A rough set is defined by means of these two approximations, which coincide in the case of an ordinary set. The lower approximation of
X is composed of all the granules included in
X (whose elements, therefore, certainly belong to
X), while the upper approximation of
X consists of all the granules which have a non-empty intersection with
X (whose elements, therefore, may belong to
X). The difference between the upper and lower approximations constitutes the boundary region of the rough set, whose elements cannot be characterized with certainty as belonging or not to
X by using the available information. The information about objects from the boundary region is, therefore, inconsistent or ambiguous. The cardinality of the boundary region states, moreover, the extent to which it is possible to express
X in exact terms, on the basis of the available information. For this reason, this cardinality is used as a measure of the vagueness of information about
X. The lower and upper approximations of a partition of
U into decision classes prepare the ground for inducing, respectively,
certain and
possible knowledge patterns in the form of “
\(if \ldots , then \ldots \)” decision rules.
To handle inconsistencies with respect to the dominance principle, which is typical for preference data, Greco et al. (
2001) generalized the original rough set concept substituting the indiscernibility relation with a dominance relation in the rough approximation of decision classes. The resulting methodology was called
Dominance-based Rough Set Approach (DRSA). An important consequence of DRSA is the possibility of inferring the DM’s preference model in terms of decision rules induced either from lower approximations of decision classes (certain rules), or from upper approximations of decision classes (possible rules), or from the difference between upper and lower approximations (approximate rules supported by inconsistent examples).
In DRSA, the granules used for building approximations of decision classes are dominance cones in the evaluation space. DRSA accepts all scales of criteria because it exploits only the order of evaluations, which is a primitive feature of ordinal, interval, and ratio scales.
Let us first explain the application of DRSA to multiple criteria classification problems.
The finite universe of discourse
U is composed of classification examples concerning reference alternatives from set
\(A^R \subseteq {\mathcal {A}}\), such that, without loss of generality,
\(g_i:U \rightarrow {\mathbb {R}}\) for each
\(i=1,\ldots , n\), and, for all alternatives
\(a^*,b^*\in U\),
\(g_i(a^*) \ge g_i(b^*)\) means that “
\(a^*\) is at least as good as
\(b^*\) with respect to criterion
\(g_i\)”, which is denoted as
\(a^*\succeq _{i} b^*\). Therefore,
\(\succeq _{i}\) is a complete preorder, i.e., a strongly complete and transitive binary relation, defined on
U on the basis of evaluations
\(g_i(\cdot )\). Furthermore, there is a decision attribute
d which makes a partition of
U into a finite number of decision classes, called
classification,
Cl=
\({\{}Cl_{1}, \ldots ,Cl_{p}\}\), such that each
\(a^*\in U\) belongs to one and only one class
Cl\(_{t}\),
\(t=1, \ldots , p\). We suppose that the classes are preference ordered, i.e., for all
\(r,s=1, \ldots , p\), such that
\(r>s\), the alternatives from
Cl\(_{r}\) are preferred to the alternatives from
Cl\(_{s}\). More formally, if
\(\succeq \) is a
comprehensive weak preference relation on
U, i.e., if for all
\(a^*,b^*\in U\),
\(a^*{\succeq }b^*\) reads “
\(a^*\) is at least as good as
\(b^*\)”, then we suppose
$$\begin{aligned}{}[a^*{\in }Cl_{r}, \ b^*{\in } Cl_{s}, \ r{>}s] \Rightarrow a^*{\succ }b^*, \end{aligned}$$
where
\(a^*{\succ }b^*\) means
\(a^*{\succeq }b^*\hbox { and}\ {not}\ b^*{\succeq } a^*\). The above assumptions are typical for consideration of a multiple criteria classification problem (also called
ordinal classification with monotonicity constraints).
In DRSA, the explanation of the assignment of alternatives to preference-ordered decision classes is made on the basis of their evaluation with respect to a subset of criteria
\(P \subseteq {\mathcal {J}}\). This explanation is called
approximation of decision classes with respect to
P. Of course, most commonly,
\(P={\mathcal {J}}\). In order to take into account the order of decision classes, in DRSA the classes are not considered one by one but, instead, unions of classes are approximated:
upward union from class
\(Cl_t\) to class
\(Cl_p\), denoted by
\(Cl^\ge _t\), and
downward union from class
\(Cl_t\) to class
\(Cl_1\), denoted by
\(Cl^\le _t\), i.e.:
$$\begin{aligned} \mathop {Cl}\nolimits _t^ \ge = \bigcup \limits _{s \ge t} {\mathop {Cl}\nolimits _s }, \quad \mathop {Cl}\nolimits _t^ \le = \bigcup \limits _{s \le t} {\mathop {Cl}\nolimits _s },\quad t=1,\ldots ,p. \end{aligned}$$
The statement
\(a^*\in Cl_t^\ge \) reads “
\(a^*\) belongs to at least class
Cl\(_{t}\)”, while
\(a^*\in Cl_t^ \le \) reads “
\(a^*\) belongs to at most class
Cl\(_{t}\)”. Let us remark that
\(\mathop {Cl}\nolimits _1^ \ge =\mathop {Cl}\nolimits _p^ \le =U\),
\(\mathop {Cl}\nolimits _p^ \ge \)=
Cl\(_{p}\) and
\(\mathop {Cl}\nolimits _1^ \le \)=
Cl\(_{1}\). Furthermore, for
t=2,...,
p, we have:
$$\begin{aligned} \mathop {Cl}\nolimits _{t - 1}^ \le = U \setminus \mathop {Cl}\nolimits _t^ \ge \ \hbox { and } \ \mathop {Cl}\nolimits _t^ \ge =U \setminus \mathop {Cl}\nolimits _{t - 1}^ \le . \end{aligned}$$
The key idea of the rough set approach is an explanation (approximation) of knowledge generated by the decision attribute, using
granules of knowledge generated by condition attributes. In DRSA, where condition attributes are criteria and decision classes are preference ordered, the knowledge to be explained is the assignment of alternatives to
upward and
downward unions of classes, and the granules of knowledge are sets of alternatives contained in
dominance cones defined in the space of evaluation criteria.
Alternative \(a^*\) dominates alternative \(b^*\) with respect to \(P \subseteq {\mathcal {J}}\) (shortly, \(a^*\) P-dominates \(b^*\)), denoted by \(a^*D_{P} b^*\), if for every criterion \( i \in P\), \(g_i(a^*) \ge g_i(b^*)\). The relation of P-dominance is reflexive and transitive, i.e., it is a partial preorder.
Given a set of criteria
\( P \subseteq {\mathcal {J}}\) and
\(a^*\in U\), the granules of knowledge used for approximation in DRSA are:
-
a set of alternatives dominating \(a^*\), called P-dominating set, \(D_P^ + \left( a^*\right) \)={\(b^*\in U\): \(b^*D_{P} a^*\)},
-
a set of alternatives dominated by \(a^*\), called P-dominated set, \(D_P^ - \left( a^*\right) \)={\(b^*\in U\): \(a^*D_{P} b^*\)}.
In terms of multiple criteria evaluations, the above definitions correspond to:
$$\begin{aligned} D_P^ + \left( a^*\right) =\{b^*\in U: g_i(b^*) \ge g_i(a^*), \hbox {for all } i \in P \}, \\ D_P^ - \left( a^*\right) =\{b^*\in U: g_i(b^*) \le g_i(a^*), \hbox {for all } i \in P \}. \end{aligned}$$
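The two dominance cones can be computed directly from the definitions — a minimal sketch assuming gain-type criteria and a dict-based data layout (all names illustrative):

```python
# P-dominating and P-dominated sets: the granules of knowledge used for
# approximation in DRSA.

def dominates(a, b, P):
    """True iff a P-dominates b, i.e., g_i(a) >= g_i(b) for every i in P."""
    return all(a[i] >= b[i] for i in P)

def D_plus(a, U, P):
    """P-dominating set of a: all b in U with b D_P a."""
    return [b for b in U if dominates(b, a, P)]

def D_minus(a, U, P):
    """P-dominated set of a: all b in U with a D_P b."""
    return [b for b in U if dominates(a, b, P)]

U = [
    {"name": "a", "g1": 8, "g2": 6},
    {"name": "b", "g1": 7, "g2": 7},
    {"name": "c", "g1": 6, "g2": 5},
]
P = ["g1", "g2"]
```

Note that P-dominance is reflexive, so every alternative belongs to both of its own cones.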
Given
\(P \subseteq {\mathcal {J}}\), the inclusion of an alternative
\(a^*\in U\) to the upward union of classes
\(\mathop {Cl}_t^ \ge \),
\(t=2,\ldots ,p\), is
inconsistent with the dominance principle if one of the following conditions holds:
-
\(a^*\) belongs to class \(Cl_{t}\) or better but it is P-dominated by an alternative \(b^*\) belonging to a class worse than \(Cl_{t}\), i.e., \(a^*\in \mathop {Cl}\nolimits _t^ \ge \) but \(\mathop D_P^+ (a^*) \cap Cl_{t - 1}^ \le \ne \emptyset \),
-
\(a^*\) belongs to a worse class than \(Cl_{t}\) but it P-dominates an alternative \(b^*\) belonging to class \(Cl_{t}\) or better, i.e., \(a^*\notin \mathop {Cl}_t^ \ge \) but \(\mathop D_P^- (a^*) \cap \mathop {Cl}_t^ \ge \ne \emptyset \).
The above considerations bring us closer to the definition of rough approximations of the upward and downward unions of decision classes.
The
P-
lower approximation of
\(\mathop {Cl}_t^ \ge \), denoted by
\({\underline{P}}(\mathop {Cl}_t^ \ge ) \), and the
P-
upper approximation of
\(\mathop {Cl}_t^ \ge \), denoted by
\({\overline{P}}\hbox {(}\mathop {Cl}_t^ \ge \hbox {)}\), are defined as follows (
\(t=2,\ldots ,p\)):
$$\begin{aligned} {\underline{P}}\hbox {(}\mathop {Cl}\nolimits _t^ \ge \hbox {)}= & {} {\{}a^*\in U:\mathop D\nolimits _P^ + (a^*) \subseteq \mathop {Cl}\nolimits _t^ \ge {\}}, \\ {\overline{P}}\hbox {(}\mathop {Cl}\nolimits _t^ \ge \hbox {)}= & {} {\{}a^*\in U:\mathop D\nolimits _P^ - (a^*) \cap \mathop {Cl}\nolimits _t^ \ge \ne \emptyset {\}}. \end{aligned}$$
The definition of
P-
lower approximation and the
P-
upper approximation of
\(\mathop {Cl}_t^ \le \), is analogous (
\(t=1,\ldots ,p-1\)):
$$\begin{aligned} {\underline{P}}\hbox {(}\mathop {Cl}\nolimits _t^ \le \hbox {)}= & {} {\{}a^*\in U:\mathop D\nolimits _P^ - (a^*) \subseteq \mathop {Cl}\nolimits _t^ \le {\}}, \\ {\overline{P}}\hbox {(}\mathop {Cl}\nolimits _t^ \le \hbox {)}= & {} {\{}a^*\in U:\mathop D\nolimits _P^ + (a^*) \cap \mathop {Cl}\nolimits _t^ \le \ne \emptyset {\}}. \end{aligned}$$
The
P-
boundaries of
\(\mathop {Cl}_t^ \ge \) and
\(\mathop {Cl}_t^ \le \), denoted by
\(Bn_{P}(\mathop {Cl}_t^ \ge )\) (
\(t=2,\ldots ,p\)) and
\(Bn_{P}(\mathop {Cl}_t^ \le )\) (
\(t=1,\ldots ,p-1\)), respectively, are defined as follows:
\(Bn_{P}(\mathop {Cl}\nolimits _t^ \ge )={\overline{P}}\hbox {(}\mathop {Cl}\nolimits _t^ \ge \hbox {)} \setminus {\underline{P}}\hbox {(}\mathop {Cl}\nolimits _t^ \ge \hbox {)}\), \(Bn_{P}(\mathop {Cl}\nolimits _t^ \le )={\overline{P}}\hbox {(}\mathop {Cl}\nolimits _t^ \le \hbox {)} \setminus {\underline{P}}\hbox {(}\mathop {Cl}\nolimits _t^ \le \hbox {)}\).
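The approximations and boundaries above translate almost literally into code — a sketch for the upward unions, assuming each alternative carries its class index in a field `cls` (all names illustrative):

```python
# P-lower/P-upper approximations and the P-boundary of an upward union
# Cl_t^>=, computed straight from the definitions.

def dominates(a, b, P):
    return all(a[i] >= b[i] for i in P)

def approx_upward(U, P, t):
    """(P-lower, P-upper, P-boundary) of the union 'class at least t'."""
    # a is in the lower approximation iff D_P^+(a) is included in Cl_t^>=
    lower = [a for a in U
             if all(b["cls"] >= t for b in U if dominates(b, a, P))]
    # a is in the upper approximation iff D_P^-(a) intersects Cl_t^>=
    upper = [a for a in U
             if any(b["cls"] >= t for b in U if dominates(a, b, P))]
    boundary = [a for a in upper if a not in lower]
    return lower, upper, boundary

# One inconsistent example: b dominates a yet sits in a worse class
U = [
    {"name": "a", "g1": 8, "cls": 2},
    {"name": "b", "g1": 9, "cls": 1},
    {"name": "c", "g1": 5, "cls": 1},
]
lower, upper, boundary = approx_upward(U, ["g1"], 2)
```

On this toy data the inconsistency between a and b empties the lower approximation of \(Cl_2^\ge \) and puts both alternatives in the boundary.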
For every
\(P \subseteq {\mathcal {J}}\), the alternatives that are consistent in the sense of the dominance principle with all upward and downward unions of classes are
P-correctly classified. For every
\(P \subseteq {\mathcal {J}}\), the
quality of classification Cl by set of criteria
P is defined as the ratio of the number of alternatives
P-consistent with the dominance principle and the number of all alternatives in
U. Since the
P-consistent alternatives are those which do not belong to any
P-boundary
\(Bn_{P}(\mathop {Cl}_t^ \ge )\) or
\(Bn_{P}(\mathop {Cl}_t^ \le )\), and
\(Bn_{P}(\mathop {Cl}_t^ \ge ) = Bn_{P}(\mathop {Cl}_{t-1}^ \le )\) for
\(t = 2, \ldots , p\), the quality of classification
Cl by a set of criteria
\(P \subseteq {\mathcal {J}}\) can be written as
$$\begin{aligned} \gamma _{P}({{{\textbf {Cl}}}}) = \frac{\left| U \setminus \left( \bigcup \limits _{t \in \{2, \ldots ,p\} }Bn_{P}(Cl_{t}^{\ge })\right) \right| }{|U|} =\frac{\left| U \setminus \left( \bigcup \limits _{t \in \{1, \ldots ,p-1\} }Bn_{P}(Cl_{t}^{\le })\right) \right| }{|U|}. \end{aligned}$$
\(\gamma _{P} ( {{{\textbf {Cl}}}} )\) can be seen as a degree of consistency of the classification examples, where
P is the set of criteria and
Cl is the considered classification.
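Since the P-consistent alternatives are exactly those outside every P-boundary, \(\gamma _{P}({{{\textbf {Cl}}}})\) can be sketched by checking each alternative's consistency with the dominance principle directly (a minimal illustration; the field names are assumptions):

```python
# Quality of classification gamma_P(Cl): share of alternatives consistent
# with the dominance principle w.r.t. criteria P; 'cls' is the class index.

def dominates(a, b, P):
    return all(a[i] >= b[i] for i in P)

def gamma(U, P):
    def consistent(a):
        # every alternative dominating a is in a's class or better, and
        # every alternative a dominates is in a's class or worse
        return (all(b["cls"] >= a["cls"] for b in U if dominates(b, a, P))
                and all(a["cls"] >= b["cls"] for b in U if dominates(a, b, P)))
    return sum(consistent(a) for a in U) / len(U)

U = [
    {"g1": 8, "cls": 2},
    {"g1": 9, "cls": 1},   # dominates the first alternative, worse class
    {"g1": 5, "cls": 1},
]
```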
Each minimal (in the sense of inclusion) subset
\(P \subseteq {\mathcal {J}}\), such that
\(\gamma _{P} ({{{\textbf {Cl}}}} ) = \gamma _{{\mathcal {J}}} ({{{\textbf {Cl}}}} )\), is called a
reduct of classification
Cl, and is denoted by
\(RED_{{{\textbf {Cl}}}} \). Let us remark that for a given set of classification examples, one can have more than one reduct. The intersection of all reducts is called the
core and is denoted by
\(CORE_{{{\textbf {Cl}}}} \). Criteria in
\(CORE_{{{\textbf {Cl}}}} \) cannot be removed from consideration without deteriorating the quality of classification
Cl. This means that, in set
\({\mathcal {J}}\), there are three categories of criteria:
-
indispensable criteria included in the core,
-
exchangeable criteria included in some reducts, but not in the core,
-
redundant criteria, neither indispensable nor exchangeable, and thus not included in any reduct.
The dominance-based rough approximations of upward and downward unions of decision classes can serve to induce a model of classification decisions (preference model) in terms of “
\(if \ldots , then \ldots \)” decision rules. Given the preference information in terms of classification examples, it is meaningful to consider the following five types of decision rules intended for classifying an alternative
\(x \in {\mathcal {A}}\):
1)
certain D\(_{\ge } \)-decision rules, providing lower profiles of alternatives belonging to \({\underline{P}}({\mathop {Cl}_{t}^{ \ge }} )\): if \(g_{i_1}(x)\ge r_{i_1}\) and \( \ldots \) and \(g_{i_h}(x)\ge r_{i_h}\), then \(x \in Cl_{t}^{\ge }, t=2, \ldots , p,\ \) \(\{i_1, \ldots , i_h\} \subseteq P,\) \(r_{i_1}, \ldots , r_{i_h} \in {\mathbb {R}}\);
2)
possible D\(_{\ge } \)-decision rules, providing lower profiles of alternatives belonging to \({\overline{P}} ({\mathop {Cl}_{t}^{ \ge }} )\): if \(g_{i_1}(x)\ge r_{i_1}\) and \( \ldots \) and \(g_{i_h}(x)\ge r_{i_h}\), then x possibly belongs to \(Cl_{t}^{\ge }, t=2, \ldots , p,\,\{i_1, \ldots , i_h\} \subseteq P,\) \(r_{i_1}, \ldots , r_{i_h} \in {\mathbb {R}}\);
3)
certain D\(_{\le } \)-decision rules, providing upper profiles of alternatives belonging to \({\underline{P}} ({\mathop {Cl}_{t}^{ \le }} )\): if \(g_{i_1}(x)\le r_{i_1}\) and \( \ldots \) and \(g_{i_h}(x)\le r_{i_h}\), then \(x \in Cl_{t}^{\le }, t=1, \ldots , p-1,\, \(\{i_1, \ldots , i_h\} \subseteq P,\) \(r_{i_1}, \ldots , r_{i_h} \in {\mathbb {R}}\);
4)
possible D\(_{\le } \)-decision rules, providing upper profiles of alternatives belonging to \({\overline{P}} ({\mathop {Cl}_{t}^{ \le }} )\): if \(g_{i_1}(x)\le r_{i_1}\) and \( \ldots \) and \(g_{i_h}(x)\le r_{i_h}\), then x possibly belongs to \(Cl_{t}^{\le }, t=1, \ldots , p-1,\, \(\{i_1, \ldots , i_h\} \subseteq P,\) \(r_{i_1}, \ldots , r_{i_h} \in {\mathbb {R}}\);
5)
approximate D\(_{\ge \le } \)-decision rules, providing simultaneously lower and upper profiles of alternatives belonging to \(Cl_{s} \cup Cl_{s+1} \cup \ldots \cup Cl_{t}\), without the possibility of discerning to which class: if \(g_{i_1}(x)\ge r_{i_1}\) and \( \ldots \) and \(g_{i_k}(x)\ge r_{i_k}\) and \(g_{i_{k+1}}(x)\le r_{i_{k+1}}\) and \( \ldots \) and \(g_{i_h}(x)\le r_{i_h}\), then \(x \in Cl_{s} \cup Cl_{s+1} \cup \ldots \cup Cl_{t}, \{i_1, \ldots , i_h\}\subseteq P, \ \) \(s,t \in \{1, \ldots , p\}, \ \) \(s < t, \ \) \(r_{i_1}, \ldots , r_{i_h} \in {\mathbb {R}}\).
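As an illustration of how such rules can be encoded and matched, here is a minimal sketch for rules of type 1); the dict-based representation and all names are assumptions, and rule induction itself is not shown:

```python
# Encoding and matching a certain D>=-decision rule: a conjunction of
# conditions g_i(x) >= r_i concluding that x belongs to at least class t.

def matches_ge(rule, x):
    """Does x satisfy every condition g_i(x) >= r_i of the rule?"""
    return all(x[i] >= r for i, r in rule["conds"].items())

# "if g1(x) >= 7 and g2(x) >= 6, then x belongs to Cl_2^>="
rule = {"conds": {"g1": 7, "g2": 6}, "class_at_least": 2}
```

A matched rule lifts the lower bound on the class of x; an unmatched rule says nothing about x.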
The rules of type 1) and 3) represent certain preference patterns extracted from decision examples, while the rules of types 2) and 4) represent possible preference patterns. Rules of type 5) represent doubtful preference patterns. Depending on the classification examples supporting the induced rules, they are characterized by different values of adopted interestingness measures. In (Greco et al.
2016c), some rule interestingness measures with desirable properties are characterized in four perspectives of Bayesian and likelihoodist confirmation. The interestingness measures are used to classify new alternatives by decision rules in situations where these alternatives are matched by (i) no rule, (ii) exactly one rule, (iii) several rules (even contradictory). Such a classification scheme has been proposed by Błaszczyński et al. (
2007).
The above definitions of rough approximations are based on a strict application of the dominance principle. In practice, however, it is beneficial to accept a limited proportion of inconsistent examples in lower approximations, particularly for large data sets. Two parametric versions of DRSA have been proposed, relaxing in different ways the definition of rough approximations: Variable Precision DRSA (Inuiguchi et al.
2009) and Variable Consistency DRSA (Błaszczyński et al.
2009). Statistical interpretation of these two parametric DRSA from the viewpoint of empirical risk minimization typical for machine learning has been given by Kusunoki et al. (
2021). It is also worth mentioning the stochastic DRSA model presented in (Kotłowski and Słowiński
2008; Kotłowski et al.
2008).
Algorithms for induction of decision rules from rough approximations of upward and downward unions of decision classes follow various strategies. The most popular induction strategy, called minimal-cover, offers a minimal set of rules representing the decision examples in the most concise way (i.e., without any redundancy) (Błaszczyński et al.
2012). It is usually performed by greedy heuristics of sequential covering type, giving an approximately minimal-cover set of rules. Other strategies aim at discovering a set of decision rules satisfying some pre-defined user’s requirements, e.g., with respect to minimal support, maximal length, or maximal number of rules. Yet other strategies propose to induce an exhaustive set of rules, i.e., all rules compatible with the provided preference information, which form the most comprehensive knowledge base with respect to the analyzed data set. The recent trend consists of integrating several base classifiers into ensembles or committees of classifiers (Kotłowski and Słowiński
2009). Several methods have been proposed to get diverse base classifiers to be integrated within an ensemble of classifiers. The best known are bagging (Błaszczyński et al.
2010) and boosting (Dembczyński et al.
2010a), which modify the set of objects by sampling or weighting particular examples and use the same learning algorithm to create base classifiers.
ROR has been applied to the rule preference model by Kadziński et al. (
2015). It consists of taking into account all minimal-cover (MC) sets of decision rules compatible with the provided decision examples when computing recommendations for non-reference alternatives. An MC set of rules is a single base instance of the preference model. In order to generate all MC sets of rules, first an exhaustive set of rules is induced and then all compatible MC sets of rules are obtained by solving a series of integer LP problems which guarantee that each reference alternative is covered by at least one rule from each MC set. Such an ensemble of MC classifiers is applied to the set of all considered alternatives, producing two types of classification results. Alternative
a is necessarily assigned to class
\(Cl_h\) if and only if
a is assigned to
\(Cl_h\) for all MC sets of rules, and
a is possibly assigned to
\(Cl_h\), if and only if
a is assigned to
\(Cl_h\) for at least one MC set of rules. Since all compatible MC sets of rules are considered, it is also possible to compute class acceptability indices defined as the share of MC sets of rules assigning an alternative to a single class or a set of contiguous classes. The robustness analysis is continued in a twofold way. First, building on the class acceptability indices, one selects a representative MC set of rules that highlights the most stable part of the robust classification. Then, for each alternative, one can identify rules that are decisive in terms of recommendation in the greatest number of compatible MC sets. These rules directly influence the assignment suggested by different instances, being of particular interest to the DM willing to understand the obtained recommendation.
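Once the vote of each compatible MC set is available, the necessary/possible assignments and the class acceptability indices follow by simple counting — a simplified sketch (generating the compatible MC sets via the integer LPs is not shown, and all names are assumptions):

```python
# Robust classification from an ensemble of compatible MC rule sets:
# votes[a] lists the class assigned to alternative a by each MC set.

def robust_assignment(votes):
    necessary, possible, cai = {}, {}, {}
    for a, classes in votes.items():
        possible[a] = sorted(set(classes))          # at least one MC set
        necessary[a] = classes[0] if len(set(classes)) == 1 else None  # all
        # class acceptability index: share of MC sets voting for each class
        cai[a] = {c: classes.count(c) / len(classes) for c in set(classes)}
    return necessary, possible, cai

votes = {"a": [2, 2, 2], "b": [1, 2, 2, 2]}
necessary, possible, cai = robust_assignment(votes)
```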
Let us now move to the application of
DRSA to multiple criteria choice and ranking problems. While the classification of alternatives is based on their absolute evaluation on the considered criteria, choice and ranking refer to relative evaluation by means of pairwise comparisons of alternatives. Indeed, the position of an alternative in the ranking depends on the quality of other competing alternatives. This required an adaptation of DRSA to pairwise comparisons of alternatives instead of absolute multiple criteria evaluations considered in classification problems. Thus, in the case of choice and ranking problems, the rough approximation concerns a comprehensive preference relation in the set of reference alternatives, and the approximation is done using marginal preference relations on particular criteria for pairs of reference alternatives (Greco et al.
2001).
Technically, the modifications of DRSA necessary to approach the problems of choice and ranking are twofold:
1)
A pairwise comparison table (PCT) is considered instead of the simple classification table: PCT is a classification table whose rows represent pairs of reference alternatives for which multiple criteria evaluations and a comprehensive preference relation are known.
2)
A dominance principle is considered for comparisons of pairs of alternatives instead of simple alternatives: if alternative a is preferred to b at least as strongly as c is preferred to d on all the considered criteria, then a must be comprehensively preferred to b at least as strongly as c is comprehensively preferred to d.
The application of DRSA to the choice or ranking problems proceeds as follows. First, the DM gives examples of pairwise comparisons of reference alternatives from set
\(A^R \subseteq {\mathcal {A}}\). This can be a partial or complete ranking from the best to the worst of these alternatives. From this set of decision examples, a PCT is constructed with an ordered pair of reference alternatives in each row, pairs of evaluations of these alternatives on particular criteria in the columns corresponding to criteria, and comprehensive preference relation in the last column corresponding to the decision. The pairs of evaluations on particular criteria for a pair of reference alternatives
\((a^*, b^*) \in A^R \times A^R\) can be converted to graded differences of evaluations (intensities of preference) in case of criteria with interval or ratio scale. Remark that such a PCT with comprehensive preference relations in the decision is analogous to a classification table where instead of single alternatives there are pairs of alternatives and instead of decision classes there are comprehensive preference relations, e.g., outranking relation
\(a^*S b^*\) and non-outranking relation
\(a^*S^c b^*\) in the simplest case. Thus, all the DRSA methodology developed for the multiple criteria classification can be applied to PCT and yields lower and upper approximations of the comprehensive preference relations, e.g.,
S and
\(S^c\), composed of pairs of reference objects. These approximations prepare the ground for induction of certain, possible, and approximate “
if ...,
then...” decision rules. VC-DRSA methodology applies perfectly to this scheme (Szela̧g et al.
2014a). For example, in the case of comprehensive preference relations
S and
\(S^c\), decision rules induced from VC-DRSA lower approximations have the following form:
-
if \(g_{i_1}(x) - g_{i_1}(y) \ge \delta _{i_1}\) \(and \ldots and\) \(g_{i_h}(x) - g_{i_h}(y) \ge \delta _{i_h}\) and \((g_{i_{h+1}}(x) \ge r_{i_{h+1}}\) and \(g_{i_{h+1}}(y) \le s_{i_{h+1}})\) \(and \ldots and\) \((g_{i_{z}}(x) \ge r_{i_{z}}\) and \(g_{i_{z}}(y) \le s_{i_{z}}),\) \(\ then\) \( \ xSy\), where \(\{i_1, \ldots , i_h\}\subseteq P\) indicate criteria with an interval or ratio scale, \(\{i_{h+1}, \ldots , i_z\}\subseteq {P}\) indicate criteria with an ordinal scale, and \(\ \delta _{i_1}, \ldots , \delta _{i_h} \in {\mathbb {R}}\), as well as \(\ r_{i_{h+1}}, \ldots , r_{i_z}, s_{i_{h+1}}, \ldots , s_{i_z} \in {\mathbb {R}}\);
-
if \(g_{i_1}(x) - g_{i_1}(y) \le \gamma _{i_1}\) \(and \ldots and\) \(g_{i_k}(x) - g_{i_k}(y) \le \gamma _{i_k}\) and \((g_{i_{k+1}}(x) \le r_{i_{k+1}}\) and \(g_{i_{k+1}}(y) \ge s_{i_{k+1}})\) \(and \ldots and\) \((g_{i_{v}}(x) \le r_{i_{v}}\) and \(g_{i_{v}}(y) \ge s_{i_{v}}),\) \(\ then\) \( \ xS^c y\), where \(\{i_1, \ldots , i_k\}\subseteq {P}\) indicate criteria with an interval or ratio scale, \(\{i_{k+1}, \ldots , i_v\}\subseteq {P}\) indicate criteria with an ordinal scale, and \(\ \gamma _{i_1}, \ldots , \gamma _{i_k} \in {\mathbb {R}}\), as well as \(\ r_{i_{k+1}}, \ldots , r_{i_v}, s_{i_{k+1}}, \ldots , s_{i_v} \in {\mathbb {R}}\).
These rules constitute a preference model of the DM. They are applied to the set of alternatives
\({\mathcal {A}}\) with the intention of ranking them. Any pair of alternatives
\((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\) can match these rules in one of four ways:
-
aSb and \(\lnot (aS^c b),\) which is true outranking \(aS^T b\),
-
\(aS^c b\) and \(\lnot (aSb),\) which is false outranking \(aS^F b\),
-
aSb and \(aS^c b,\) which is contradictory outranking \(aS^K b\),
-
\(\lnot (aSb)\) and \(\lnot (aS^c b),\) which is unknown outranking \(aS^U b\).
This 4-valued outranking relation underlines the presence or the absence of positive or negative arguments for outranking. For any
\((a,b) \in {\mathcal {A}} \times {\mathcal {A}}\) it can be faithfully represented by 3-valued fuzzy relation
\(R_{3v}\):
$$\begin{aligned} R_{3v}(a,b)=\frac{[aSb] + (1-[aS^c b])}{2}, \end{aligned}$$
where
\([\ \cdot \ ]\) denotes the indicator function taking value 0 or 1. In order to obtain a final recommendation, relation
\(R_{3v}\) is exploited using a ranking method. In (Szela̧g et al.
2014a), several ranking methods were compared with respect to a set of desirable properties. It appeared that the best ranking method is the Net Flow Rule which builds a ranking of alternatives using a scoring function
NFS(
a) for all
\(a\in {\mathcal {A}}\):
$$\begin{aligned} NFS(a)= & {} \sum _{b\in {\mathcal {A}}\setminus \{a\}}\left( R_{3v}(a,b) - R_{3v}(b,a) \right) \\ {}= & {} \sum _{b\in {\mathcal {A}}\setminus \{a\}}\left( [aSb] - [bSa] - [aS^c b] +[bS^c a] \right) , \end{aligned}$$
where
\([\ \cdot \ ]\) denotes again a (0–1) indicator function. The recommendation for ranking is a complete preorder on set
\({\mathcal {A}}\) determined by
\(NFS( \cdot )\), and the best choice recommendation is alternative
\({\hat{a}} \in {\mathcal {A}}\), such that
\(NFS({\hat{a}})= \max \limits _{a\in {\mathcal {A}}} \{ NFS(a) \}\).
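The passage from the rule-based relations \(S\) and \(S^c\) to \(R_{3v}\) and the Net Flow Rule scores can be sketched as follows (relations given as sets of ordered pairs; the toy data are hypothetical):

```python
# 3-valued relation R_3v(a,b) = ([aSb] + (1 - [aS^c b])) / 2 and the
# Net Flow Rule scores derived from it; S and Sc are the sets of ordered
# pairs obtained by matching the induced rules on A x A.

def R3v(a, b, S, Sc):
    return (((a, b) in S) + (1 - ((a, b) in Sc))) / 2

def nfs(A, S, Sc):
    return {a: sum(R3v(a, b, S, Sc) - R3v(b, a, S, Sc)
                   for b in A if b != a)
            for a in A}

A = ["a", "b"]
S, Sc = {("a", "b")}, set()   # only positive argument: a S b
scores = nfs(A, S, Sc)
best = max(A, key=scores.get)  # best choice recommendation
```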
In (Dembczyński et al.
2010b), the DRSA for choice and ranking has been compared with statistical learning of a utility function for single objects, based on the boosting technique. Similar to the Net Flow Rule, the utility function also gives a linear ranking of objects. In conclusion of a computational experiment, the authors state that both analyzed approaches to preference learning share good properties of the decision rule preference model and have good performance in massive-data learning problems.
DRSA has been adapted to a large variety of multi-attribute decision problems (Słowiński et al.
2015). Let us list a few of them:
-
decision under uncertainty and time preference (Greco et al.
2010b),
-
robustness analysis for decision under uncertainty (Kadziński et al.
2016),
-
classification with missing data (Szela̧g et al.
2017),
-
classification with a hierarchical structure of evaluation criteria (Dembczyński et al.
2002),
-
classification with imprecise evaluations and assignments (Dembczyński et al.
2009),
-
interactive evolutionary multiobjective optimization (Corrente et al.
2024).
The DRSA methodology gained popularity as a useful tool for structuring preference data prior to the induction of the preference model in terms of decision rules. It is based on elementary concepts and mathematical tools (sets and set operations, binary relations), without recourse to any complex algebraic or analytical structures (Greco et al.
2010a); the decision rules have very natural interpretation and the key concept of dominance is rational and objective.