
Discovering Weight Knowledge Based on Linguistic Information Aggregation

Received Date: December 22, 2020 Accepted Date: January 22, 2021 Published Date: December 27, 2020

doi: 10.17303/jcssd.2021.1.101

Citation: CHEN Shuguo (2021) Discovering Weight Knowledge Based on Linguistic Information Aggregation. J Comput Sci Software Dev 1: 1-8.

Weight knowledge is important in many fields of decision science. For example, in an artificial neural network (ANN), knowledge resides in the weights on links between neurons, from which output is obtained for given inputs. Although a meaningful output of an ANN is of great use, the implications of individual weights are often difficult to decipher. In the context of qualitative decision making, linguistic variables often express information about ordinal ranking rather than exact quantities, and the challenge therefore lies in determining weights that conform to reality and make sense of the decision. In this work, we distinguish two kinds of weights: one related to the importance of attributes, and the other related to aggregation characteristics. We propose a weight estimation method in which these two kinds of weights are calculated simultaneously from a set of known cases that provide additional aggregation information. Using a simulated annealing algorithm, weights satisfying certain conditions can be obtained, and results for new cases can be aggregated thereafter. We find that the weights developed in our method are more explanatory than those of other approaches, such as the artificial neural network.

Keywords: Linguistic Information; Weight Aggregation; Weight Discovering; Simulated Annealing Algorithm

Aggregation of criteria functions to form an overall decision function is important in various fields of decision science. In the simplest cases, such as when both the outputs of the criterion functions and the overall decision function are numerical values, the knowledge lies in determining the criterion weights, from which one obtains the overall grade as a weighted average of all criteria grades. In more complex situations, the output can no longer be represented as a linear function of the inputs. For example, when an artificial neural network (ANN) is trained on a set of instances with known results, the knowledge is embodied in the weights on links between neurons [14]. However, the benefit of an ANN comes at a cost: the inherent inability to explain, in a comprehensible form, the process through which a decision or output is produced [1]. To address this issue, complementary techniques such as extracting rules from ANNs have been studied (see [1, 8], for example).

Uncertainties in inputs and outputs also bring complexity [21, 22]. Fuzzy set theory, proposed and developed by Zadeh [24], is often used to deal with situations where the assessment information is stated in linguistic terms; membership functions can then be defined to denote the degree of truthfulness of a proposition. Evidence theory, due to Shafer [17], is also useful in aggregating information when uncertainty is attributable to ignorance instead of fuzziness [5]. Because of the difficulties in defining a consensus membership function and approximating the resulting irregular fuzzy set, a method that operates directly on linguistic variables has been proposed in [9-11]. The method is based on the concepts of ordered weighted averaging (OWA) operators developed by Yager [20] and the convex combination of linguistic labels defined by Degani and Bortolan [3].

Aggregating linguistic information has great practical potential [15]. For example, when assessing the attributes of cars such as acceleration, braking, handling, ride quality and powertrain, customers often use linguistic terms, or linguistic variables, to describe their opinions. How these individual attribute assessments lead to an overall ranking then becomes a complicated problem. Although the linguistic ordered weighted averaging (LOWA) operator [9] provides an aggregation mechanism, it assumes that all inputs are equally important. The difficulty therefore lies in developing the weights, which implicitly determine the degree of “anding” and “oring” in the aggregation.

In this work, we develop a method that extends the LOWA operation by removing the “equal importance” assumption in the inputs. The knowledge about the aggregation mechanism and importance of each input variable can be calculated from a set of known cases with additional aggregation information.

This paper is structured as follows. In Section 2, the problem is briefly reviewed. The LOWA operator and the weights in different means are discussed in Section 3. In Section 4, we propose an operator allowing for two kinds of weights, and present how it can be used to aggregate linguistic information based on mining known behavior cases. Numerical examples and related computation are illustrated in Section 5. Finally, the conclusion together with some discussion are given in Section 6.

Let A1, A2,..., An be n criteria in a multi-criteria problem, and let X be the set of alternatives. For a proposed alternative x ∈ X, Aj(x) indicates the performance of x according to criterion Aj. The assessment of an alternative against each criterion may be qualitative. Our problem is to develop a proper method to obtain the overall assessment of x from the set {Aj(x) | j = 1,..., n}.

A number of alternatives with known results are available, which act as a series of cases as in case-based reasoning. There may also be other forms of knowledge on hand, such as comparative knowledge of two criteria’s importance.

When the performance of alternatives can be assessed quantitatively, an ANN can be expediently trained to learn the aggregation process. However, people still need knowledge of the weight allocation in the network to comprehend and make sense of the decision.

As Yager pointed out [20], when all criteria are equally important, the aggregation structure can be viewed as between two extremes, “anding” and “oring”. At one extreme, the overall result is good only if all the criteria are satisfied. Hence the output is the minimum of all the inputs. At the other extreme, the overall result is good as long as at least one of the criteria is satisfied. The output can then be calculated by maximizing all the inputs.

More generally, if the domain of each Aj is [0,1], and all criteria are equally important, then the aggregation process can be expressed by an OWA operator F with weighting vector v = (v1, v2,..., vn), where vi ∈ [0,1] and Σi vi = 1:

F(a1, a2,..., an) = Σi vi bi,

where bi is the i-th largest element in the collection {a1, a2,..., an}.

Vector v determines the “andness” or “orness” of the aggregation: v = (1, 0,..., 0) corresponds to the pure “or” (max) operator, while v = (0,..., 0, 1) corresponds to the pure “and” (min) operator.
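As a sketch of the definition above, the OWA operator and its two extremes can be illustrated in Python (the function name `owa` is ours, not from the paper):

```python
def owa(v, a):
    """Ordered weighted averaging: weights attach to ranks, not to sources.

    v: weighting vector (non-negative, sums to 1)
    a: input criterion values in [0, 1]
    """
    b = sorted(a, reverse=True)  # b[i] is the (i+1)-th largest input
    return sum(vi * bi for vi, bi in zip(v, b))
```

For example, with inputs (0.2, 0.9, 0.5), the pure-“or” vector v = (1, 0, 0) yields the maximum 0.9, and the pure-“and” vector v = (0, 0, 1) yields the minimum 0.2.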

In many real circumstances, it is more desirable that the Aj(x) take linguistic values. For example, their domain can be a linguistic term set S = {si | i ∈ {0,..., T}}, used to express the performance of alternatives according to the various criteria. For instance, S can be a set of seven labels {s0, s1,..., s6}, ranging from the worst assessment s0 to the best s6.

The term set is assumed to have the following characteristics:

• The set is ordered: si ≥ sj if i ≥ j;
• There is a negation operator: Neg(si) = sj such that j = T − i.

An aggregation operator, LOWA φ, can be defined to compute directly on linguistic labels [10]. For the label set B = {b1,..., bn} ordered from largest to smallest,

φ(a1,..., an) = Cn{vk, bk, k = 1,..., n} = v1 ⊗ b1 ⊕ (1 − v1) ⊗ Cn-1{βh, bh, h = 2,..., n},

where v and B = {b1,..., bn} are as before, βh = vh / Σk=2..n vk, h = 2,..., n, and Cm is the convex combination of m labels. If m = 2, C2 is defined as

C2{v1, b1; 1 − v1, b2} = v1 ⊗ sj ⊕ (1 − v1) ⊗ si = sk,

such that k = min{T, i + round(v1 · (j − i))}, where b1 = sj, b2 = si, and j ≥ i.

If vj = 1 and vi = 0 for all i ≠ j, then the convex combination is defined as Cm{vi, bi, i = 1,..., m} = bj.
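A minimal Python sketch of the recursive LOWA computation, operating on label indices (the function name is ours; we assume standard rounding, i.e. 0.5 rounds up, since Python's built-in `round` rounds half to even):

```python
import math

def lowa(v, a, T):
    """LOWA aggregation of label indices a with weighting vector v.

    Labels are represented by their indices in S = {s0, ..., sT}.
    """
    rnd = lambda x: math.floor(x + 0.5)  # round half up
    b = sorted(a, reverse=True)          # b[0] is the largest label index

    def convex(weights, labels):
        if len(labels) == 1:
            return labels[0]
        w1, rest = weights[0], sum(weights[1:])
        if rest == 0:                    # all weight on the first label
            return labels[0]
        tail = convex([wh / rest for wh in weights[1:]], labels[1:])
        j, i = labels[0], tail           # b1 = s_j, combined tail = s_i, j >= i
        return min(T, i + rnd(w1 * (j - i)))

    return convex(list(v), b)
```

With uniform weights v = (0.25, 0.25, 0.25, 0.25) and assessments (s0, s3, s2, s4) over S = {s0,..., s6}, this yields index 3, i.e. s3, agreeing with the worked example in Section 5.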

Therefore, it is easy to see that determining the weighting vector v of the LOWA operator is a basic question to be resolved.

On the other hand, the weights in LOWA operators determine the structure of the aggregation, and so can be viewed as aggregation weights. If the attributes or criteria are not equally important, another kind of weight must be taken into account. Weighted aggregation operators have been studied in [6, 7, 9-12, 16, 19, 23, 25]. In the following sections, we study how the importance weight vector w of the criteria can be integrated into the aggregation process, and how it can be identified from a set of cases.

The weights bound to attributes are more intuitive for assessing alternatives. Most information aggregation methodologies in the literature presume the weights are assigned in advance, but in reality it is not easy for the decision-maker to give a certain and consistent weight vector. Improvements include treating attribute weights as fuzzy numbers (e.g. [2, 24]) or linguistic variables (e.g. [4, 18]).

In our method these weights are regarded as internal quantitative variables implicit in the aggregation results. They are initially unknown, but can be defined, calculated, and used to aggregate information thereafter.

Let the importance weight vector of the criteria be w = (w1, w2,..., wn), where wi ≥ 0 and Σi wi = 1.

Considering both the aggregation weights and the attribute weights, for some alternative x: (A1(x), A2(x),..., An(x)) = (a1, a2,..., an), an aggregation operator φ can be defined as:

φv, w(a1, a2,..., an) = Cn{ui, bi, i = 1,..., n},

where ui = vi wσ(i) / Σk vk wσ(k), i = 1,..., n, and σ is a permutation over {1, 2,..., n} such that (b1, b2,..., bn) = (aσ(1), aσ(2),..., aσ(n)), bi being the i-th largest element of {a1,..., an}.

For the operator φv, w (which we can call LOW2A), we have the following simple properties.

a) If wi = 1/n for all i ∈ {1,2,...,n}, then ui = vi, so φv, w reduces to the ordinary LOWA operator φ.

b) If wi = 1 for some i ∈ {1,2,...,n} and vi ≠ 0 for all i, then φv, w(a1, a2,..., an) = ai. This is because uσ-1(i) = 1 and uj = 0 for j ≠ σ-1(i), ∀j, so φv, w(a1, a2,..., an) = bσ-1(i) = ai.

c) If wi = 0 for some i ∈ {1,2,...,n}, then φv, w(a1, a2,..., an) is independent of ai. This is obvious from the fact that uσ-1(i) = 0 and the definition of the convex combination Cm.

The LOW2A operator φv, w can be illustrated by a simple example.

Assume the aggregation problem involves 4 criteria and v = (0.25, 0.25, 0.25, 0.25), which implies that both the “orness” and the “andness” of the aggregation are 0.5. An alternative x has criteria assessments (s0, s3, s2, s4).

If the four criteria are equally important, then the aggregation result for x by the ordinary LOWA operator is φ = s3.

However, if the criteria weight vector is given as, for instance, w = (0.2, 0.3, 0.4, 0.1), then one can calculate the composite weight u = (0.1, 0.3, 0.4, 0.2), and φ = s2. This is because the weight of the third largest element (s2, assessed by the third criterion, whose importance is 0.4) has been increased.
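The composite weight u in this example can be reproduced with a short sketch (the function name is ours; ties among equal assessments are assumed to be broken arbitrarily):

```python
def composite_weights(v, w, a):
    """Combine aggregation weights v and importance weights w into u.

    a: criterion assessments as label indices; sigma sorts them descending.
    """
    sigma = sorted(range(len(a)), key=lambda i: a[i], reverse=True)
    prod = [v[i] * w[sigma[i]] for i in range(len(a))]
    total = sum(prod)
    return [p / total for p in prod]

# v = (0.25, 0.25, 0.25, 0.25), w = (0.2, 0.3, 0.4, 0.1), a = (s0, s3, s2, s4)
# yields u = (0.1, 0.3, 0.4, 0.2), as in the example above.
```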

Weight discovering model

By means of the LOW2A operator φv, w defined in 4.1, the aggregation mechanism can be mapped to a configuration of the weights v and w. Taking v1,..., vn, w1,..., wn as variables, a mathematical programming model can be built from the given cases and information to estimate the weights.

a) Each alternative case (a1, a 2,..., an ) with known result Sk can be transformed into a constraint equation.

φv, w (a1, a 2,..., an ) = Sk

b) Information about the comparison of criteria importance should be expressed by equations or inequalities in the weight variables.

c) The objective function can be maximizing or minimizing the dispersion of v, or of w, or of both, where the dispersion of a weight vector is defined as disp(v) = −Σi vi ln vi.
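Assuming the entropy-style dispersion measure disp(v) = −Σi vi ln vi commonly used for OWA weights (terms with vi = 0 contribute zero), the objective can be sketched as:

```python
import math

def dispersion(v):
    """Entropy-style dispersion of a weight vector: 0 for a one-hot vector,
    maximal (ln n) for uniform weights."""
    return -sum(vi * math.log(vi) for vi in v if vi > 0)
```

For instance, a uniform vector (0.25, 0.25, 0.25, 0.25) has dispersion ln 4, while (1, 0, 0, 0) has dispersion 0.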

Model and numerical examples

In this section, we make use of an example to demonstrate how the approach can be applied.

For a collection of alternatives, each can be assessed based on a set of attributes, to get an overall assessment. In group decision making context, information aggregation can refer to producing a comprehensive evaluation from a group of decision makers’ evaluations. In this example, we have four alternatives, and for each one, four linguistic assessments can be obtained from 4 experts according to their own specialties. We are interested in finding aggregation knowledge in terms of aggregation weights and expert weights.

Suppose we have a set of m alternatives whose assessment results according to the experts, together with their actual performance ranks, are known. Let the alternative set be {c1, c2, c3, c4}. For each alternative ci, we have its linguistic attribute evaluations A(ci) and its actual overall performance, as follows.

a) For c1 :{a11 = s1, a21 = s5, a31 = s4, a41 = s6}, actual performance is s5.

b) For c2 :{a12 = s1, a22 = s3, a32 = s3, a42 = s1}, actual performance is s3.

c) For c3 :{a13 = s4, a23 = s5, a33 = s3, a43 = s2}, actual performance is s2.

d) For c4 :{a14 = s1, a24 = s6, a34 = s1, a44 = s0}, actual performance is s1.

Then, a mathematical programming model for minimizing the sum of the dispersions of v and w can be described as:

min G(v, w) = −Σi vi ln vi − Σi wi ln wi        (5.1)

s.t. φv, w(a1i, a2i, a3i, a4i) = actual performance of ci, i = 1,..., 4,
     Σi vi = 1, Σi wi = 1, vi ≥ 0, wi ≥ 0, i = 1,..., 4.

A simulated annealing solution approach

As we can see, the model is formulated as a nonlinear mathematical programming model, which is difficult to solve to exact optimality. Therefore, in this paper we propose a meta-heuristic approach, simulated annealing [13], to generate feasible solutions and then improve them heuristically.

The simulated annealing algorithm minimizes a non-linear objective function from an initial vector through a series of random moves. A group of parameters determines how a move is generated, whether it is accepted, and how many moves occur before the temperature is changed. Among the parameters, p defines the probability used in generating a random move; γ determines the initial temperature T0 = γF(X0), where X0 is the initial solution; r is the number of random moves before the temperature is changed; and β specifies the temperature reduction rate, where 0 < β < 1. The simulated annealing algorithm is as follows.

1) Select an initial solution X0 . Set the temperature T0 = γF(X0).

2) Repeat the following steps r times:

a) Generate a random move. Each element in the vector remains unchanged with probability p and is replaced by a new random value with probability 1 − p.
b) Calculate the objective value of the new vector X.
c) Update the current configuration to the new vector X if a better objective value is found.
d) Otherwise, let ΔF denote the increase in the value of the objective function. Update the current configuration to the new vector X with probability p1 = exp(−ΔF / T); keep the current configuration with probability 1 − p1.

3) Reduce the temperature by multiplying by the temperature reduction parameter β.

4) Terminate when an optimal value is obtained or the objective value cannot be improved after a certain number of temperature changes.
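Steps 1)–4) can be sketched generically as follows (parameter names mirror the text; the stopping rule based on a number of stalled temperature changes is our assumption):

```python
import math, random

def simulated_annealing(x0, f, move, gamma=1.0, beta=0.98, r=20, max_stalls=30):
    """Generic simulated annealing minimizer following steps 1)-4)."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = gamma * fx if fx > 0 else 1.0           # step 1: initial temperature
    stalls = 0
    while stalls < max_stalls:
        improved = False
        for _ in range(r):                      # step 2: r moves per temperature
            y = move(x)
            fy = f(y)
            dF = fy - fx
            # accept improvements; accept worse moves with probability exp(-dF/T)
            if dF <= 0 or random.random() < math.exp(-dF / T):
                x, fx = y, fy
            if fx < fbest:
                best, fbest, improved = x, fx, True
        T *= beta                               # step 3: cool down
        stalls = 0 if improved else stalls + 1  # step 4: stop after stalled cooling
    return best, fbest
```

A toy usage: minimizing f(x) = x² from x0 = 5 with a uniform(−1, 1) perturbation move drives the best objective value close to zero.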

In order to find feasible weights, we set the objective function as the total deviation of the aggregated results from the actual performances:

F(v, w) = Σk |u(φv, w(a1k, a2k, a3k, a4k)) − u(Sk)|        (5.2)

In Equation (5.2), u is the utility of an evaluation rank, which is set as u(si) = i for a symmetric and evenly distributed linguistic term set. For an asymmetric term set, u is set according to [26].

We notice that in a standard simulated annealing algorithm, a random move on a weight element wi may dramatically change the whole weight vector (w1, w2,..., wn). We therefore apply the following customized procedure to keep some characteristics of the weight vector unchanged when implementing a random move: we select a non-zero wj (say, the first one greater than 0.2 in our example), and keep each ratio wi / wj for i ≠ j unchanged with probability p, or change it to max{wi / wj + uniform(−1, 1), 0} with probability 1 − p.
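This customized move can be sketched as follows (the function name is ours; we renormalize the resulting vector to sum to one, which the text leaves implicit, and assume some weight exceeds the pivot threshold 0.2 from the example):

```python
import random

def ratio_preserving_move(w, p=0.6):
    """Random move on importance weights that keeps each ratio w_i / w_j to a
    pivot weight unchanged with probability p."""
    pivot = next(j for j, wj in enumerate(w) if wj > 0.2)  # first weight > 0.2
    ratios = [wi / w[pivot] for wi in w]
    new = [r if i == pivot or random.random() < p
           else max(r + random.uniform(-1.0, 1.0), 0.0)
           for i, r in enumerate(ratios)]
    total = sum(new)                   # pivot ratio is 1, so total > 0
    return [r / total for r in new]
```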

Once a (v, w) satisfying F(v, w) = 0 is found, we switch to a procedure that minimizes the second objective function G(v, w). The simulated annealing algorithm is the same as above, except that F(v, w) is recalculated for each random move, so that feasibility is maintained throughout the second phase.

In this experiment, we set p1 = 0.6, γ1 = 1, β1 = 0.98, r1 = 20, v0 = (0.25, 0.25, 0.25, 0.25), and w0 = (0.25, 0.25, 0.25, 0.25) in the first phase. After 13 temperature changes, a set of feasible weights is obtained: v = (0.278, 0, 0.424, 0.298), w = (0, 0.25, 0.58, 0.17).

In the second phase, with p2 = 0.8, γ2 = 100, β2 = 0.99, r2 = 40, we obtain an optimized solution: v = (0.05, 0, 0.95, 0), w = (0, 0.025, 0.244, 0.734).

Using these weights, aggregation over a new alternative can be processed conveniently.

It should be noted that a small learning set leads to a large number of feasible weights, so the procedure easily terminates with a solution. As the learning set grows, the search becomes harder, but the solution obtained makes more sense.

In this paper, we propose a new method for discovering a mechanism for aggregating linguistic information from a set of cases. Two kinds of weights, namely aggregation weights and attribute weights, are distinguished. They play key roles in aggregation and can be learned from a learning set composed of alternatives with known aggregation information. Any known alternative that provides an actual comprehensive evaluation, or only some order relations (for example, an alternative with attribute performances {a11 = s1, a21 = s5, a31 = s4, a41 = s6} is better as a whole than an alternative with {a11 = s2, a21 = s4, a31 = s5, a41 = s5}), can be converted into equality or inequality constraints.

Since the learning set is the foundation of the model and serves as the basis of weight discovery, the alternatives should be chosen for typicality and correctness. Improper alternatives may cause conflicts among the constraints and leave the model with no solution. Of course, the model can be amended into a goal programming model in which only some of the cases satisfy their constraint equations. Both constraints and objectives can be assigned priority parameters; constraints with more certainty and accuracy should be given higher priorities and be satisfied first.

Since finding the exact optimal weight solution can be very difficult, we present a procedure that begins by searching for a feasible solution and then optimizes according to the predefined objective function. The simulated annealing algorithm is employed because it does not depend on the initial point and has been shown to converge to the global optimum. As illustrated by the example, the weights obtained using our method are more explanatory and meaningful than weights learned by other approaches, such as the traditional ANN.

  1. Andrews R, J Diederich, AB Tickle (1995) Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems 8: 373-89.
  2. Cholewa W (1985) Aggregation of fuzzy opinions—an axiomatic approach. Fuzzy Sets and Systems 17: 249-58.
  3. Degani R, G Bortolan (1988) The problem of linguistic approximation in clinical decision making. Int J Approximate Reasoning 2: 143-62.
  4. Delgado M, JL Verdegay MA Vila (1993) On aggregation operations of linguistic labels. Int J of Intelligent Systems 8: 351-70.
  5. Deng M, W Xu, JB Yang (2004) Estimating the attribute weights through evidential reasoning and mathematical programming. International Journal of Information Technology & Decision Making 3: 419-28.
  6. Dubois D, H Prade (1986) Weighted minimum and maximum operations in fuzzy set theory. Information Sciences 39: 205-10.
  7. Dubois D, H Prade C Testemale (1988) Weighted fuzzy pattern matching. Fuzzy Sets and Systems 28: 313-31.
  8. Elalfi AE, R Haque, ME Elalami (2004) Extracting rules from trained neural network using GA for managing E-business. Applied Soft Computing 4: 65-77.
  9. Herrera F, E Herrera-Viedma, J Verdegay (1996) Direct approach processes in group decision making using linguistic OWA operators. Fuzzy Sets and Systems 79: 175-90.
  10. Herrera F, E Herrera-Viedma (1997) Aggregation operators for linguistic weighted information. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 27: 646-56.
  11. Herrera F, E Herrera-Viedma (2000) Linguistic decision analysis: steps for solving decision problems under linguistic information. Fuzzy Sets and Systems 115: 67-82.
  12. Ju Y (2014) A new method for multiple criteria group decision making with incomplete weight information under linguistic environment. Applied Mathematical Modelling 38: 5256-68.
  13. Kirkpatrick S, CD Gelatt, MP Vecchi (1983) Optimization by simulated annealing. Science 220: 671-80.
  14. Rosenblatt F (1962) Principles of Neurodynamics. Spartan Books, Washington, DC.
  15. Sanchez E (1989) Importance in knowledge systems. Information Systems 14: 455-64.
  16. Schjaer-Jacobsen H (2002) Representation and calculation of economic uncertainties: Intervals, fuzzy numbers, and probabilities. International Journal of Production Economics 78: 91-98.
  17. Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton.
  18. Tong RM, PP Bonissone (1982) A linguistic approach to decision making with fuzzy sets. IEEE Transactions on Systems, Man, and Cybernetics 10: 716-23.
  19. Wang YM, Y Luo, YS Xu (2013) Cross-weight evaluation for pairwise comparison matrices. Group Decision and Negotiation: 1-15.
  20. Yager RR (1988) On ordered weighted averaging aggregation operators in multicriteria decision making. IEEE Transactions on Systems, Man, and Cybernetics 18: 183-90.
  21. Yager RR (1994) On weighted median aggregation. International Journal of Uncertainty, Fuzziness and Knowledge- Based Systems 2: 101-13.
  22. Yager RR (1999) Modeling uncertainty using partial information. Information Sciences 121: 271-94.
  23. Yang Gl, JB Yang, DL Xu, M Khoveyni (2016) A three-stage hybrid approach for weight assignment in MADM. Omega.
  24. Zadeh LA (1965) Fuzzy sets. Information and Control 8: 338-53.
  25. Zhang F, J Ignatius, CP Lim, M Goh (2014) A two-stage dynamic group decision making method for processing ordinal information. Knowledge-Based Systems 70: 189-202.
  26. Zhou W, Z Xu (2016) Generalized asymmetric linguistic term set and its application to qualitative decision making involving risk appetites. European J Operational Res 254: 610-21.