AUTOMATED NEGOTIATION BASED ON ENSEMBLE LEARNING AND OPTIMAL COUNTER PROPOSAL

Abstract: Agent-mediated automated negotiation is a key form of interaction in the e-commerce environment. Agents reach an agreement through an iterative process of making offers. However, agents tend to conceal their private negotiation information, which decreases the efficiency of negotiation. In this paper, an ensemble learning-based negotiation method is proposed. The new method labels the proposals automatically by mining the implicit information in negotiation history data. Then, the labeled proposals become the training samples of the ensemble learning algorithm, which generates the estimation of the opponent's utility function. Finally, based on the utility functions of both sides, a win-win negotiation counter-proposal is generated through a particle swarm optimization algorithm. The experimental results indicate the benefits and efficiency of the proposed method.


Introduction
In an e-commerce environment, negotiation is a key form of interaction for reaching an agreement [1][2][3][4]. With the rapid development of agent technology, an intelligent agent can decide for itself what actions to perform, at what time, and under what conditions. Therefore, agents can negotiate with each other on behalf of their owner enterprises. In e-commerce, the most common form of negotiation is service-oriented negotiation, in which a service provider and a service consumer have to come to a mutually acceptable agreement over negotiation issues such as price, quality, and service level. In traditional negotiation, an agent considers only the benefit of its owner and thus acts competitively in a service-oriented negotiation process. However, negotiation involves both competition and cooperation. For example, suppose the service consumer thinks that quality is more important than money, while the service provider thinks that money is more important than quality. The final agreement should then combine high quality with a high price, which satisfies the utility of both sides to the greatest extent, so the overall utility is maximized. In such cases, agents are concerned not only with their own welfare but also with that of their opponent, which leads to a win-win negotiation. Much research has been done to provide win-win negotiation solutions.
The difficulties of win-win negotiation are information uncertainty and resource limitations. In a competitive business environment, enterprises tend to conceal their private negotiation information to prevent it from being used maliciously. Therefore, agents know only their own negotiation information, which makes it impossible to construct a win-win counter-proposal directly. There is a large body of literature on promoting negotiation [5][6][7][8][9][10][11]. Zheng [6] proposes a tri-training based algorithm to learn the opponent's negotiation preference. First, the negotiation process is viewed as a sequence of proposals, which is mapped into a bidding trajectory feature space to form a sample set. Then, tri-training is used to increase the number of samples and improve the prediction accuracy of the opponent's negotiation preference. Finally, based on the negotiation preferences of both sides, an optimization algorithm computes a win-win counter-proposal. Cheng [7] labels negotiation history data and uses a support vector regression machine to estimate the opponent's negotiation utility; a genetic algorithm then calculates the counter-offer to achieve a win-win negotiation. Hindriks [9] presents a generic framework based on Bayesian learning to learn an opponent model, which includes the opponent's issue preferences and issue priorities. The algorithm can effectively learn the opponent's preferences from bid exchanges by making some assumptions about the preference structure and the rationality of the bidding process. Cheng [10] proposes a support vector machine-based method to learn the opponent's attitudes in bilateral automated negotiation in an agent-mediated application. The negotiation procedure is transformed into multiple negotiation tracks, and the opponent's attitude toward each issue is obtained by learning from these tracks. A negotiation decision-making model is then constructed from the opponent's attitude.
In this paper, an ensemble learning-based negotiation method is proposed. The new method labels the proposals automatically by mining the implicit information in negotiation history data. Then, the labeled proposals become the training samples of the ensemble learning algorithm, which generates the estimation of the opponent's utility function. Finally, based on the utility functions of both sides, a win-win negotiation counter-proposal is generated through a particle swarm optimization algorithm. The experimental results indicate the benefits and efficiency of the proposed method.

Negotiation Model
The negotiation model is defined as a six-tuple $NM=(A,R,S,V,P,U)$, where $A=\{a_1,a_2,\dots,a_m\}$ denotes the set of negotiation participants. In bilateral negotiation, there are two participants: the initiator and the opponent. $R$ denotes the negotiation rounds; in negotiation, agents reach an agreement through an iterative process of making offers. $S=\{s_1,s_2,\dots,s_n\}$ denotes the set of issues under negotiation. For example, in multi-issue negotiation, the issues can be price, quality, and service level. For each issue, there are minimum and maximum values, which correspond to the best and worst values the agents can accept. $V=\{v_1,v_2,\dots,v_n\}$ denotes the set of value ranges of the issues, where $v_i=[min_i, max_i]$ represents the value range of issue $s_i$. $P=\{p_1,p_2,\dots\}$ denotes the set of proposals the negotiating agent offers. A proposal $p=\{x_1,x_2,\dots,x_n\}$ is a set of values for all issues, where $x_i \in [min_i, max_i]$ is the value of issue $s_i$. $U$ denotes the utility function of the negotiating agents. Given a proposal $p$, $U_i(p)$ is the utility value of proposal $p$ for agent $i$. For each agent, there is a utility space which defines the maximum and minimum utility values the agent can accept. The final agreement should lie in the intersection of the two agents' utility spaces.
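As an illustration, the six-tuple can be held in a small container type. The sketch below is hypothetical: the class name NegotiationModel, the field names, the interpretation of R as a maximum number of rounds, and the example issue ranges are all assumptions made for the example and are not part of the model definition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Proposal = Tuple[float, ...]  # one value x_i per issue s_i


@dataclass
class NegotiationModel:
    """Hypothetical container for the six-tuple NM = (A, R, S, V, P, U)."""
    agents: List[str]                                   # A: participants
    max_rounds: int                                     # R: rounds (assumed to be a cap)
    issues: List[str]                                   # S: issues under negotiation
    ranges: List[Tuple[float, float]]                   # V: [min_i, max_i] per issue
    utilities: Dict[str, Callable[[Proposal], float]]   # U: utility function per agent
    proposals: List[Proposal] = field(default_factory=list)  # P: offers made so far

    def is_valid(self, p: Proposal) -> bool:
        """A proposal is valid if it has one in-range value per issue."""
        return len(p) == len(self.issues) and all(
            lo <= x <= hi for x, (lo, hi) in zip(p, self.ranges)
        )


# Illustrative bilateral setting with two issues (values are made up).
nm = NegotiationModel(
    agents=["initiator", "opponent"],
    max_rounds=20,
    issues=["price", "quality"],
    ranges=[(100.0, 200.0), (1.0, 5.0)],
    utilities={},
)
```

The validity check mirrors the constraint that every issue value must lie in its value range.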
The negotiation starts when the initiator agent establishes and sends an initial proposal. As can be seen from Fig 1, the opponent agent then replies with a counter-proposal, and the two agents iteratively exchange counter-proposals to reach an agreement. In the end, one of the agents either accepts the last proposal sent by the other agent or refuses the negotiation on receiving an unacceptable counter-proposal.
The negotiation history database stores the historical negotiation information of both the opponent and the self agent. This information mainly consists of a proposal list; each entry records the negotiation round, the proposal sender, the proposal content, and the receiver's attitude. The data structure is shown in Table 1. In an e-commerce environment, due to fierce competition, agents tend to conceal their private negotiation information, especially their utility function, so this information cannot be obtained directly. However, the utility function is to some extent implied in the proposals sent during the negotiation process, and it can be learned from the negotiation history database through machine learning. To improve the accuracy of learning, an ensemble learning algorithm is used to learn and predict the opponent's utility value. The training sample generation module generates the training sample set for the ensemble learning algorithm. Each training sample is composed of a proposal $p_i$ and the estimate $y_i$ of its utility value. The output of the ensemble learning algorithm is an estimate of the opponent's utility function UF(). Based on the utility functions of both sides, the counter-proposal generation module generates a win-win optimal counter-proposal through a particle swarm optimization algorithm.
Finally, the negotiation decision module sends the counter-proposal to the opponent agent, accepts the opponent's proposal, or refuses the negotiation. Suppose $p_{oppo}$ is the proposal sent by the opponent agent and $p_{self}$ is the optimal counter-proposal just generated.
send a counter-proposal; if time is over, refuse the negotiation.
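The three outcomes of the decision module can be sketched as a small function. The acceptance condition used here (accept when the opponent's proposal is at least as good for the agent as its own counter-proposal) is an assumption, since the condition is not spelled out in the text; the function name decide and its parameters are hypothetical.

```python
def decide(u_self, p_oppo, p_self, round_no, max_rounds):
    """Return 'accept', 'counter' or 'refuse' for the current round.

    u_self     -- the agent's own utility function U()
    p_oppo     -- proposal received from the opponent
    p_self     -- counter-proposal the agent has just generated
    round_no   -- current negotiation round
    max_rounds -- deadline (assumed to be a fixed number of rounds)
    """
    # Assumed acceptance rule: the received offer is no worse than our own.
    if u_self(p_oppo) >= u_self(p_self):
        return "accept"
    # Time is over: refuse the negotiation.
    if round_no >= max_rounds:
        return "refuse"
    # Otherwise keep negotiating.
    return "counter"
```

For example, with a toy utility `u = lambda p: sum(p)`, `decide(u, (1,), (2,), 1, 10)` asks for a counter-proposal, while the same call at round 10 refuses.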

Training Sample Generation
Negotiation is a process of both competition and cooperation. On the one hand, the negotiation participants hope to maximize their own benefit. On the other hand, they also want a quick agreement. Fortunately, negotiation participants usually have different preferences over the negotiation issues, which leaves room for a successful negotiation (Fig 3). In an e-commerce environment, the negotiator's utility value is not public information. However, the utility value information is to some extent implied in the proposals sent during the negotiation process. First, any proposal is certainly acceptable to its sender. Moreover, because negotiators gradually make concessions during the negotiation, the utility of a sent proposal to its sender decreases as the number of negotiation rounds increases. Second, if the receiver refuses a proposal, the current proposal and all formerly received proposals are unacceptable to the receiver. Under these circumstances, the utility of the received proposals to the receiver increases as the number of negotiation rounds increases.
Therefore, to estimate the utility of the opponent agent, we can divide the proposals into two categories: 1) acceptable proposals, i.e. the proposals sent by the opponent; 2) unacceptable proposals, i.e. a refused proposal and the proposals the opponent received before it. According to the analysis above, we estimate the opponent's utility value by formula (2). Through formula (2), the training sample set is labeled, laying a solid foundation for the following ensemble learning algorithm.
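The two categories can be turned into labeled samples along the following lines. This is not formula (2): the linear interpolation, the parameters u_hi and u_reserve, the record layout, and the function name label_history are all hypothetical, chosen only to respect the two monotonicity observations above (acceptable proposals lose utility over rounds, refused proposals gain utility over rounds).

```python
def label_history(history, max_rounds, u_hi=1.0, u_reserve=0.5):
    """Attach an estimated opponent utility to each proposal.

    history -- list of (round, sender, proposal), sender in {"self", "oppo"};
               every proposal sent by "self" is assumed to have been refused
               by the opponent (assumption made for this sketch).
    Returns a list of (proposal, estimated_utility) training samples.
    """
    samples = []
    for rnd, sender, proposal in history:
        t = rnd / max_rounds  # normalized time in [0, 1]
        if sender == "oppo":
            # Acceptable to the opponent: starts high and decreases
            # toward the assumed reservation level as it concedes.
            y = u_hi - (u_hi - u_reserve) * t
        else:
            # Refused by the opponent: below the reservation level,
            # increasing with the round as we concede.
            y = u_reserve * t
        samples.append((proposal, y))
    return samples


# Illustrative history with two issues (values are made up).
history = [
    (1, "oppo", (200.0, 1.0)),
    (1, "self", (100.0, 5.0)),
    (2, "oppo", (190.0, 2.0)),
    (2, "self", (110.0, 4.0)),
]
samples = label_history(history, max_rounds=10)
```

Any labeling rule with the same monotone behaviour could be substituted; the pairs in samples play the role of $(p_i, y_i)$.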
To improve the estimation accuracy, ensemble learning is introduced. In the field of machine learning, an ensemble learning algorithm uses multiple base learners to obtain better predictive performance than could be obtained from any of the constituent base learners [12][13][14][15]. Because the estimation of the opponent's utility value is a regression problem, the relevance vector machine (RVM) [16][17][18][19][20] is adopted as the base learner. Here we adopt the most widely used form of ensemble learning algorithm, AdaBoost. At each stage, AdaBoost trains a new learner on a sample set in which the weighting coefficients are adjusted according to the performance of the previously trained learner, so as to give greater weight to the sample points that were predicted poorly. Finally, when the desired number of base learners has been trained, they are combined to form a committee using coefficients that give different weights to different base learners [21][22]. To suit the characteristics of the problem in this paper, we design an algorithm named AdaBoostUtility.
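The AdaBoostUtility listing is not reproduced in this text, so the following is only a generic sketch of boosting for regression. It assumes the AdaBoost.R2 weighting scheme (linear loss, weighted-median committee) and swaps in a simple regression stump for the RVM base learner; the names fit_stump and adaboost_r2 and the toy data are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Weighted regression stump: best single split on one feature."""
    def wmean(idx):
        return sum(w[i] * y[i] for i in idx) / sum(w[i] for i in idx)

    everyone = list(range(len(X)))
    best_err, best_split = float("inf"), None
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            ml = wmean([i for i in everyone if X[i][j] <= thr])
            mr = wmean([i for i in everyone if X[i][j] > thr])
            err = sum(w[i] * (y[i] - (ml if X[i][j] <= thr else mr)) ** 2
                      for i in everyone)
            if err < best_err:
                best_err, best_split = err, (j, thr, ml, mr)
    if best_split is None:  # all inputs identical: predict the weighted mean
        m = wmean(everyone)
        return lambda x: m
    j, thr, ml, mr = best_split
    return lambda x: ml if x[j] <= thr else mr


def adaboost_r2(X, y, rounds=10):
    """Boost stumps on (X, y); returns a predict(x) function."""
    n = len(X)
    w = [1.0 / n] * n
    learners = []  # pairs (predictor, vote weight)
    for _ in range(rounds):
        f = fit_stump(X, y, w)
        abs_err = [abs(y[i] - f(X[i])) for i in range(n)]
        worst = max(abs_err)
        if worst == 0:  # perfect fit: keep it and stop
            learners.append((f, 1.0))
            break
        loss = [e / worst for e in abs_err]  # linear loss in [0, 1]
        avg = sum(w[i] * loss[i] for i in range(n))
        if avg >= 0.5:  # learner too weak: stop boosting
            if not learners:
                learners.append((f, 1.0))
            break
        beta = avg / (1 - avg)
        learners.append((f, math.log(1 / beta)))
        # Samples with a large loss keep more of their weight.
        w = [w[i] * beta ** (1 - loss[i]) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        """Weighted median of the base learners' predictions."""
        preds = sorted((f(x), a) for f, a in learners)
        half = sum(a for _, a in preds) / 2
        acc = 0.0
        for value, a in preds:
            acc += a
            if acc >= half:
                return value

    return predict


# Toy one-issue data: estimated opponent utility falls as the issue value rises.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
y = [1.0, 1.0, 1.0, 1.0, 0.9, 0.0]
UF = adaboost_r2(X, y)
```

The returned function plays the role of the estimated utility function UF(); replacing fit_stump with an RVM regressor that accepts sample weights would bring the sketch closer to the setting described above.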

Optimal Counter Proposal
The AdaBoostUtility algorithm yields an estimate of the opponent's utility function UF(). Suppose the agent's own utility function is U(). We can construct a composite utility function CU() that combines U() and UF() through a weighting factor. The bigger the factor, the more weight is given to self utility; the smaller the factor, the more weight is given to the opponent's utility. The counter-proposal should maximize CU() over the value space of proposal p, which forms an optimization problem. A particle swarm optimization (PSO) algorithm is used to solve this optimization problem; the new algorithm is called CU_PSO. Its main loop repeats while neither the maximum number of iterations nor the minimum criterion has been attained. Here, k represents the iteration number, $p_i$ represents a negotiation proposal (the position of particle i), $V_i$ represents the speed of particle i, pBest represents the local best value, gBest represents the global best value, $c_1$ and $c_2$ represent learning factors, and Rand() represents a random number in [0,1].
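A minimal particle swarm sketch of this optimization step is given below. It is not the CU_PSO listing: it assumes the composite utility takes the convex form CU(p) = lam * U(p) + (1 - lam) * UF(p) with a weighting factor lam in [0, 1], uses a textbook velocity update with an inertia weight w (an added assumption), and clamps positions to the value ranges. The function name cu_pso, the default parameters, and the toy utility functions are hypothetical.

```python
import random


def cu_pso(cu, ranges, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize cu(p) over the box `ranges` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(ranges)
    pos = [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]          # pBest: best position of each particle
    p_val = [cu(p) for p in pos]
    g = max(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]  # gBest: best position of the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(ranges):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (p_best[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_best[d] - pos[i][d]))
                # Keep the proposal inside the value range of issue d.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = cu(pos[i])
            if val > p_val[i]:
                p_best[i], p_val[i] = pos[i][:], val
                if val > g_val:
                    g_best, g_val = pos[i][:], val
    return g_best, g_val


# Toy utilities over normalized issues p = (price, quality) in [0, 1].
def U(p):   # own utility (made up): prefers low price, some quality
    return 0.7 * (1 - p[0]) + 0.3 * p[1]


def UF(p):  # estimated opponent utility (made up)
    return 0.4 * p[0] + 0.6 * p[1]


lam = 0.5   # assumed weighting factor


def CU(p):
    return lam * U(p) + (1 - lam) * UF(p)


best_p, best_val = cu_pso(CU, [(0.0, 1.0), (0.0, 1.0)])
```

With these toy utilities CU is linear, so its maximum over the unit box lies at the corner with price 0 and quality 1; the returned best_p is the candidate counter-proposal.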

Results and discussions
A series of experimental tests has been undertaken to verify the performance of the ensemble learning-based negotiation algorithm (ELN for short). In the experiment, we take "induction cooker" trading as an example. The negotiation participants are cooker sellers and cooker buyers, and the negotiation issues are price, power, quality, and warranty period. We analyze the performance of the negotiation model from two aspects: 1) The number of negotiation rounds. In a negotiation round, negotiators exchange a proposal. Negotiators want to reach an agreement in as few rounds as possible, so the number of rounds is an important performance indicator. 2) Total negotiation utility, which is the sum of the utilities of both negotiators. In win-win negotiation, a higher total negotiation utility is better. Two negotiation models are selected for comparison: 1) the Coaching based negotiation model [1]; 2) the Tri-training based negotiation model [6].
Experiment 1:
In this experiment, we compare the average total negotiation utility of the ELN, Coaching, and Tri-training based negotiation models. For each negotiation model, the negotiation is executed 100 times. The average total negotiation utility is shown in Table 2. As the number of training samples increases, the total negotiation utility of every negotiation model increases too. When the number of sample points is small, the Tri-training model achieves the lowest negotiation utility. The ELN model achieves the best performance, for two reasons. First, the base learner is RVM, which has the characteristics of both generative and discriminative algorithms. Second, ensemble learning further improves the performance of the RVM algorithm. Therefore, the ELN algorithm outperforms the other models.

Experiment 2:
In this experiment, we compare the average number of negotiation rounds and the success ratio of the Random, Tri-training, Coaching, and ELN based negotiation models. The Random model uses a random concession strategy: the agent does not consider the opponent's utility when generating a counter-proposal. Therefore, its negotiations take many rounds and its success ratio is low. In contrast, the other three models need fewer negotiation rounds and achieve a higher success ratio because they all consider the opponent's utility; a cooperative mindset promotes negotiation and yields better performance. Among them, ELN achieves the best performance because it has the most precise estimate of the opponent's utility.
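The indicators used in the two experiments (average number of rounds, success ratio, and average total negotiation utility) can be computed from a list of negotiation runs as sketched below. The record layout, the function name summarize, and the two example runs are hypothetical and are not taken from the experiments.

```python
def summarize(runs):
    """Aggregate negotiation runs.

    runs -- list of dicts with keys 'rounds', 'agreed', 'u_self', 'u_oppo'
    """
    agreed = [r for r in runs if r["agreed"]]
    return {
        # Share of runs that ended in an agreement.
        "success_ratio": len(agreed) / len(runs),
        # Mean number of rounds over all runs.
        "avg_rounds": sum(r["rounds"] for r in runs) / len(runs),
        # Total utility = sum of both negotiators' utilities,
        # averaged over the successful runs only (assumed convention).
        "avg_total_utility": (
            sum(r["u_self"] + r["u_oppo"] for r in agreed) / len(agreed)
            if agreed else 0.0
        ),
    }


# Made-up runs, for illustration only.
stats = summarize([
    {"rounds": 4, "agreed": True, "u_self": 0.6, "u_oppo": 0.7},
    {"rounds": 10, "agreed": False, "u_self": 0.0, "u_oppo": 0.0},
])
```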

Conclusions
This paper presented a formal model and a related machine learning algorithm for performing trade-offs in automated negotiation. Based on our former experience in real-world negotiation, the negotiation algorithm had to be designed for a setting in which the negotiating agents have uncertain information about the utility function of their opponent. An ensemble learning-based negotiation method is presented. First, the training sample set is taken from the negotiation history database. Second, an ensemble learning algorithm is trained to estimate the opponent's utility function. Finally, a win-win negotiation counter-proposal is generated based on the utility functions of both sides. The experimental results indicate the benefits and efficiency of the proposed method. In future work, we will apply the ensemble learning algorithm to multi-party negotiation.