The research plan is organized into six Workpackages. WP0 is dedicated to project management. WP1-4 correspond to the successive phases of the PhD work that will be carried out under our supervision, in collaboration with current PhD students and through scientific exchanges with the researchers of Telecom-ParisTech associated with the Social Computing research topic led by Chloé Clavel. Together, WP1-4 address the scientific breakthroughs presented in Section 1.3. Each WP is designed to bring solutions to specific methodological issues. However, the WPs are not entirely independent, as the answers obtained in one WP should benefit the others. First, we will work on the annotation scheme in order to annotate opinions in interactions (WP1). Second, we will work on the modeling of the verbal content of opinions in interactions with CRF models and word embedding approaches (WP2). Then, we will focus on the integration of acoustic features into the model (WP3). Finally, WP4 is dedicated to the evaluation of the methods on different corpora. WP5 is dedicated to dissemination. Table 2 and Figure 1 present the effort distribution per WP and participant and the Gantt chart of the project, respectively.

WP0 – Project Management

WP1 – Annotation of opinions in interactions

Objective: WP1 aims at providing a structured annotation of the opinions contained in the different corpora of interactions.

Approach: Following the model of the users’ likes and dislikes defined in [6], we will define an annotation scheme for opinions in interactions and apply it to three existing English corpora: 1) the SEMAINE corpus [9], consisting of audio-visual recordings of face-to-face interactions between users and embodied conversational agents; 2) the NOXI corpus, developed in the context of the European project ARIA-VALUSPA; 3) the human-agent negotiation corpus developed at the Institute for Creative Technologies, consisting of audio-visual recordings of human-human and human-agent interactions during a negotiation task [3]. The scheme will allow us to annotate the opinion structure (source, opinion expression and targets) and the interaction context (for example, the topic structure of the conversation or the agent’s communicative goals). The annotation will be carried out on crowdsourcing platforms such as CrowdFlower (https://www.crowdflower.com/) by at least three different labelers. An inter-annotator agreement score will be computed in order to evaluate the consistency and reliability of the annotation.
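The agreement computation mentioned above can be illustrated with a minimal sketch. The text does not say which agreement measure will be used; Fleiss' kappa is assumed here only because it handles three or more labelers per item. The label names ("like", "dislike", "none") and the example ratings are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of labelers.

    ratings: list of items, each item being the list of labels given to it.
    categories: list of all possible labels.
    Note: the value is undefined (division by zero) if every label is identical.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]

    # Observed agreement: proportion of agreeing labeler pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall label proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    expected = sum(p ** 2 for p in proportions)

    return (observed - expected) / (1 - expected)


# Hypothetical example: four utterances, three labelers each.
labels = ["like", "dislike", "none"]
ratings = [
    ["like", "like", "like"],
    ["dislike", "dislike", "none"],
    ["none", "none", "none"],
    ["like", "dislike", "like"],
]
kappa = fleiss_kappa(ratings, labels)
```

A value close to 1 indicates strong agreement, whereas a value near 0 indicates agreement no better than chance.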

Expected results: annotated corpora that will serve both as training data for the systems developed in WP2 and WP3 and as the reference corpus for the evaluation in WP4.

Success indicators: high inter-annotator agreement scores.

WP2 – From linguistic rules to feature functions

Objective: WP2 aims at providing a methodology for the integration of complex and interactional linguistic features in CRF-based models for the detection of opinions in interactions.

Approach: We will use the linguistic rules that we implemented as grammars in [6] to define feature functions that will serve as input to CRF models. The first version of the opinion analysis system will focus on lexical, syntactic and dialogic feature functions. The lexical-level models will rely on both sentiment lexicons such as WordNet-Affect and word embedding features. We will investigate the potential of CRFs for classification on transcripts of spoken interactions. The discriminative nature of CRFs will enable strong linguistic rules, combined with word embedding features, to emerge directly from the learning phase. In particular, the feature functions will model: i) the relations between the evaluative expressions, on the one hand, and their target and source, on the other hand; ii) the conversation context (dialogic level). We will evaluate the differences in behavior of our model when run on manual transcripts vs. automatic speech transcripts.
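To make the notion of feature function more concrete, here is a minimal sketch of how lexical and dialogic cues could be turned into per-token features. It is an illustration under stated assumptions, not the grammar-derived functions of [6]: the toy lexicon, the feature names and the helper functions are all hypothetical. The output, one dictionary per token, is the kind of input that common CRF toolkits accept.

```python
# Toy sentiment lexicon: a hypothetical stand-in for a resource such as WordNet-Affect.
TOY_LEXICON = {"love": "positive", "like": "positive", "hate": "negative"}


def token_features(tokens, i, previous_turn):
    """Build the feature dictionary of token i in the current utterance.

    previous_turn holds the tokens of the other speaker's preceding turn
    and is used for a simple dialogic feature.
    """
    token = tokens[i].lower()
    features = {
        "bias": 1.0,
        "lower": token,
        # Lexical feature: polarity of the token according to the lexicon.
        "lexicon_polarity": TOY_LEXICON.get(token, "none"),
        # Dialogic feature: was this token already mentioned in the previous turn?
        "in_previous_turn": token in {t.lower() for t in previous_turn},
    }
    if i > 0:
        previous = tokens[i - 1].lower()
        # Contextual feature linking an evaluative expression to what follows it.
        features["prev_lower"] = previous
        features["prev_lexicon_polarity"] = TOY_LEXICON.get(previous, "none")
    else:
        features["BOS"] = True  # beginning of the utterance
    return features


def utterance_features(tokens, previous_turn):
    """Feature dictionaries for every token of an utterance."""
    return [token_features(tokens, i, previous_turn) for i in range(len(tokens))]


# Hypothetical exchange: the agent asks a question, the user answers.
agent_turn = ["Do", "you", "like", "Paris", "?"]
user_turn = ["I", "love", "Paris"]
feats = utterance_features(user_turn, agent_turn)
```

In this example the token "Paris" receives both a dialogic cue (it was introduced by the agent) and a contextual cue (it follows a positive evaluative word), which is the kind of evidence a CRF could weight when labeling it as an opinion target. Word embedding dimensions could be added to the same dictionaries as real-valued features.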

Expected results: First version of the text-based opinion detection system.

Success indicators: F-score of the system. As it will be the first machine learning system dealing with opinion detection in interactions, it will be difficult to compare its performance with that of other systems. However, the F-score will be compared with that of a baseline system (for example using logistic regression), of the rule-based method evaluated in [6], and of deep learning approaches such as Long Short-Term Memory (LSTM) networks.
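As a reminder of how this indicator is computed, the following minimal sketch derives the F-score from a set of reference annotations and a set of system predictions. Representing an opinion as an (utterance index, polarity, target) triple and requiring an exact match are assumptions made for the example; the actual matching criterion is not specified here.

```python
def f_score(gold, predicted, beta=1.0):
    """F-beta score between reference and predicted opinions (exact match)."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical reference annotations and system output.
gold = {(0, "like", "Paris"), (1, "dislike", "rain")}
system = {(0, "like", "Paris"), (2, "like", "tea")}
score = f_score(gold, system)  # precision 0.5, recall 0.5
```

The same function can be applied to the output of each compared system, so that all of them are scored against the same reference.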

Tasks and corresponding deliverables:

WP3 – Integration of acoustic features in the model

Objective: WP3 aims at providing a methodology for the integration of acoustic features in multimodal CRF-based models for the detection of opinions in interactions.

Approach: Acoustic features designed to model opinions will be integrated into the model developed in WP2. We will use existing speech feature extraction tools (such as openSMILE or COVAREP) to extract prosodic features (pitch, intensity and speech rate) and voice quality features, as well as classical audio features (Mel-Frequency Cepstral Coefficients), and will apply advanced temporal integration/pooling approaches based on different signal segmentations (for example pause-based segmentation). We will investigate a latent-state model of the speaker’s opinion using a variant of CRF called Hidden Conditional Random Fields (HCRF). We will work on finer-grained prosodic patterns relying on a speech-to-text alignment, as done in [2], to be integrated into the CRF and HCRF feature functions. In addition, we will consider learning audio representations using recurrent neural networks trained on larger audio datasets with emotion labels.
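The pause-based temporal pooling mentioned above can be sketched as follows. This is a simplified illustration, assuming that a frame-level descriptor (here a hypothetical pitch track in Hz) and a per-frame speech/pause decision are already available from an extraction tool; the function names and values are invented for the example.

```python
from statistics import mean, pstdev


def pause_segments(is_speech):
    """Return (start, end) frame indices of the speech stretches between pauses."""
    segments, start = [], None
    for i, speech in enumerate(is_speech):
        if speech and start is None:
            start = i  # a new speech stretch begins
        elif not speech and start is not None:
            segments.append((start, i))  # a pause closes the current stretch
            start = None
    if start is not None:
        segments.append((start, len(is_speech)))
    return segments


def pool_segments(frames, is_speech):
    """Summarize a frame-level descriptor by its mean and standard deviation per segment."""
    pooled = []
    for start, end in pause_segments(is_speech):
        chunk = frames[start:end]
        pooled.append(
            {"start": start, "end": end, "mean": mean(chunk), "std": pstdev(chunk)}
        )
    return pooled


# Hypothetical pitch track (Hz) with a two-frame pause in the middle.
pitch = [100.0, 110.0, 0.0, 0.0, 200.0, 220.0, 240.0]
speech = [True, True, False, False, True, True, True]
pooled = pool_segments(pitch, speech)
```

Each pooled segment yields a fixed-size description of a variable-length stretch of speech, which could then be aligned with the corresponding words and used in the CRF or HCRF feature functions.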

Expected results: First version of the multimodal opinion detection system.

Success indicators: F-score of the system. As it will be the first system dealing with multimodal opinion detection in interactions, it will be difficult to compare its performance with that of other systems. However, the F-score will be compared with that of a baseline system (for example using logistic regression) and of deep learning approaches such as Long Short-Term Memory (LSTM) networks. We will also evaluate the contribution of multimodality by comparing this F-score with those obtained using audio only and using the text-based system developed in WP2.

Tasks and corresponding deliverables: T3.1: Multimodal opinion detection system in interactions (v1 system) from T0+18 to T0+30

WP4 – Evaluation of the methods on different corpora

Objective: WP4 aims at providing a methodology for the evaluation of opinion detection systems that are grounded in a human-agent interaction.

Approach: Each version of the system will be evaluated on the three corpora. We will evaluate the performance of the system in detecting the users’ likes and dislikes and their targets. We will also evaluate the contribution of interaction context modeling to the performance of the system by comparing two versions of the system: the first using features that rely on the user’s current utterance only, and the second using features that rely on the dialogue history.

Expected results: Evaluation methodology and in-depth analysis of the detection system results.

Success indicators: Ability of the result analysis to provide research leads for system improvement.

Tasks and corresponding deliverables: