4. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts

1. Abstract

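Sentiment analysis seeks to identify the viewpoint(s) underlying a text span. An example application is classifying a movie review as "thumbs up" or "thumbs down". To determine this sentiment polarity, the authors propose a novel machine-learning method that applies text-categorization techniques to just the subjective portions of the document. These portions can be extracted using efficient techniques for finding minimum cuts in graphs, which greatly facilitates the incorporation of cross-sentence contextual constraints.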

2. Method

  • 2.1 Architecture
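    A subjectivity detector first determines whether each sentence is subjective. Discarding the objective sentences creates an extract that should better represent the review's subjective content to a default polarity classifier.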
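
The two-stage architecture can be sketched as a small pipeline. This is a minimal illustration rather than the paper's implementation: the keyword-based detector and polarity classifier below are toy stand-ins, and all function names (`subjective_extract`, `classify_review`, `toy_is_subjective`, `toy_polarity`) are hypothetical. The fallback to the full review when no sentence is kept is also an assumption.

```python
from typing import Callable, List

def subjective_extract(sentences: List[str],
                       is_subjective: Callable[[str], bool]) -> List[str]:
    """Keep only the sentences the detector flags as subjective, in order."""
    return [s for s in sentences if is_subjective(s)]

def classify_review(sentences: List[str],
                    is_subjective: Callable[[str], bool],
                    polarity: Callable[[List[str]], str]) -> str:
    """Run the polarity classifier on the subjective extract only."""
    extract = subjective_extract(sentences, is_subjective)
    # Assumption: fall back to the full review if every sentence was discarded.
    return polarity(extract if extract else sentences)

# Toy stand-ins for the trained classifiers (hypothetical, keyword-based).
POSITIVE = {"brilliant", "great"}
NEGATIVE = {"boring", "awful"}

def _words(sentence: str) -> set:
    return {w.strip(".,!?") for w in sentence.lower().split()}

def toy_is_subjective(sentence: str) -> bool:
    return bool(_words(sentence) & (POSITIVE | NEGATIVE))

def toy_polarity(sentences: List[str]) -> str:
    score = 0
    for s in sentences:
        words = _words(s)
        score += len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score >= 0 else "negative"

review = [
    "A detective searches Paris for his brother.",
    "The acting is brilliant.",
    "The plot twist is great.",
]
extract = subjective_extract(review, toy_is_subjective)
label = classify_review(review, toy_is_subjective, toy_polarity)
```

In practice both callables would be trained classifiers; the point of the sketch is only that the polarity classifier never sees the sentences judged objective.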

  • 2.2 Context and Subjectivity Detection
    We use an efficient and intuitive graph-based formulation relying on finding minimum cuts.
    In contrast to approaches that focus on similarity between items, we are concerned with physical proximity between the items to be classified.

  • 2.3 Cut-based classification
    I) Individual scores ind_j(xi): non-negative estimates of each xi's preference for being in Cj, based on the features of xi alone.
    II) Association scores assoc(xi, xk): non-negative estimates of how important it is that xi and xk be in the same class.
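
To make the two kinds of scores concrete, here is a hedged sketch of cut-based classification for two classes. Writing ind1 and ind2 for an item's individual scores for C1 and C2, a partition is penalized by the ind2 score of every item placed in C1, the ind1 score of every item placed in C2, and the assoc score of every pair split across the classes. Minimizing that cost is a minimum s-t cut: the source connects to each item with capacity ind1, each item connects to the sink with capacity ind2, and item pairs are linked with capacity assoc. The function name `min_cut_partition`, the Edmonds-Karp max-flow routine, and the toy scores are choices made for this illustration, not details taken from the notes.

```python
from collections import deque

def min_cut_partition(ind1, ind2, assoc):
    """Return (items assigned to C1, cut cost) for a two-class min-cut.

    ind1[i], ind2[i]: non-negative individual scores of item i for C1 / C2.
    assoc: dict mapping (i, k) index pairs to non-negative association scores.
    """
    n = len(ind1)
    s, t = n, n + 1                       # source and sink node ids
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u, v, w):
        cap[u][v] = cap[u].get(v, 0.0) + w
        cap[v].setdefault(u, 0.0)         # reverse edge for the residual graph

    for i in range(n):
        add_edge(s, i, ind1[i])           # cut if item i ends up in C2
        add_edge(i, t, ind2[i])           # cut if item i ends up in C1
    for (i, k), w in assoc.items():
        add_edge(i, k, w)                 # association edges are undirected
        add_edge(k, i, w)

    flow = 0.0
    while True:
        # Breadth-first search for an augmenting path (Edmonds-Karp).
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                         # no augmenting path: flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

    # Items still reachable from the source lie on the C1 side of the cut.
    c1 = {v for v in parent if v != s}
    return c1, flow

# Toy scores for three items (illustrative numbers only).
ind1 = [0.8, 0.5, 0.1]
ind2 = [0.2, 0.5, 0.9]
assoc = {(0, 1): 1.0, (1, 2): 0.1, (0, 2): 0.2}
c1, cost = min_cut_partition(ind1, ind2, assoc)
# c1 == {0, 1}; cost is approximately 1.1
```

Item 1 is undecided on its own (0.5 versus 0.5), but its strong association with item 0 pulls it into C1, which is the kind of contextual effect the cut formulation is meant to capture.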

3. Evaluation Framework

  • Polarity dataset
  • Default polarity classifiers
  • Subjectivity dataset
    i) We collected 5000 movie-review snippets from www.rottentomatoes.
    ii) To obtain (mostly) objective data, we took 5000 sentences from plot summaries available from the Internet Movie Database (www.imdb).
  • Subjectivity detectors
    We can use our default polarity classifiers as "basic" sentence-level subjectivity detectors.

4. Experimental Results

As we will see, the use of subjectivity extracts can in the best case provide satisfying improvement in polarity classification. We therefore conclude that subjectivity extraction produces effective summaries of document sentiment.

  • 4.1 Basic subjectivity extraction
    As noted in Section 3, both Naive Bayes and SVMs can be trained on our subjectivity dataset and then used as a basic subjectivity detector. (The former has better average ten-fold cross-validation performance on the subjectivity dataset.)
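
As a rough illustration of a basic sentence-level subjectivity detector, the sketch below trains a tiny Naive Bayes classifier on labeled sentences. The class name `TinyNaiveBayes`, the word-presence features, the add-one smoothing, and the four training sentences are all assumptions made for this example; the notes do not specify the feature set, and the real detector is trained on the 5000 + 5000 sentence subjectivity dataset.

```python
import math
from collections import Counter

class TinyNaiveBayes:
    """Naive Bayes over word-presence features with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sent, label in zip(sentences, labels):
            tokens = set(sent.lower().split())  # presence, not frequency
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab |= tokens
        return self

    def log_scores(self, sentence):
        # Words never seen in training are ignored.
        tokens = set(sentence.lower().split()) & self.vocab
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)

# Toy training data (invented for illustration).
train_sentences = [
    "a brilliant and moving film",
    "the acting is dull and lifeless",
    "a detective travels to paris",
    "the brothers inherit a farm",
]
train_labels = ["subjective", "subjective", "objective", "objective"]

detector = TinyNaiveBayes().fit(train_sentences, train_labels)
first = detector.predict("a moving and brilliant story")
second = detector.predict("the detective travels to a farm")
```

With this toy data the first sentence is labeled subjective and the second objective. A detector like this judges each sentence in isolation, which is the limitation the context-aware variant in Section 4.2 addresses.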

  • 4.2 Incorporating context information
    The previous section demonstrated the value of subjectivity detection. We now examine whether context information, particularly regarding sentence proximity, can further improve subjectivity extraction. As discussed in Sections 2.2 and 3, contextual constraints are easily incorporated via the minimum-cut formalism but are not natural inputs for standard Naive Bayes and SVMs.
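
One way sentence proximity could be turned into association scores is sketched below: nearby sentences get a score that decays with their distance, and sentences further apart than a threshold get none. The function name `proximity_assoc`, the parameters `threshold` and `strength`, and the inverse-square default decay are assumptions chosen for illustration, not values taken from these notes.

```python
def proximity_assoc(n_sentences, threshold=3, strength=0.5,
                    decay=lambda d: 1.0 / (d * d)):
    """Association scores for sentence pairs at most `threshold` apart.

    Returns a dict mapping (i, j) with i < j to strength * decay(j - i).
    Pairs further apart than `threshold` are omitted (score 0).
    """
    scores = {}
    for i in range(n_sentences):
        last = min(n_sentences, i + threshold + 1)
        for j in range(i + 1, last):
            scores[(i, j)] = strength * decay(j - i)
    return scores

# Four sentences, neighbors up to two positions apart are linked.
pairs = proximity_assoc(4, threshold=2, strength=1.0)
# pairs == {(0, 1): 1.0, (0, 2): 0.25, (1, 2): 1.0, (1, 3): 0.25, (2, 3): 1.0}
```

Scores of this shape can serve as the pairwise weights in a minimum-cut formulation, so that adjacent sentences are encouraged to receive the same subjectivity label, something a per-sentence Naive Bayes or SVM classifier has no direct way to express.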

Appendix

  • Source of the subjective review data
    www.rottentomatoes.

  • Source of the objective data
    www.imdb