[Original] Twitter Text Mining and Semantic Analysis in R, with Code and Data

Published: Tuesday

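  # NOTE: the quoted string arguments in this step were lost in extraction.
  # The values below are assumptions chosen to match the output shown next.
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%  # assumed: strip short links and HTML entities
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%                    # split on the custom pattern `reg`
  filter(!word %in% stop_words$word,                                               # drop common stop words
         str_detect(word, "[a-z]"))                                                # assumed: keep tokens containing a letter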

tweet_words

## # A tibble: 8,753 × 4
##                    id source             created                   word
##
## 1  676494179216805888 iPhone 2015-12-14 20:09:15                 record
## 2  676494179216805888 iPhone 2015-12-14 20:09:15                 health
## 3  676494179216805888 iPhone 2015-12-14 20:09:15 #makeamericagreatagain
## 4  676494179216805888 iPhone 2015-12-14 20:09:15             #trump2016
## 5  676509769562251264 iPhone 2015-12-14 21:11:12               accolade
## 6  676509769562251264 iPhone 2015-12-14 21:11:12             @trumpgolf
## 7  676509769562251264 iPhone 2015-12-14 21:11:12                 highly
## 8  676509769562251264 iPhone 2015-12-14 21:11:12              respected
## 9  676509769562251264 iPhone 2015-12-14 21:11:12                   golf
## 10 676509769562251264 iPhone 2015-12-14 21:11:12                odyssey
## # ... with 8,743 more rows

tweet_words %>%

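  count(word, sort = TRUE) %>%           # word frequencies, most frequent first
  head(20) %>%                           # keep the top 20 words
  mutate(word = reorder(word, n)) %>%    # order the bars by frequency
  ggplot(aes(word, n)) +
  geom_bar(stat = "identity") +          # n is already a count, so plot it as-is
  ylab("Occurrences") +                  # label text lost in extraction; "Occurrences" is an assumption
  coord_flip()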

The figure shows that the keyword "Hillary" ranks first, followed by "Trump 2016". Further down the ranking we also see "Trump", "Clinton", and others.

Sentiment analysis of the data, and the relative word-usage ratio between Android and iPhone

The sentiment ratio for each platform is computed from the sentiment of its characteristic words, and the result is then visualized.

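android_iphone_ratios <- tweet_words %>%
  count(word, source) %>%
  # count() returns an ungrouped table in current dplyr, so group by word
  # explicitly before filtering on each word's total count
  group_by(word) %>%
  filter(sum(n) >= 5) %>%
  spread(source, n, fill = 0) %>%
  ungroup() %>%
  # add-one smoothing, then convert each source column to a share;
  # across() replaces the deprecated mutate_each(funs(...))
  mutate(across(-word, ~ (.x + 1) / sum(.x + 1))) %>%
  mutate(logratio = log2(Android / iPhone)) %>%
  arrange(desc(logratio))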
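As a minimal sketch of what the log ratio measures, the base-R example below applies the same add-one smoothing and log2 ratio to a small table of made-up counts. The counts and the helper name `smoothed_share` are assumptions for illustration only and are not taken from the tweet data.

```r
# Toy word counts per platform (hypothetical numbers, not from the tweet data)
counts <- data.frame(
  word    = c("record", "health", "golf", "#trump2016"),
  Android = c(12, 9, 1, 0),
  iPhone  = c(1, 2, 10, 14)
)

# Add-one smoothing: (count + 1) / sum(count + 1) turns each column into
# a share that is never exactly zero, so the ratio below is always finite
smoothed_share <- function(x) (x + 1) / sum(x + 1)

counts$Android_share <- smoothed_share(counts$Android)
counts$iPhone_share  <- smoothed_share(counts$iPhone)

# log2 ratio: > 0 means relatively more common on Android, < 0 on iPhone
counts$logratio <- log2(counts$Android_share / counts$iPhone_share)

# Sort from most Android-leaning to most iPhone-leaning
counts <- counts[order(-counts$logratio), ]
print(counts[, c("word", "logratio")])
```

Because the ratio is on a log2 scale, a value of 1 means a word's share is twice as large on Android as on iPhone, and a value of -1 means the reverse. Without the smoothing step, a word that never appears on one platform would produce an infinite ratio.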

nrc <- sentiments %>%