[Original] Twitter Text Mining and Semantic Analysis in R, with Code and Data

Published: Monday

library(dplyr)
library(purrr)
library(twitteR)
library(ggplot2)

Read the Twitter data

load("19 guoyufei17 smelllikeme@163.com/trump_tweets_df.rda")

Clean up the data

library(tidyr)

Keep only the tweets sent from an iPhone or an Android phone, and discard tweets from any other source.

tweets <- trump_tweets_df %>%
  select(id, statusSource, text, created) %>%
  # pull the device name out of the HTML in statusSource
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("iPhone", "Android"))
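As a quick check of what the extraction step keeps, the sketch below applies a pattern of the form `Twitter for (.*?)<` to a few made-up statusSource strings, using base R only. The sample strings and the `platform` variable are assumptions for illustration; they are not taken from the dataset.

```r
# Hypothetical statusSource values (assumed format, not from the real data)
status_source <- c(
  '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
)

# Capture the text between "Twitter for " and the next "<"
pattern <- "Twitter for (.*?)<"
m <- regmatches(status_source, regexec(pattern, status_source, perl = TRUE))

# Element 2 of each match is the capture group; NA when the pattern does not match
platform <- vapply(
  m,
  function(x) if (length(x) >= 2) x[2] else NA_character_,
  character(1)
)
print(platform)

# Keep only the two platforms compared in the analysis
keep <- platform %in% c("iPhone", "Android")
print(sum(keep))
```

Strings that do not match the pattern end up as NA, so the `%in%` filter drops them along with every other client.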

Plot the share of tweets posted at each hour of the day, and compare the tweeting patterns of the Android phone and the iPhone.

library(lubridate)
library(scales)

tweets %>%
  count(source, hour = hour(with_tz(created, "EST"))) %>%
  # compute each hour's share within its own device
  group_by(source) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup() %>%
  ggplot(aes(hour, percent, color = source)) +
  geom_line() +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Hour of day (EST)", y = "% of tweets", color = "")
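The hourly shares can also be reproduced by hand on a small example. This is a minimal base R sketch on made-up timestamps (the `tw` data frame is hypothetical). It assumes the "EST" time zone name is available on the system, and it computes each hour's share within its own device.

```r
# Hypothetical tweets: device and creation time in UTC (assumed data)
tw <- data.frame(
  source  = c("Android", "Android", "Android", "iPhone", "iPhone"),
  created = as.POSIXct(c("2016-08-01 11:15:00", "2016-08-01 11:50:00",
                         "2016-08-01 13:05:00", "2016-08-01 20:30:00",
                         "2016-08-01 23:10:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hour of day after converting the timestamp to EST
tw$hour <- as.integer(format(tw$created, "%H", tz = "EST"))

# Number of tweets per device and hour
tw$n <- 1L
counts <- aggregate(n ~ source + hour, data = tw, FUN = sum)

# Share of each hour within its device (shares sum to 1 per device)
counts$percent <- counts$n / ave(counts$n, counts$source, FUN = sum)
print(counts)
```

Dividing by the per-device total rather than the overall total is what makes the two curves comparable even when one device has many more tweets.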

The chart shows a clear difference in when the two devices tweet: the Android phone tends to tweet between 5:00 and 10:00, while the iPhone generally tweets between about 10:00 and 20:00. We can also see that the Android phone accounts for a larger share of the tweets than the iPhone.

Next, check whether each tweet is a quotation, and compare the counts on the two platforms.

library(stringr)

tweets %>%
  count(source,
        quoted = ifelse(str_detect(text, '^"'), "Quoted", "Not quoted")) %>%
  ggplot(aes(source, n, fill = quoted)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "") +
  ggtitle('Whether tweets start with a quotation mark (")')
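The quotation test itself is a one-line pattern match. Here is a small sketch on made-up tweet texts (hypothetical, not from the dataset), using base R's grepl() in place of str_detect():

```r
# Hypothetical tweet texts (assumed examples)
text <- c(
  '"@user: hello world"',
  'Plain text tweet',
  'A tweet with "quotes" in the middle'
)

# TRUE only when the very first character is a double quotation mark
starts_with_quote <- grepl('^"', text)

# Same two labels as in the chart
quoted <- ifelse(starts_with_quote, "Quoted", "Not quoted")
print(table(quoted))
```

Because the pattern is anchored with `^`, a tweet that merely contains quotation marks somewhere in the middle is still counted as not quoted.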

The comparison shows that the proportion of tweets that are not quotations is noticeably lower on the Android phone than on the iPhone, while the number of quoted tweets is much larger on the Android phone. We can therefore conclude that the iPhone tweets are mostly original content, while the Android phone is where most of the quoted tweets come from.

Next, check whether the tweets contain links or pictures, and compare the two platforms.

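tweet_picture_counts <- tweets %>%
  filter(!str_detect(text, '^"')) %>%   # drop quoted tweets first
  count(source,
        # t.co is Twitter's link shortener, so it marks both links and pictures
        picture = ifelse(str_detect(text, "t.co"), "Picture/link", "No picture/link"))

ggplot(tweet_picture_counts, aes(source, n, fill = picture)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "")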

The chart shows that tweets without pictures or links are far more common on the Android phone than on the iPhone. In other words, tweets sent from the Android phone generally contain no picture or link, while tweets sent from the iPhone usually include one.

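spr <- tweet_picture_counts %>%
  spread(source, n) %>%
  # turn the counts into shares within each platform
  mutate(across(c(Android, iPhone), ~ .x / sum(.x)))

# ratio of the picture/link share on the iPhone to that on the Android phone
rr <- spr$iPhone[2] / spr$Android[2]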
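To make the ratio concrete, here is a worked example with made-up counts (the numbers in `counts` are hypothetical and chosen only to keep the arithmetic simple). It converts the counts to per-platform shares and then divides the iPhone picture/link share by the Android one, which is the kind of quantity `rr` holds.

```r
# Hypothetical counts (assumed): rows are picture categories, columns are platforms
counts <- matrix(
  c(900, 100,    # Android: no picture/link, picture/link
    300, 300),   # iPhone:  no picture/link, picture/link
  nrow = 2,
  dimnames = list(c("No picture/link", "Picture/link"),
                  c("Android", "iPhone"))
)

# Share of each category within its platform (each column sums to 1)
shares <- prop.table(counts, margin = 2)

# Ratio of the picture/link share on the iPhone to that on Android
rr_example <- shares["Picture/link", "iPhone"] / shares["Picture/link", "Android"]
print(rr_example)
```

With these made-up numbers, 10% of the Android tweets and 50% of the iPhone tweets contain a picture or link, so the ratio is 5.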

Next, we detect and remove unwanted characters from the tweets. Then we extract the keywords from the tweets and sort them by frequency.

library(tidytext)

reg <-\
tweet_words <- tweets %>%
  filter(!str_detect(text, '^"')) %>%