Web content is the textual, visual, or aural content encountered as part of the user experience on websites. With the rise of social media, web content has become increasingly important to users, retailers, and investors. However, the sheer volume of online content makes it difficult to decide which content to exploit. Billions of content items of different types are available online, and viewers' attention is distributed extremely unevenly: only a few items attract massive attention. Attractiveness prediction has therefore become more important. Predicting a content item's potential attractiveness in advance can help retailers formulate strategy, assist investors in making decisions, and help viewers select content.
Attractiveness prediction has gradually drawn attention. Some scholars have begun to explore feature-oriented methods for predicting attractiveness, and certain models (e.g., linear regression models and neural network models) are widely used in this area. However, not all online content can be handled by feature-oriented methods. For example, online clips often have no significant features, such as celebrated actors or actresses or famous scripts. Feature-oriented methods are therefore limited when facing various types of online content.
Recently, some researchers have noticed the limitations of the feature-oriented methods, and two feature-free approaches have been proposed: early view pattern methods and network dynamics methods. Because a YouTube video's popularity is mostly determined early in the video's life, early view pattern data is widely used as a hint for predicting future popularity. A number of researchers have modeled the accrual of early views and/or vote counts to predict long-term dynamics. However, the “Early View Pattern Theory” has been challenged. For example, on another social media platform, Renren.com, a Facebook-like online social network in China, a noticeably lower correlation between early and later views can be found. Network dynamics methods avoid this “Correlation Assumption.” They instead assume an explicit network structure and, by simulating the “word-of-mouth” process within the network, foresee the potential attractiveness of the content. However, the “explicit network” assumption presents several challenges in real applications. First, it turns out to be very difficult, if not impossible, to obtain a complete network structure. Second, in many scenarios, the network over which the diffusion takes place is in fact implicit or even unknown.
In most cases, a researcher can observe only who watches a video, but cannot know where or from whom the viewer heard about it. Although implicit-network-based methods have been proposed recently, their fixed-parameter assumption or homogeneous-node assumption undermines the models' ability and flexibility to model popularity dynamics.
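To make the early view pattern approach discussed above concrete, the following is a minimal sketch that fits a log-linear relation between early and later view counts and reports their correlation. The log-linear form, the function names (`fit_log_linear`, `predict_later`), and the view counts are assumptions chosen for illustration; the text does not specify a particular model, and the data are synthetic.

```python
import math


def fit_log_linear(early, later):
    """Least-squares fit of log(later) = a + b * log(early).

    Returns (a, b, r), where r is the Pearson correlation between
    the log-transformed early and later view counts.
    """
    xs = [math.log(v) for v in early]
    ys = [math.log(v) for v in later]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def predict_later(a, b, early_views):
    """Predict a later view count from an early view count."""
    return math.exp(a + b * math.log(early_views))


# Synthetic (hypothetical) data: each item's later views are 5x its early views.
early = [10, 100, 1000, 10000]
later = [50, 500, 5000, 50000]

a, b, r = fit_log_linear(early, later)
prediction = predict_later(a, b, 200)
print(f"slope={b:.3f} correlation={r:.3f} predicted={prediction:.1f}")
```

A prediction of this kind is only as good as the correlation `r`. On a platform where early and later views are weakly correlated, as reported above for Renren.com, the fitted relation would carry little predictive value.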
Thus, according to the present disclosure, recent studies have two main limitations. First, they cannot handle various types of online content. Web content includes various kinds of entities, such as text, images, sounds, videos, and animations. Most current research targets only one type of entity, for example video attractiveness prediction or microblog attractiveness prediction. In practice, however, a combination of various entities may be needed, and these systems often are not readily usable for such a combination. Second, a rough potential view count is of little use for further analysis. Most current works provide only a rough potential future view count as the prediction indicator, but such a number is not comparable across platforms because the platforms have different total numbers of users. Thus, a more general and comparable indicator is needed.
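The incomparability of raw view counts across platforms can be seen with simple arithmetic. The sketch below divides a predicted view count by a platform's total number of users. This normalization, the function name `reach_fraction`, and all figures are hypothetical and serve only to illustrate the problem; they are not the indicator of the present disclosure.

```python
def reach_fraction(predicted_views, total_users):
    """Fraction of a platform's users represented by a predicted view count."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return predicted_views / total_users


# Hypothetical figures: a large platform and a small platform.
large_platform = reach_fraction(1_000_000, 1_000_000_000)
small_platform = reach_fraction(100_000, 1_000_000)

# The large platform has 10x the raw views but a far smaller share of its users.
print(large_platform, small_platform)
```

Under these assumed figures, the content with fewer raw views reaches a much larger share of its platform's users, which is why a raw count alone is a poor basis for cross-platform comparison.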
The disclosed methods and systems are directed to solve one or more problems set forth above and other problems.