The present disclosure relates to methods, techniques, and systems for gathering social media content and, in particular, to methods, techniques, and systems for using dynamic search techniques to minimize cost when searching third-party content sources.
Many successful social media sources are provided free to consumers. Under a generally known revenue model, these same sources generate revenue by selling information about their users to a variety of companies. For example, they may sell information to advertisers to enable targeted marketing.
Because of the vast amounts of public data on sources like Facebook and Twitter, the cost to buy this data tends to be exorbitant, which is prohibitive for many businesses. To circumvent such costs, businesses looking to access this data have used means such as accessing the source itself, using the source's application programming interface, or using a web search to access a subset of the data at no charge. To prevent such free access, some sources restrict these search methods by limiting the number of times a particular IP address can search the source in a given day, thereby encouraging the purchase of the data directly from the site.
Even when a consumer chooses to purchase the data from the site, this data may be stale and is generally focused on historical data, not on the current real-time flow of data. Content sources have generally not seen a need to provide data in real time, because doing so is far more costly than daily content transfers or slow-crawling search technologies. Content sources that do provide data in real time generally provide only the most recent content and limit the amount of historical content available. Also, it is not in the interest of a source provider to offer subsets of the data, because requiring a business to purchase all of the data leads to a better profit model.
Accordingly, in some cases a purchaser must pay for an entire set of data when the purchaser wants access to only a minuscule portion of it. In other cases, a subset of the data is available for purchase, but that subset is offered only once or twice a day, which in some cases is far too slow. Further, these subsets are at times incomplete, and consumers cannot be assured of the quality of coverage of particular content unless they purchase the entire set. In yet other cases, when real-time data is available, access to historical data is severely limited. Other methods of access include costly pay-per-search proxy server services.