Choosing the parameters of a cache fill strategy that will deliver good performance requires knowledge of cache access patterns.
Long cache fills have the advantage that actual bus bandwidth rises towards the theoretical peak as read size increases. However, once the read size exceeds the bus width, satisfying the read requires multiple bus cycles and thus may increase cache miss latency.
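The trade-off can be made concrete with a small model. The sketch below is illustrative only: it assumes that every fill pays a fixed setup cost (the hypothetical parameter `setup_cycles`, standing in for address and arbitration overhead) and then transfers one bus width (`bus_bytes`) per cycle. Neither value comes from any particular machine.

```python
import math

def fill_cost(fill_bytes, bus_bytes=8, setup_cycles=4):
    """Return (cycles_per_fill, fraction_of_peak_bandwidth) for one fill.

    Assumed model: a fixed setup cost per fill, then one bus cycle per
    bus-width transfer. Both defaults are illustrative, not measured.
    """
    data_cycles = math.ceil(fill_bytes / bus_bytes)
    total_cycles = setup_cycles + data_cycles
    # Peak bandwidth would move bus_bytes on every one of total_cycles.
    fraction_of_peak = fill_bytes / (total_cycles * bus_bytes)
    return total_cycles, fraction_of_peak

if __name__ == "__main__":
    for size in (8, 16, 32, 64, 128):
        cycles, fraction = fill_cost(size)
        print(f"{size:4d}-byte fill: {cycles:3d} cycles, {fraction:.0%} of peak")
```

Under these assumptions the fraction of peak bandwidth grows with the fill size, because the setup cost is amortized over more data, while the number of cycles needed to complete a single fill grows as well.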
If the code is making long sequential sweeps through one or more data structures that are contiguous in memory (e.g., the sort of code that benefits most directly from a "vectorizing" compiler and vector hardware), then a long cache fill will typically be desirable. The extremely high locality of the stream of data references means that there is a commensurately high probability that the additional data read during a long cache fill will actually be used. Finally, because the performance of such "vector" applications is frequently a direct function of memory bandwidth, the improved bus utilization translates into increased application speed.
When there is more randomness in the stream of data references, a long cache fill may actually degrade performance. There are at least two reasons for this. First, because of the lower probability that the additional data will ever be used, the larger number of bus cycles necessary to complete a long cache fill may lead to an increased average memory load latency. Second, for a cache of a given capacity, the larger fill size decreases the number of replaceable cache lines and may therefore hurt performance by increasing contention for those lines. In other words, it increases the probability that the process of servicing one cache miss will expunge from the cache the contents of some other line that would have taken a hit in the near future. When such behavior becomes especially severe, it is termed "thrashing in the cache".
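Both effects can be observed in a toy simulation. The following is a minimal sketch, not a model of any real cache: it assumes a direct-mapped cache of fixed capacity, a fill size equal to the line size, and the same hypothetical cost model of a fixed `setup_cycles` plus one cycle per `bus_bytes` transferred. The helper `scattered_trace` is likewise an assumption made for illustration; it produces repeated references to a small set of words scattered through memory.

```python
import random

def simulate(addresses, cache_bytes=1024, line_bytes=32,
             bus_bytes=8, setup_cycles=4):
    """Direct-mapped cache; returns (miss_rate, bus_cycles_per_access)."""
    num_lines = cache_bytes // line_bytes   # longer lines -> fewer lines
    tags = [None] * num_lines
    misses = 0
    cycles = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:            # miss: fill the whole line
            tags[index] = block
            misses += 1
            cycles += setup_cycles + line_bytes // bus_bytes
    count = len(addresses)
    return misses / count, cycles / count

def scattered_trace(n_hot=64, n_accesses=20000, region_bytes=65536,
                    word_bytes=8, seed=0):
    """Repeated references to a few words scattered over a large region."""
    rng = random.Random(seed)
    hot = rng.sample(range(0, region_bytes, word_bytes), n_hot)
    return [rng.choice(hot) for _ in range(n_accesses)]

if __name__ == "__main__":
    traces = {
        "sequential": list(range(0, 65536, 8)),
        "scattered": scattered_trace(),
    }
    for name, trace in traces.items():
        for line_bytes in (8, 64):
            miss_rate, cost = simulate(trace, line_bytes=line_bytes)
            print(f"{name:10s} line={line_bytes:2d}: "
                  f"miss rate {miss_rate:.3f}, {cost:.2f} bus cycles/access")
```

For the sequential sweep, the 64-byte line lowers the cost from 5.0 to 1.5 bus cycles per access under these assumptions, since seven of every eight references hit in a line that has already been fetched. For the scattered trace the longer line leaves only 16 replaceable lines instead of 128, so the hot words evict one another more often and each miss also costs more cycles.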
Thus, a conflict exists in providing a system that services both the rather predictable needs of well-behaved "vector" applications and the chaotic needs of more general computations.