A repository manager, such as Nexus, allows administrators to logically group repositories together into a single repository, referred to herein as a “group repository” or a “proxy repository.” This can be done, for example, to provide a single stable URL (uniform resource locator) and to allow administrators to add proxies or other repositories to the group, or remove them from it, without requiring developers to update their settings. Grouping also introduces some level of control over how artifacts are fetched from those repositories.
When a request for an artifact comes in to a conventional group repository, the group repository can work down the list of repositories included in the group, in order from top to bottom, checking whether each repository has the requested artifact. The first repository that has it “wins,” and the artifact from that repository is served to the client. In a conventional system, the ordering of the list can be important because it allows administrators to provide “overrides” to project object models (poms) and artifacts by placing them in a repository that is higher in the search list.
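The top-to-bottom, first-match lookup described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular repository manager: the in-memory `Repository` model, the `resolve` function, and the repository names and paths are all hypothetical.

```python
from typing import Optional

# Hypothetical in-memory model: a member repository is a (name, contents) pair,
# where contents maps an artifact path to the stored bytes.
Repository = tuple[str, dict[str, bytes]]


def resolve(group: list[Repository], path: str) -> Optional[tuple[str, bytes]]:
    """Return (repository name, artifact) from the first member holding the path."""
    for name, contents in group:  # walk the list in its configured order
        if path in contents:
            return name, contents[path]  # the first repository that has it "wins"
    return None  # no member repository has the artifact


# An "overrides" repository placed above "releases" shadows the pom below it.
group = [
    ("overrides", {"com/example/lib/1.0/lib-1.0.pom": b"patched pom"}),
    ("releases", {
        "com/example/lib/1.0/lib-1.0.pom": b"original pom",
        "com/example/lib/1.0/lib-1.0.jar": b"jar bytes",
    }),
]
```

With this ordering, a request for the pom is served from "overrides" while a request for the jar falls through to "releases"; swapping the two entries would remove the override.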
For metadata requests, the repository manager typically hits all repositories in the list for the requested metadata and effectively merges all of the metadata together. This is done because the metadata typically enumerates all versions of an artifact, and the system must look everywhere to present the complete picture back to the client.
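As a rough sketch of the merge step, assume each member repository reports a plain list of version strings for the requested artifact. The function name `merge_versions` and the list-of-strings representation are assumptions made for illustration; real metadata files carry more than a version list.

```python
def merge_versions(per_repo_versions: list[list[str]]) -> list[str]:
    """Union the version lists of every member repository, keeping first-seen order."""
    merged: list[str] = []
    seen: set[str] = set()
    for versions in per_repo_versions:  # every repository is consulted, not just the first
        for version in versions:
            if version not in seen:
                seen.add(version)
                merged.append(version)
    return merged
```

Unlike an artifact lookup, this loop cannot stop at the first hit, because a later repository may hold versions that earlier ones lack.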
Each proxy repository can cache the artifacts it has retrieved from the remote repository, and a repository can include a negative cache (the so-called Not Found Cache, or “NFC”) that records artifacts that are not located in a particular repository. The cache and negative cache speed up artifact lookups and ensure that the repository manager does not keep asking the remote repository over and over for an artifact that the repository manager knows is not present on the remote repository.
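The interplay of the artifact cache and the NFC might look like the sketch below. The `ProxyRepository` class, its `fetch_remote` callback, and the `remote_calls` counter are hypothetical names introduced here; expiry of cached entries, which a real system would need, is omitted.

```python
from typing import Callable, Optional


class ProxyRepository:
    """Proxy with a positive artifact cache and a negative (not-found) cache."""

    def __init__(self, fetch_remote: Callable[[str], Optional[bytes]]):
        self._fetch_remote = fetch_remote    # assumed callback; returns None when absent
        self._cache: dict[str, bytes] = {}   # artifacts already retrieved
        self._not_found: set[str] = set()    # NFC: paths known to be absent remotely
        self.remote_calls = 0                # counts round trips to the remote

    def get(self, path: str) -> Optional[bytes]:
        if path in self._cache:
            return self._cache[path]         # served locally
        if path in self._not_found:
            return None                      # known miss, the remote is not asked again
        self.remote_calls += 1
        artifact = self._fetch_remote(path)
        if artifact is None:
            self._not_found.add(path)
        else:
            self._cache[path] = artifact
        return artifact
```

Repeated requests for the same path, whether present or absent on the remote, cost only one remote call in this sketch.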
The ordering of repositories in the list of repositories can also be important for performance reasons. Because it costs less to hit a locally hosted repository than a proxy, one would typically want to search the locally hosted repositories first. A typical repository manager does not necessarily enforce this, however, and can, e.g., give the administrator ultimate authority in determining the lookup order.
The repository manager can provide an additional mechanism sometimes referred to as “routing rules” to optimize lookups. These are regular expressions that operate on group repositories and can effectively declare statements such as:
for every request to com/sonatype/* you may only look in “releases” and “snapshots”. In other words, these artifacts can never be anywhere else.
for every request to com/jboss/ you may NOT look in “releases” or “snapshots”. In other words, jboss artifacts can never be here.
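The two statements above can be sketched as regular-expression rules that filter the list of member repositories before any lookup is attempted. The rule tuple format, the "inclusive" and "exclusive" labels, the `candidates` function, and the extra repository names "central" and "jboss-proxy" are all assumptions for illustration and do not reflect the configuration syntax of any actual product.

```python
import re

# Hypothetical rule format: (kind, pattern, repositories).
#   "inclusive": a matching request may look ONLY in the listed repositories.
#   "exclusive": a matching request may NOT look in the listed repositories.
RULES = [
    ("inclusive", re.compile(r"^com/sonatype/.*"), {"releases", "snapshots"}),
    ("exclusive", re.compile(r"^com/jboss/.*"), {"releases", "snapshots"}),
]


def candidates(path: str, members: list[str], rules=RULES) -> list[str]:
    """Return the member repositories that the rules still allow for this path."""
    allowed = list(members)  # start from the full list, in configured order
    for kind, pattern, repos in rules:
        if not pattern.match(path):
            continue  # the rule does not apply to this request
        if kind == "inclusive":
            allowed = [m for m in allowed if m in repos]
        else:
            allowed = [m for m in allowed if m not in repos]
    return allowed


members = ["releases", "snapshots", "central", "jboss-proxy"]
```

Requests that match no rule still see every member, which is why rules that are too loose save little work.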
Proper configuration of routing rules can have an immense performance impact and can reduce the number of repositories that must be searched before the requested artifact is served. When the number of repositories in a group gets very large, even the NFC is not very efficient, because it holds an item for each artifact-repository combination; thus, for a given NFC size, the system can store fewer artifacts in memory.
In operation, a conventional repository manager can receive a request for a component, for example by file name. The repository manager includes a list of upstream repositories. Sometimes the repository manager receives a request for a component that is not present on one of the repositories. The repository manager's job in that situation is to try to find the component in the other repositories.
Consider a situation in which a file name/artifact “com/sonatype/projecta/foo1.0.jar” was requested but not found. Nexus (as an example repository manager) is configured with a list of repositories. The repository manager will go sequentially to each repository in the list to look for the requested artifact; if the artifact exists only in the last repository in the list, then the repository manager will go out to all of the other repositories to check for the requested artifact. In this example, “com/sonatype” does not exist. One of the next requests happens to be for “com/sonatype/projecta/foo2.0.jar”. The conventional system looks for this file using the entire path, even though “com/sonatype” did not exist. The conventional system does no introspection and thus does not determine that part of a path (e.g., “com/sonatype”) is not there and hence that none of the files in the path will be found.
As another example, today's conventional systems receive a “get” request for a file (a1.0) from sonatype.com, and the repository manager will then blindly make a “get” request for the file a1.0 to each of the repositories that it manages. The repository manager will then remember that none of the repositories has the file a1.0. A request for a2.0 from sonatype.com will again result in sending a “get” request for a2.0 to each of the repositories. If the file is not found, a “fail” will be returned in response to the “get” request.
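The behavior in these two examples can be modeled with a small counter. The sketch below assumes a hypothetical `ConventionalGroup` class whose NFC is keyed on the pair of repository name and full path, as described earlier; the member names are invented.

```python
from typing import Optional


class ConventionalGroup:
    """Group whose NFC remembers misses only per (repository, full path) pair."""

    def __init__(self, members: dict[str, set[str]]):
        self.members = members                    # repository name -> paths it holds
        self.nfc: set[tuple[str, str]] = set()    # one entry per artifact-repository miss
        self.requests_sent = 0                    # "get" requests issued to members

    def get(self, path: str) -> Optional[str]:
        for name, contents in self.members.items():
            if (name, path) in self.nfc:
                continue                          # this exact miss is already known
            self.requests_sent += 1
            if path in contents:
                return name
            self.nfc.add((name, path))
        return None                               # corresponds to the "fail" response


group = ConventionalGroup({"repo1": set(), "repo2": set(), "repo3": set()})
group.get("com/sonatype/projecta/foo1.0.jar")     # 3 requests, 3 NFC entries
group.get("com/sonatype/projecta/foo2.0.jar")     # 3 more requests, 3 more NFC entries
```

Nothing learned from the first miss helps with the second, because the shared prefix "com/sonatype" is never examined; the NFC only short-circuits an exact repeat of an earlier path.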
The trouble with the routing rules used by conventional systems is that, while they are very powerful, they are often underutilized by administrators. This is partly because they are hard to configure properly, and partly because the optimal rules are the strictest ones and end up having to be changed every time a new proxy is added or a user needs artifacts with a new groupId. In other words, it is hard to define where not to look for everything until the system already knows where it is, and so it is hard to efficiently route requests for completely foreign components that have never been requested before.
Another problem in operation of the conventional system is that the cache might have stored information about a file, but this important information is usually lost because it is evicted from the cache, since there are so many users. There may be requests for 20,000 to 30,000 or more files a day. Caching information for 10,000 files is simply insufficient as a practical matter.
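The eviction problem can be illustrated with a bounded cache. The text does not state which eviction policy is used, so the least-recently-used policy below is an assumption, as are the `BoundedCache` name and the synthetic file names; the capacity and request count are taken from the figures above.

```python
from collections import OrderedDict


class BoundedCache:
    """Fixed-capacity record of file paths, evicting the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: OrderedDict[str, bool] = OrderedDict()

    def add(self, path: str) -> None:
        self._entries[path] = True
        self._entries.move_to_end(path)          # mark as most recently used
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)    # drop the oldest entry

    def __contains__(self, path: str) -> bool:
        return path in self._entries

    def __len__(self) -> int:
        return len(self._entries)


cache = BoundedCache(capacity=10_000)
for i in range(20_000):                          # one day's worth of distinct files
    cache.add(f"file-{i}")
```

After the loop, everything recorded for the first 10,000 files has been evicted, so a repeat request for any of them starts from scratch.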