NSURLCache & NSCachedURLResponse

by Louis Tur

CatThoughts-001: Final Exam

Scenario: You're looking to create an app that queries a web API for data, and that API has a rate limit of 10 requests per 10 seconds, or 500 requests per 10 minutes.

The caveat is that the API doesn't return any indication of your rate limit other than a 429 status (Too Many Requests) when you've reached your limit. Other than that, you would only receive the usual 200 (OK), 503 (Service Unavailable) or 401 (Unauthorized). As for the other response headers, there are no Cache-Control, ETag, or Expires fields.

What do you do? WHAT. DO. YOU. DO! (25pts. no partial credit)


Not so hypothetical

That hypothetical scenario is one that I'm dealing with currently with regard to my latest ZombieQueue series (League of Blargl: Day 1, Day 2). And given that one of the initial steps in the project was to control data refresh by way of the information provided to me via response headers, I've come to a dead stop on this project. But, that also means I have an excuse to delve into NSURLCache, which was a half-goal of PodHunt.

In this journey of self(caching)-discovery, I came across a handful of invaluable resources. I'll list them out here because I'll be referring to them fairly often:


Intro

Why even bother with this as the Apple docs outline that an NSURLCache is automatically created for me as needed? Well, this line gives me chills (and is mentioned by BlackPixel):

If the cached response doesn’t specify that the contents must be revalidated, the URL loading system examines the maximum age or expiration specified in the cached response. If the cached response is recent enough, then the URL loading system returns the cached response. If the response is stale, the URL loading system makes a HEAD request to the originating source to determine whether the resource has changed. If so, the URL loading system fetches the resource from the originating source. Otherwise, it returns the cached response. - Apple Docs (Cache Use Semantics)

Well... then how does the URL loading system know when to actually perform a fetch? Another reference that BlackPixel mentions (as do the Apple docs) is RFC 2616, Section 13, which outlines:

If none of Expires, Cache-Control: max-age, or Cache-Control: s-maxage (see section 14.9.3) appears in the response, and the response does not include other restrictions on caching, the cache MAY compute a freshness lifetime using a heuristic.

It seems that, sans the right headers, it is the cache on the client (not the server) that decides how "fresh" a response is by some heuristic, and from what it appears, that value lands somewhere in the range of 0-24 hours (in BlackPixel's case, it was somewhere between 6 and 24).
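For a rough feel of what such a heuristic can look like: RFC 2616 (Section 13.2.4) suggests that when a response carries a Last-Modified date, a cache may treat it as fresh for some fraction of the time since that date, and mentions 10% as a typical setting. The sketch below just computes that number. The function name and the 24-hour cap are my own assumptions for illustration, and none of this is a claim about what NSURLCache actually does internally.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not an Apple API): heuristic freshness lifetime in seconds.
// Assumes the "10% of the time since Last-Modified" rule of thumb from RFC 2616, 13.2.4.
// The 24-hour cap is an assumption made for this sketch.
static NSTimeInterval HeuristicFreshnessLifetime(NSDate *lastModified, NSDate *responseDate) {
    if (lastModified == nil || responseDate == nil) {
        return 0; // nothing to base a guess on: treat the response as immediately stale
    }
    NSTimeInterval sinceModified = [responseDate timeIntervalSinceDate:lastModified];
    if (sinceModified <= 0) {
        return 0; // Last-Modified is not in the past relative to the response
    }
    NSTimeInterval lifetime = sinceModified / 10.0; // one tenth of the resource's age
    NSTimeInterval cap = 24 * 60 * 60;              // never guess more than a day
    return MIN(lifetime, cap);
}
```

Under these assumptions, a resource that was last modified 10 hours before the response was generated would be treated as fresh for one hour.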

Add some salt to the wound

On top of this ambiguity related to API request limits and data "freshness", a problem that I face with NSURLCache is having to take on faith that my requests are actually being cached without my intervention, especially since it’s something that I don’t currently know how to test. And as I mention in a previous post, I don’t like assuming that things are being automatically done for me because I feel that this is an easy source of bugs.

Here’s what I know and understand:

  • Requests are made by a client and are handled by the appropriate servers, which return a response. As part of that response, the server provides HTTP headers which give the client information regarding how to handle the data it’s about to receive.

  • A select number of those response headers are related to how a client should handle caching a response, specifically ETag, Cache-Control (HTTP/1.1) and Expires. Not only those, but there are other headers that some servers will provide to give additional information about requests (for example, request limits and reset times; the GitHub API is a good example of this).

  • Using the Cache-Control header, an iPhone determines when and if it should refresh the entry it currently has in its NSURLCache for a specific response. For example, a Cache-Control header may include a max-age of 3600, indicating that if a cached response is older than 1 hour, you should discard it and pull in a fresh set of data.

  • However, a problem arises when a server doesn’t provide these mentioned headers. And according to BlackPixel, it seems that an iPhone makes a best guess as to when a cached response should be refreshed (sometime within 24 hours it seems).
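To make the max-age bullet concrete, here is a minimal sketch of reading that directive and deciding whether a cached response is still fresh. Both function names are hypothetical helpers of mine, not part of Foundation, and the parsing is deliberately simple (it ignores directives such as no-cache or s-maxage).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: pulls max-age (in seconds) out of a Cache-Control header value.
// Returns -1 when the header is missing or carries no max-age directive.
static NSInteger MaxAgeFromCacheControl(NSString *cacheControl) {
    if (cacheControl.length == 0) {
        return -1;
    }
    for (NSString *part in [cacheControl componentsSeparatedByString:@","]) {
        NSString *directive = [[part stringByTrimmingCharactersInSet:
                                [NSCharacterSet whitespaceCharacterSet]] lowercaseString];
        if ([directive hasPrefix:@"max-age="]) {
            // "max-age=" is 8 characters long; the rest is the number of seconds
            return [[directive substringFromIndex:8] integerValue];
        }
    }
    return -1;
}

// Hypothetical helper: a cached response counts as fresh while its age is below max-age.
static BOOL IsFresh(NSString *cacheControl, NSTimeInterval ageInSeconds) {
    NSInteger maxAge = MaxAgeFromCacheControl(cacheControl);
    return maxAge >= 0 && ageInSeconds < maxAge;
}
```

With a header of "public, max-age=3600", a response that is 30 minutes old is fresh and one that is 70 minutes old is not. With no header at all, this sketch reports "not fresh", which is exactly the case where the real cache falls back to guessing.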

So, their solution: Account for the possibility of not having Cache-Control headers in your code by including edge cases in the NSURLSessionDataDelegate method URLSession:dataTask:willCacheResponse:completionHandler:, which is called once a data task has received its response and the session is about to store it in the cache.
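As a sketch of what handling that edge case could look like with a plain NSURLSession delegate: if the response carries neither Cache-Control nor Expires, rebuild it with a Cache-Control header of my own choosing before handing it to the cache. The helper name, the delegate class name, and the 20-minute fallback are assumptions of mine, and I have not verified that the URL loading system honors the patched header on later loads.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: if the proposed response has no Cache-Control or Expires header,
// return a copy whose headers carry "Cache-Control: max-age=<fallbackMaxAge>".
// Otherwise hand back the proposed response untouched.
static NSCachedURLResponse *CachedResponseWithFallbackMaxAge(NSCachedURLResponse *proposed,
                                                             NSInteger fallbackMaxAge) {
    if (![proposed.response isKindOfClass:[NSHTTPURLResponse class]]) {
        return proposed; // not HTTP, nothing to patch
    }
    NSHTTPURLResponse *http = (NSHTTPURLResponse *)proposed.response;
    NSDictionary *headers = http.allHeaderFields;
    if (http.URL == nil || headers[@"Cache-Control"] != nil || headers[@"Expires"] != nil) {
        return proposed; // the server already said how long this is good for
    }
    NSMutableDictionary *patched = [headers mutableCopy];
    patched[@"Cache-Control"] = [NSString stringWithFormat:@"max-age=%ld", (long)fallbackMaxAge];
    NSHTTPURLResponse *patchedResponse =
        [[NSHTTPURLResponse alloc] initWithURL:http.URL
                                    statusCode:http.statusCode
                                   HTTPVersion:@"HTTP/1.1"
                                  headerFields:patched];
    return [[NSCachedURLResponse alloc] initWithResponse:patchedResponse
                                                    data:proposed.data
                                                userInfo:proposed.userInfo
                                           storagePolicy:proposed.storagePolicy];
}

// Hypothetical delegate class showing where the helper would be called.
@interface FallbackCacheDelegate : NSObject <NSURLSessionDataDelegate>
@end

@implementation FallbackCacheDelegate
- (void)URLSession:(NSURLSession *)session
          dataTask:(NSURLSessionDataTask *)dataTask
 willCacheResponse:(NSCachedURLResponse *)proposedResponse
 completionHandler:(void (^)(NSCachedURLResponse *cachedResponse))completionHandler {
    // 20 minutes is an assumed fallback; pick whatever suits the data
    completionHandler(CachedResponseWithFallbackMaxAge(proposedResponse, 20 * 60));
}
@end
```

Keeping the header logic in a plain function means it can be exercised without a live session, and the delegate method stays a one-liner.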


This is where I come in

In this project, I use AFNetworking’s AFHTTPSessionManager to handle my requests. The class acts as the session delegate itself, but it lets me hook into that aforementioned delegate method via its setDataTaskWillCacheResponseBlock: method (the block I pass in gets called from AFNetworking's own implementation of URLSession:dataTask:willCacheResponse:completionHandler:).

At any rate, within the method, I need to check to see if the following conditions are met:

  1. NSURLRequestUseProtocolCachePolicy is the original request’s cache policy
  2. Cache-Control and Expires header fields don’t exist in the response

I use the code snippet that BlackPixel details in order to determine the above, and as they suggest I attempt to handle the possibility that the above two conditions are met. (Note: I already know it meets these requirements because NSURLRequestUseProtocolCachePolicy is the default option for requests and I’ve already checked the headers for the response from my code and via Postman.)

self.httpSessionManager = [[AFHTTPSessionManager alloc]
    initWithSessionConfiguration:[NSURLSessionConfiguration defaultSessionConfiguration]];

[self.httpSessionManager setDataTaskWillCacheResponseBlock:^NSCachedURLResponse * (NSURLSession *session, NSURLSessionDataTask *dataTask, NSCachedURLResponse *proposedResponse) {
    NSLog(@"Sending back a cached response");
    // Start from what the URL loading system proposed; returning nil would mean "don't cache".
    NSCachedURLResponse *responseCached = proposedResponse;
    NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)proposedResponse.response;
    if (dataTask.originalRequest.cachePolicy == NSURLRequestUseProtocolCachePolicy) {
        NSDictionary *headers = httpResponse.allHeaderFields;
        NSString *cacheControl = headers[@"Cache-Control"];
        NSString *expires = headers[@"Expires"];
        if (cacheControl == nil && expires == nil) {
            NSLog(@"Server does not provide expiration information and we are using NSURLRequestUseProtocolCachePolicy");
            // Rebuild the cached response explicitly, with some userInfo attached to experiment with.
            responseCached = [[NSCachedURLResponse alloc] initWithResponse:proposedResponse.response
                                                                      data:proposedResponse.data
                                                                  userInfo:@{ @"response" : proposedResponse.response, @"proposed" : proposedResponse.data }
                                                             storagePolicy:NSURLCacheStorageAllowed];
        }
    }
    return responseCached;
}];

So I run this code and..? Well, I know that I have a cache because I can log out some details about it, and I know that the block is being run because I have a breakpoint in the middle of it that’s pausing my code’s execution. But… what’s actually in the cache, if anything? And am I pulling from the cache’s data, or from the API endpoint?

The answer is: I have no idea. But I’m also not about to just go look this up in Google. Peter Steinberger’s blog entry on NSURLCache displays a screenshot of his Finder window with the directory location of a local cache for OS X. And that gave me an idea: if I can find where the data for this project is, I can find its contents, and presumably see if it’s creating a cache for these URL requests. And subsequently, I should be able to tell if these requests are being retrieved from the cache or from the web.


The Hunt

First the easy part: I know I can go into my Xcode workspace and find the derived data directory for my project under Window > Organizer. Unfortunately, that doesn’t end up giving me the answers I was looking for as the directory doesn’t contain cache info.
[Screenshot: Derived Data]

Moving on, the other way that I know to pull data from my running app is via Instruments, and surely it must have a way to capture file I/O. My presumptions in this case work out, and there is a template for such a thing aptly named "File Activity". So with the right instruments in place, I make three simulator runs and inspect each of them. I still don’t really know where to look, but I start with Directory I/O since it seems the most logical and it’s displaying activity on every run at the point at which data theoretically is being retrieved from either the cache or web API. (01/14/15 edit: I don't know why I didn't notice this sooner, but I had been wondering why my first run's I/O activity was so much later than runs 2 and 3. And it's kind of obvious now: the first run actually retrieved data via the API, whereas runs 2 and 3 pulled from the cache -- making it significantly faster!)

I let out an audible sigh of relief as I looked at the results from the runs. There were fewer than a handful of function calls, with one very salient and relevant event: __CreateAndReturnPathToCacheFileOnFileSystem with a mkdir function and a full path to the folder. Not only that, but switching to the extended detail view reveals an even more eye-opening stack trace of that event.

From this I can verify that it is indeed creating a singleton instance of an NSURLCache as a result of an NSURLSessionConfiguration. Moreover, checking out the reference documentation for NSURLSessionConfiguration leads me to believe that defaultSessionConfiguration is being used to instantiate the object because it's the default class method for NSURLSessionConfiguration and it allows for disk caching, whereas ephemeralSessionConfiguration does not.

Following the file path that’s provided in the extended details, I reach a Cache.db file which I open with a basic database utility (Liya, free on the Mac App Store). In the database inspector that pops up, only five tables are available to me but I immediately zero in on cfurl_cache_response and run the SQL command: SELECT * FROM cfurl_cache_response.
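The same lookup can be done from code instead of a GUI tool. This is a minimal sketch using the SQLite C API (link against libsqlite3) that lists the request_key column of cfurl_cache_response. The function name is hypothetical, the path to Cache.db has to be supplied by you, and the table layout is an undocumented implementation detail that could change between OS versions.

```objc
#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Hypothetical helper: returns every request_key stored in the given Cache.db file.
// Opens the database read-only so the app's cache is never modified.
static NSArray *CachedRequestKeys(NSString *dbPath) {
    NSMutableArray *keys = [NSMutableArray array];
    sqlite3 *db = NULL;
    if (sqlite3_open_v2(dbPath.UTF8String, &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK) {
        sqlite3_close(db); // a handle may be allocated even on failure
        return keys;
    }
    sqlite3_stmt *statement = NULL;
    const char *sql = "SELECT request_key FROM cfurl_cache_response";
    if (sqlite3_prepare_v2(db, sql, -1, &statement, NULL) == SQLITE_OK) {
        while (sqlite3_step(statement) == SQLITE_ROW) {
            const unsigned char *text = sqlite3_column_text(statement, 0);
            NSString *key = text ? [NSString stringWithUTF8String:(const char *)text] : nil;
            if (key != nil) {
                [keys addObject:key];
            }
        }
    }
    sqlite3_finalize(statement); // safe to call with NULL
    sqlite3_close(db);
    return keys;
}
```

Calling CachedRequestKeys with the path found in Instruments and logging the result before and after a request gives a quick way to see whether a new entry appeared.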

Such beautiful cached URLs...

The results? A table that details a field named request_key that has both of my request URLs listed!! It also includes some other details, like storage policy (set to 0, which in the NSURLCacheStoragePolicy enum is NSURLCacheStorageAllowed, the same policy I pass when building the NSCachedURLResponse), entryID, timestamp and hash_value.


Experimentation
Changing the request

What would happen if I changed the URL request? Theoretically, I would think that this should create another entry in our cfurl_cache_response table... and in fact it does! In addition to another entry in the table, the request receives a unique hash value and a new time stamp.

Reverting to the previous request

If my cache really is communicating and using the request_keys as some sort of comparator, then re-running the same URL request should result in no change to the table -- and that's exactly the case: the cache shows the same 3 requests. Though interestingly, the running application itself seems to have no indication from the request headers that a request is new or old.

Also, what would happen to the cached database if I included some userInfo keys as part of the NSCachedURLResponse initialization method initWithResponse:data:userInfo:storagePolicy:? Would they now appear in a separate column/table in Liya as well?

The answer to that is (surprisingly) no... Whether or not this userInfo dictionary is used anywhere but by the application at runtime is unclear, but given everything we've looked at I have to deduce that while that userInfo is not saved to the cache, the URL response absolutely has to be -- otherwise, how would the app retrieve the response from the cache?

Removing the explicit call to return an NSCachedURLResponse

Fine fine fine.. this experimentation is all well in the spirit of science. But I still need to explore if the block I pass to setDataTaskWillCacheResponseBlock: (the one that I am coding additional edge case logic into) is actually properly handling this caching. As a reminder, it seems that at the very least it is executing, and something in the app (possibly this block) is creating the cache. But right now I want to know if it’s actually me that's involved in this caching process. So, let’s just always return nil from the block and see what happens to the cache. I will try this first with the existing URL requests, and then create a new request to see if another entry is made or not.

//delegate method for NSURLSessionDataTask    
[self.httpSessionManager setDataTaskWillCacheResponseBlock:^NSCachedURLResponse * (NSURLSession *session, NSURLSessionDataTask *dataTask, NSCachedURLResponse *proposedResponse) {
...
        return responseCached; // will always be nil
    }];

Run 1: Return nil from delegate, use old requests = no change to cache

Run 2: Return nil from delegate, create new request = no change to cache

Run 3: Return non-nil NSCachedURLResponse from delegate, create new request = NEW ENTRY TO CACHE

Success!

So at the very least, I’m reasonably assured that I’m responsible for creating and updating the cache. I’m still not entirely certain that data is being pulled from the cache instead of the web, but these results give me the base evidence I need to continue.
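One more check that can back this up from inside the app is to ask the cache directly whether it holds an entry for a given URL. This is a minimal sketch with a hypothetical helper name. It only tells me that an entry exists, not that a particular load was actually served from it, and it assumes the session is using the cache object I pass in (for a default session configuration, that would be [NSURLCache sharedURLCache]).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the given cache currently holds a response for this URL.
static BOOL CacheHasResponseForURL(NSURLCache *cache, NSURL *url) {
    NSURLRequest *request = [NSURLRequest requestWithURL:url];
    NSCachedURLResponse *cached = [cache cachedResponseForRequest:request];
    if (cached != nil) {
        NSLog(@"Cache entry for %@ (%lu bytes)", url, (unsigned long)cached.data.length);
    }
    return cached != nil;
}
```

Calling CacheHasResponseForURL([NSURLCache sharedURLCache], someURL) before and after a request, with the block returning nil in one run and a real NSCachedURLResponse in the next, would be expected to mirror the three runs above.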


Next steps
Network Calls

At this point, it seems to me that the easiest way to figure out if a request is being pulled from the cache or from the URL is to monitor the network traffic at the time of the request. I’m not sure what I’ll be looking for as a comparison, but presumably something like a 304 (Not Modified) status or a HEAD request (instead of a GET) would be a good indicator.
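A possible in-app alternative to sniffing traffic is NSURLSessionTaskMetrics, which reports for each transaction whether the resource came from the network or from the local cache. Note that this API only exists on iOS 10 / macOS 10.12 and later, so it is newer than the setup described in this post. The class and function names below are hypothetical, and wiring the delegate into AFNetworking is left out.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns the fetch type enum into something readable in a log.
static NSString *FetchTypeDescription(NSURLSessionTaskMetricsResourceFetchType type) {
    switch (type) {
        case NSURLSessionTaskMetricsResourceFetchTypeNetworkLoad: return @"network";
        case NSURLSessionTaskMetricsResourceFetchTypeLocalCache:  return @"local cache";
        case NSURLSessionTaskMetricsResourceFetchTypeServerPush:  return @"server push";
        case NSURLSessionTaskMetricsResourceFetchTypeUnknown:     return @"unknown";
    }
    return @"unknown";
}

// Hypothetical delegate class: logs where each transaction of a finished task came from.
@interface FetchSourceLogger : NSObject <NSURLSessionTaskDelegate>
@end

@implementation FetchSourceLogger
- (void)URLSession:(NSURLSession *)session
              task:(NSURLSessionTask *)task
didFinishCollectingMetrics:(NSURLSessionTaskMetrics *)metrics {
    for (NSURLSessionTaskTransactionMetrics *transaction in metrics.transactionMetrics) {
        NSLog(@"%@ -> %@", transaction.request.URL,
              FetchTypeDescription(transaction.resourceFetchType));
    }
}
@end
```

A session created with sessionWithConfiguration:delegate:delegateQueue: and an instance of this class as its delegate would log one line per transaction when a task completes.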

You may be thinking: why not just turn the wifi off? And to that I say: shut up. We're not interested in the results anymore, but rather the explanation of the results. Whether or not the call works with wifi off, I don't know what's actually going on in the background -- I want to be able to point to a stack trace that definitively outlines the road to cached redemption.

Do I actually need that delegate method?

The step after that is to remove my implementation of the AFHTTPSessionManager caching block entirely. In doing so, if the cache gets updated on its own I know that the OS is handling this for me in some way. The only problem I have with this is that it means I still don’t have control over how often something gets refreshed. Meaning, data in my app could be stale because the headers for the response don’t give any indication as to when to make a new request. This isn’t entirely important for mostly static data, but if my goal is to have up-to-the-minute information on players’ matches, then my refresh rate needs to be on a scale of about once every 20 minutes.

So what would that implementation look like? I’m thinking I will just create an NSDate from the headers of the response on the initial call and then have the app check to see if more than 20 minutes have passed since that time. If they have, it will automatically make another asynchronous API call (oh crap, now I have to handle no-data cases too) when the relevant view controllers are presented. This is of course on top of a manual, pull-to-refresh option.
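A rough sketch of that idea, assuming the response carries a standard HTTP Date header in the usual "Wed, 14 Jan 2015 10:00:00 GMT" form. Both function names are hypothetical, and the 20-minute threshold is passed in as a parameter rather than hard-coded.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses an HTTP Date header value into an NSDate.
// Returns nil if the value is missing or not in the expected format.
static NSDate *DateFromHTTPDateHeader(NSString *value) {
    if (value.length == 0) {
        return nil;
    }
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    // A fixed POSIX locale keeps day and month names in English regardless of device settings
    formatter.locale = [NSLocale localeWithLocaleIdentifier:@"en_US_POSIX"];
    formatter.timeZone = [NSTimeZone timeZoneForSecondsFromGMT:0];
    formatter.dateFormat = @"EEE, dd MMM yyyy HH:mm:ss zzz";
    return [formatter dateFromString:value];
}

// Hypothetical helper: YES when the data is older than maxAge seconds, or its age is unknown.
static BOOL NeedsRefresh(NSDate *fetchedAt, NSDate *now, NSTimeInterval maxAge) {
    if (fetchedAt == nil) {
        return YES; // no timestamp to compare against, so err on the side of refetching
    }
    return [now timeIntervalSinceDate:fetchedAt] > maxAge;
}
```

In a view controller's viewWillAppear:, something like NeedsRefresh(lastFetchDate, [NSDate date], 20 * 60) could then decide whether to kick off a new request or keep showing what is already there.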

Where does this all lead up to?

Cached data is great, but I’ll need to make a transition from the cache to permanent storage. RestKit seems like a good option considering it was built on AFNetworking and it employs Core Data. Though, I’m also curious about Parse’s local storage options now as well, so I may look into that.

Sooooo out of scope...

It seems that what I don’t know that I don’t know is substantial. I’m going to have to reduce the scope of this ZQ project if my aim is to only spend 7 days on it. I think for now, I will table the persistent storage aspect (given that caching is working) and will just flesh out the rest of the API calls and UI elements.

There’s always the next ZQ

>catthoughts

Louis Tur

"How" has been the single most used word in my literary arsenal for as long as I can remember. I've never really been satisfied knowing that something works, but only by knowing how it works.
