Friday, 14 April 2017

Future of CacheCow and the birth of CacheCore

CacheCow is my most popular OSS project, which came into being back in 2012. It started its life as part of WebApiContrib but the need for a full-fledged project supporting different storage options soon led me to create CacheCow - which needed both client and server components.

With the availability of .NET Core, which brings a completely new HTTP pipeline, the question has been when or how CacheCow will move to .NET Core. On the client side, HttpClient is still the king of making HTTP requests, meaning CacheCow.Client will work when the long-awaited .NET Standard 2.0 comes along, allowing us to reference older .NET libraries. On the server side, however, it is clear that CacheCow.Server has no place since the pipeline has changed in its entirety. So what should we do? Create a completely new project for both client side and server side, or maintain CacheCow.Client (while migrating it to .NET Standard to support the new .NET) and create a new project for the server side?

I have been thinking hard about this and for the reasons I will explain, I will be creating a completely new project called CacheCore (other contenders were Cache-vNext, CacheDnx and also recently CacheStandard) which will contain both client and server elements.

If you would like to know the details (REST discussions and lessons learned, including some gory confessions below), read the rest; otherwise feel free to watch this space as things start to happen.

I am under no illusion that this will require quite some effort. Apart from the learning curve and niggles with tooling, I find the most frustrating aspect to be trying to google anything Core related: the internet is now full of discarded evolutionary artifacts in the form of blogs, Stack Overflow questions, tutorials and even MSDN documentation - each documenting the journey, not the current state. If you think that is a small issue, ask anyone picking up .NET Core for the first time. I wish we had a big giant flush and could have flushed all that to /dev/null, wiping the history clean - never mind all those many, many hours lost. OK, rant over - promise.


As I said, I have confessions to make and one is just coming. When I designed the server components of CacheCow as API middleware, the idea was that they would be used for services that are purely RESTful, in the sense that all changes to the state of a resource would go through the API. Initially there does not seem to be anything extraordinary about this, but I gradually learnt that cache coherency is a very big responsibility for a mere middleware to take on.

First of all, there are many services out there where the underlying data could change without a request passing through the API. What is worse is that even if all state changes go via API calls, a change to one resource could invalidate other resources. For example, a PUT request to /cars/123 will invalidate /cars/123, which is fine, but what about ‘/cars’? So I started thinking about resources in terms of collection and instance, and CacheCow.Server started to infer collection and instance resources based on a convention - hence I used the Route Pattern concept so the application could configure the cache invalidation; here the route pattern would be /cars/*.

But the problem did not stop there. A change to /cars/123/contracts/456 could invalidate all these URLs: /cars/123/contracts, /cars/123 and possibly /cars - hence CacheCow now needs to walk up the tree and invalidate all those resources. And now to the next level of headaches: a POST to /orders/1234 could invalidate customer/987, yet there is no apparent connection unless the application tells us - which made me introduce the concept of Linked Route Patterns so the application could configure these relationships. Configuring them was of course a pain, and frankly I think nobody except me and a handful of other people quite got what I was on about.
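
To make the "walk up the tree" problem concrete, here is a minimal sketch of the idea. It is not CacheCow's actual implementation: the class, the method names and the linked dictionary (a crude stand-in for Linked Route Patterns, keyed by path prefix) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class InvalidationSketch
{
    // Returns the path itself followed by each of its ancestors, e.g.
    // /cars/123/contracts/456, /cars/123/contracts, /cars/123, /cars
    public static List<string> SelfAndAncestors(string path)
    {
        var segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();
        for (int count = segments.Length; count >= 1; count--)
        {
            result.Add("/" + string.Join("/", segments, 0, count));
        }
        return result;
    }

    // "linked" is a hypothetical, application-supplied map from a path prefix
    // to otherwise unrelated paths that a change under that prefix invalidates.
    public static List<string> PathsToInvalidate(string path, IDictionary<string, string[]> linked)
    {
        var result = SelfAndAncestors(path);
        foreach (var pair in linked)
        {
            bool matches = path == pair.Key
                || path.StartsWith(pair.Key + "/", StringComparison.Ordinal);
            if (!matches) continue;
            foreach (var other in pair.Value)
            {
                if (!result.Contains(other)) result.Add(other);
            }
        }
        return result;
    }
}
```

Even this toy version shows where the pain comes from: the ancestors can be inferred from the URL, but the link from /orders to a customer has to be spelled out by the application, and a static map like this one cannot know which customer a given order belongs to.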

Now, I believe it is too much of a responsibility for an HTTP middleware to do cache coherency. As such CacheCore.Server will be a lot simpler: no more entity tag storage; the application will decide whether to use ETag or LastModifiedDate for cache coherency and will be responsible for providing these values - although I will provide some helpers. One key difference in this implementation is that it will be a set of tools fitting different scenarios rather than a single HTTP caching god-class.

To explain this aspect further, HTTP caching is a spectrum of primitives that help you build more scalable (caching) and consistent (concurrency) systems - some of which are basic and used by many, while others have remained obscure and seldom used. Caching and expiry on resources are better known while from my experience, conditional PUT to achieve optimistic concurrency is rarely used - even conditional GET is rarely used by HTTP clients other than browsers. As such, CacheCore will come with three filters starting from the most basic to the most advanced:
  • BasicCacheFilter: This is the simplest filter. It covers returning Cache-Control headers according to the expiry configuration, reading the ETag or LastModified from the returned model (or inferring them by using reflection) and handling conditional GET for you. As long as you have a property called ETag or LastModified (or LastModifiedDate, etc) on the model you return from your API, this will work. With this filter, conditional GETs do not take any pressure off your “database”: every API call still results in the data being retrieved so that the filter can find the ETag or LastModified and respond to the conditional GET request accordingly.
  • LookupCacheFilter: This filter improves on the BasicCacheFilter by letting the application provide a callback to look up the ETag or LastModified without having to load the full model. Caching almost always gets used on resources where the operation is expensive either in IO or computation costs, and this approach helps you replace loading the full model with a light-weight lookup call. For example, let’s say the resource is /cars/123 and you keep a LastModifiedDate in your cars database and use a hash of the LastModifiedDate as the ETag (you could use LastModifiedDate to do cache validation on the date, but an HTTP date’s precision is sadly only up to a second, which might not be enough for you). In this case, the filter will ask the application for the ETag or LastModified of the resource, and you can call your database and read that value for car:id=123 without loading the whole car - which is going to be a lighter database call. So this filter will do everything BasicCacheFilter does (and in a more efficient way) and will even do conditional PUT for you. What is the problem with this one? Consistency: in terms of conditional PUT, validation is not atomic, e.g. you look up the ETag, find the condition is met and proceed to update, but meanwhile the data could have changed between the lookup and the update (the same could also apply to conditional GET but has a less serious impact). This is not a problem for everyone, hence I think this filter hits the sweet spot for simplicity and effectiveness.
  • StrongConsistencyCacheFilter: This is basically the same as above but maintains airtight consistency by allowing the application to implement atomic conditional GET and PUT - which means the application has to do more.
I have plans for these to be GET or PUT specific since actions are usually designed as such.
Now you might ask, why is CacheCore a filter and not a middleware? If you remember, CacheCow.Server was a DelegatingHandler (akin to an ASP.NET Core middleware). Well, here is another lesson learnt: caching is a highly localised concern, and it is a mistake to implement it as a global HTTP intermediary.
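
The lookup-based validation described for LookupCacheFilter can be sketched roughly as follows. This is an illustration only, not CacheCore's API: the class and method names are hypothetical, and the header handling is simplified to a single entity tag (no lists, no weak comparison).

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class ConditionalRequestSketch
{
    // Derives a quoted ETag by hashing the last-modified timestamp at tick
    // precision, so two changes within the same second get different ETags.
    public static string ETagFromLastModified(DateTimeOffset lastModified)
    {
        var ticks = lastModified.UtcTicks.ToString(CultureInfo.InvariantCulture);
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(ticks));
            // The first 8 bytes are plenty for an opaque validator.
            return "\"" + BitConverter.ToString(hash, 0, 8).Replace("-", "") + "\"";
        }
    }

    // Conditional GET: true means the filter can answer 304 Not Modified.
    public static bool IsNotModified(string ifNoneMatch, string currentETag)
    {
        return ifNoneMatch != null
            && (ifNoneMatch == "*" || ifNoneMatch == currentETag);
    }

    // Conditional PUT: false means the filter should answer 412 Precondition Failed.
    // Note the race: currentETag was looked up BEFORE the update runs, so the
    // data may change in between. Closing that gap needs an atomic
    // compare-and-update in the data store itself.
    public static bool PutPreconditionHolds(string ifMatch, string currentETag)
    {
        return ifMatch == null || ifMatch == "*" || ifMatch == currentETag;
    }
}
```

In this shape the application only has to supply the current LastModified (or ETag) for a resource, which is the light-weight lookup; the race noted in the comment is exactly the gap that the stronger, atomic variant is meant to close.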


Considering the client story in .NET Core for HTTP has not been drastically changed, it is fair to assume CacheCow.Client can still be used.

That is true; however, there are a few reasons I would like to start afresh. First of all, CacheCow was conceived, and most of its codebase written, when .NET did not yet have the await keyword. This resulted in a .ContinueWith() soup which was hard to read and difficult to maintain. On top of that, some interfaces supported async while others did not, breaking the "async all the way" rule. Also, I had in mind for the storage to be clever about how much space it uses per site and to implement LRU, while many underlying storages did not provide the primitives to do so - and frankly in these 5 years I have never needed it.
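
For anyone who has not had to write pre-await code, a small contrast may help. This is a made-up example, not CacheCow code: ReadAsync is a hypothetical stand-in for an asynchronous storage call, and both methods perform the same two chained reads.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncStyles
{
    // Hypothetical stand-in for an asynchronous cache-store read.
    static Task<string> ReadAsync(string key)
    {
        return Task.FromResult("value-for-" + key);
    }

    // Pre-await style: every step is a continuation, and nested tasks
    // have to be unwrapped by hand.
    public static Task<int> LengthWithContinuations(string key)
    {
        return ReadAsync(key)
            .ContinueWith(first => ReadAsync(first.Result)) // Task<Task<string>>
            .Unwrap()                                       // Task<string>
            .ContinueWith(second => second.Result.Length);
    }

    // The same logic with async/await reads top to bottom.
    public static async Task<int> LengthWithAwait(string key)
    {
        var first = await ReadAsync(key);
        var second = await ReadAsync(first);
        return second.Length;
    }
}
```

With only two steps the difference is modest; add error handling and branching to the first version and the "soup" appears quickly.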

I think it is time to get rid of these shortcomings hence there will be a new client project too.

Future of CacheCow.Server and CacheCow.Client

It would be naive to think everyone will move to .NET Core straightaway. In fact, with .NET Standard 2.0, Microsoft has shown that it realises there needs to be better interoperability between classic .NET and .NET Core. Apart from interoperability, I think people will carry on using and building .NET APIs for another few years.

For these reasons, I will carry on supporting CacheCow and releasing bug fixes, etc. Thanks for helping it improve by using it, reporting issues and sending bug fixes.

Tuesday, 31 January 2017

Announcing Zipkin Collector for Azure EventHub

If you are reading this, you have probably heard of Zipkin. If not, please take my word for it and leave this post to spend 10 minutes reading up on it - a very worthwhile 10 minutes which will introduce you to one of the best, yet simplest, distributed tracing systems. In one word, it tells you where most of the time serving requests is spent, helping you to optimise your Microservice architecture.

Zipkin, used by the likes of Twitter and Netflix, has already created a complete storm in the Java/JVM ecosystem, but many of us in the .NET community have not heard of it - and that is frankly a real pity. And if you have heard of it and want to use it, yes of course we can try to port the whole system over to .NET, but that would be a huge amount of work and frankly a waste, since Zipkin is designed to work across different stacks as long as you can somehow get your data over to it. The data is normally pushed to Kafka, and Zipkin consumes messages from Kafka through a component called the Collector. Data then gets stored in a storage backend (currently available for MySQL, Cassandra or Elasticsearch) and then served by the UI.

Of course nothing stops you from running Kafka in your cloud or on-premise environment, but if you have never done it, ZooKeeper (a consensus service required for running Kafka) is, to say the least, not the easiest service to operate. And frankly if you are on Azure it makes a lot of sense to use EventHub, an Azure PaaS service with functionality very similar to Kafka. Sadly there was no collector for it.

I have been very keen to bring Zipkin to ASOS, but could not quite justify running ZK and Kafka, even for myself. Hence I felt something had to be done about it. The only problem: I had never done a Java/Maven project before.

*     *     *

I have been doing what I have been doing - being a professional developer - for some time now. And I have had my ups and downs, both moments that I am proud of and moments of embarrassment because I have messed up. But never have I just picked up a completely different stack and built something like what I am going to share within a couple of weeks. [Yeah I am talking about Zipkin Collector for Azure EventHub]

This really has been a testament to how pluggable and nicely designed Zipkin is, and above all it has a truly amazing community - championed by Adrian Cole. Help was always around the corner, be it on hardcore stuff such as how to modularise the collector or my noob problems with Maven.

Not to forget, too, that the Azure EventHub SDK basically made it completely trivial to implement a working solution. All the heavy lifting has been done by the EventProcessorHost, so all that is left is a bit of plumbing to get the configuration over to these components.

*     *     *

How to use EventHub Collector

So the idea is that you would run zipkin-server (which hosts the Zipkin UI) and in the same process you run your collector. Zipkin uses Spring Boot's auto configuration mechanism to load the right collector based on the configuration provided. The project is hosted on GitHub. [UPDATE: Project has moved to OpenZipkin organisation here]

EventHub Collector gets triggered by the existence of the "zipkin.collector.eventhub.eventHubConnectionString" configuration, passed via the command line. The rest of the necessary configuration can be passed in an application.properties or application.yaml file.

So to run the EventHub collector you need:

1- zipkin.jar (zipkin-server)
2- application.properties file
3- zipkin-collector-eventhub-autoconfig module jar (which contains transitive dependencies too). This jar is not on maven yet

So in order to run:

1- Clone the source and build

mkdir zipkin-collector-eventhub
cd zipkin-collector-eventhub
git clone
mvn package

If you do not have maven, get maven here.

2- Unpack the MODULE jar into an empty folder

Copy zipkin-collector-eventhub-autoconfig-x.x.x-SNAPSHOT-module.jar (which has been packaged into the target folder) into an empty folder and unpack it:

jar xf zipkin-collector-eventhub-autoconfig-0.1.0-SNAPSHOT-module.jar

You may then delete the jar itself.

3- Download zipkin-server jar

Download the latest zipkin-server jar (which is named zipkin.jar) from here. For more information visit zipkin-server homepage.

4- Create an application.properties file for configuration next to the zipkin.jar file

Populate the configuration - make sure the resources (Azure Storage, EventHub, etc) exist. Only storageConnectionString is mandatory; the rest are optional and should be used only to override the defaults:

zipkin.collector.eventhub.storageConnectionString=<azure storage connection string>
zipkin.collector.eventhub.eventHubName=<name of the eventhub, default is zipkin>
zipkin.collector.eventhub.consumerGroupName=<name of the consumer group, default is $Default>
zipkin.collector.eventhub.storageContainerName=<name of the storage container, default is zipkin>
zipkin.collector.eventhub.processorHostName=<name of the processor host, default is a randomly generated GUID>
zipkin.collector.eventhub.storageBlobPrefix=<the path within container where blobs are created for partition lease, processorHostName>

5- Run the server along with the collector

Assuming zipkin.jar and application.properties are in the current working directory, run this from the command line (note that the connection string to the eventhub itself is passed in the command line):

java -Dloader.path=/where/jar/was/unpackaged -cp zipkin.jar org.springframework.boot.loader.PropertiesLauncher --zipkin.collector.eventhub.eventHubConnectionString="<eventhub connection string, make sure quoted otherwise won't work>"

After running, Spring Boot and the rest of the stack get loaded, and then you should see some INFO output from the collector listing the configuration you have passed.

You should be up and running and can start pushing spans to your EventHub.

Span serialisation guideline

EventHub Collector expects spans serialised as a JSON array of spans. The payload gets read as a UTF-8 string and gets deserialised by the zipkin-server components.
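
As a rough illustration of what such a payload could look like when produced from .NET, here is a sketch that builds a one-element JSON array and returns it as UTF-8 bytes, ready to be used as the body of an EventHub message. The assumptions are mine: the field names follow my understanding of Zipkin's v1 JSON span format (timestamps and durations in microseconds, "sr"/"ss" annotations) and should be checked against the Zipkin documentation, the class and method are hypothetical, and System.Text.Json is used only because it ships with current .NET runtimes (Json.NET would do the same job).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class SpanPayloadSketch
{
    // Builds a JSON array containing a single server-side span, UTF-8 encoded.
    public static byte[] BuildPayload(string traceId, string spanId, string name,
        string serviceName, DateTimeOffset start, TimeSpan duration)
    {
        long startMicros = start.ToUnixTimeMilliseconds() * 1000;
        long durationMicros = duration.Ticks / 10; // 1 tick = 100 ns

        var endpoint = new Dictionary<string, object> { ["serviceName"] = serviceName };
        var span = new Dictionary<string, object>
        {
            ["traceId"] = traceId,
            ["id"] = spanId,
            ["name"] = name,
            ["timestamp"] = startMicros,
            ["duration"] = durationMicros,
            ["annotations"] = new object[]
            {
                // "sr" = server receive, "ss" = server send
                new Dictionary<string, object>
                    { ["timestamp"] = startMicros, ["value"] = "sr", ["endpoint"] = endpoint },
                new Dictionary<string, object>
                    { ["timestamp"] = startMicros + durationMicros, ["value"] = "ss", ["endpoint"] = endpoint }
            }
        };

        // The collector expects an array, even for a single span.
        return JsonSerializer.SerializeToUtf8Bytes(new[] { span });
    }
}
```

Sending the bytes is left out on purpose, since that part depends on which EventHub client library you use.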


The next step is to get the jar onto Maven Central. Also I will start working on a .NET library to make building spans easier. 

Wednesday, 20 July 2016

Singleton HttpClient? Beware of this serious behaviour and how to fix it

If you are consuming a Web API in your server-side code (or .NET client-side app), you are very likely to be using an HttpClient.

HttpClient is a very nice and clean implementation that came as part of Web API and replaced its clunky predecessor WebClient (although only in its HTTP functionality, WebClient can do more than just HTTP).

HttpClient is usually meant to be used with more than just a single request. It conveniently allows for default headers to be set and applied to all requests. Also you can plug in a CookieContainer to allow for all sessions.

Now, ironically it also implements IDisposable, suggesting a short-lived lifetime and disposal as soon as you are done with it. This led to several discussions in the community (here from Microsoft Patterns and Practices, Darrel Miller in here and a few references in StackOverflow here) on whether it can be used with a longer lifetime and, more importantly, whether it needs disposal.

Singleton HttpClient matters, especially when it comes to the performance [Dragan Brankovich - Flickr]

HttpClient implements IDisposable only indirectly through HttpMessageHandler, and only as a just-in-case measure rather than out of an immediate need - I am not aware of an implementation of HttpMessageHandler that holds unmanaged resources (the very reason for implementing IDisposable).

In short, the community agreed that it was 100% safe not only to skip disposing the HttpClient but also to use it as a Singleton. The main concern was thread safety when making concurrent HTTP calls - and even the official documentation said there is no risk in doing that.

But it turns out there is a serious issue: DNS changes are NOT honoured and HttpClient (through HttpClientHandler) hogs the connections until the socket is closed. Indefinitely. So when does a DNS change occur? Every time you do a blue-green deployment (in Azure cloud services, when you deploy to the staging slot and then swap production/staging slots). Every time you change settings in your Azure Traffic Manager. Failover scenarios. Internally in a myriad of PaaS offerings.

And this has been going on for more than 2 years without being reported... makes me wonder what kind of applications we build with .NET?

Now if the reason for the DNS change is failover, your connection would have been faulted anyway, so this time the connection would open against the new server. But if this were a blue-green deployment, you swap staging and production and your calls would still go to the staging environment - a behaviour we had seen but had fixed by bouncing the dependent servers, thinking this was possibly an Azure oddity. What a fool was I - it was there in the code! Whose code? Well, debatable...


All of this goes back to the implementation in HttpClientHandler, which uses HttpWebRequest to make connections, none of which is open sourced. But obviously, using JetBrains’ dotPeek, we can look into the decompiled code and see that HttpClientHandler creates a connection group (named with its hashcode) and does not close the connections in the group until it is disposed. This basically means the DNS check never happens as long as a connection is open. This is really terrifying...
protected override void Dispose(bool disposing)
{
    if (disposing && !this.disposed)
    {
        this.disposed = true;
        // Connections in this handler's group are only closed here, on disposal
        ServicePointManager.CloseConnectionGroups(this.connectionGroupName);
    }
    base.Dispose(disposing);
}
As you can see, the ServicePoint class plays an important role here: controlling the number of concurrent connections to a ‘service point/endpoint’ as well as keep-alive behaviours.


A naive solution would be to dispose the HttpClient (hence the HttpClientHandler) every time you use it. As explained this is not how HttpClient is intended to be used.

Another solution is to set ConnectionClose property of DefaultRequestHeaders on your HttpClient:
var client = new HttpClient();
client.DefaultRequestHeaders.ConnectionClose = true;
This will send the Connection: close header, turning off HTTP keep-alive, so the socket will be closed after a single request. It turns out this can add roughly an extra 35ms (with long tails, i.e. amplifying outliers) to each of your HTTP calls, preventing you from taking advantage of the benefits of re-using a socket. So what is the solution then?

Well, courtesy of my good friend Andy Jutton of Amido, the solution lies in an obscure feature of the ServicePoint class. Basically, as we said, ServicePoint controls many aspects of TCP connections and one of its properties is ConnectionLeaseTimeout, which controls how many milliseconds a TCP socket should be kept open. Its default value is -1, which means connections will stay open indefinitely… well in real terms, until the server closes the connection or there is a network disruption - or the HttpClientHandler gets disposed as discussed.

So the root cause is basically the default value of -1, which is, IMHO, a wrong and potentially dangerous setting.

Now to fix it, all we need to do is to get hold of the ServicePoint object for the endpoint by passing the URL to it and set the ConnectionLeaseTimeout:
var sp = ServicePointManager.FindServicePoint(new Uri(""));
sp.ConnectionLeaseTimeout = 60*1000; // 1 minute
So this is something that you would want to do only at the startup of your application, once and for all endpoints your application is going to hit (if endpoints are decided at runtime, you would set it at the time of discovery). Bear in mind, path and query strings are ignored and only the host, port and scheme are important. Depending on your scenario, values of 1-5 minutes probably make sense.
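
A minimal sketch of doing this once at startup for several endpoints, assuming the classic .NET Framework stack discussed in this post (newer runtimes flag ServicePointManager as obsolete). The helper class, its name and the example URLs are hypothetical; FindServicePoint and ConnectionLeaseTimeout are the only framework members used.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class ConnectionLeaseConfig
{
    // Applies the same lease timeout to every endpoint and returns the
    // ServicePoint objects that were configured, mainly for inspection.
    public static IReadOnlyList<ServicePoint> Apply(IEnumerable<string> baseUrls, TimeSpan lease)
    {
        var configured = new List<ServicePoint>();
        foreach (var url in baseUrls)
        {
            // Only scheme, host and port matter; path and query are ignored.
            var servicePoint = ServicePointManager.FindServicePoint(new Uri(url));
            servicePoint.ConnectionLeaseTimeout = (int)lease.TotalMilliseconds;
            configured.Add(servicePoint);
        }
        return configured;
    }
}
```

At application startup this could be called as ConnectionLeaseConfig.Apply(new[] { "https://api.example.com/" }, TimeSpan.FromMinutes(1)), with the list of base URLs coming from wherever your configuration lives.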


Using a Singleton HttpClient results in your instance not honouring DNS changes, which can have serious implications. The solution is to set the ConnectionLeaseTimeout of the ServicePoint object for the endpoint.

Tuesday, 14 June 2016

After all, it might not matter - A commentary on the status of .NET

Do you know what the most menacing nightmare was for a peasant soldier in Medieval wars? The approach of a knight.

Approaching of a knight - a peasant soldier's nightmare [image source]

Famous for gallantry and bravery, armed to the teeth and having many years of training and battle experience, knights were the ultimate war machine for the better part of Medieval times. The likelihood of survival for a peasant soldier in an encounter with a knight was very small. He had to somehow deflect or evade the attack of the knight’s sword or lance while wielding a heavy sword to inflict an injury at exactly the right moment, as the knight passed. Not many peasants had the right training or prowess to do so.

Appearing around 1000 AD, knights began their dominance following the conquest of William of Normandy in the 11th century and reached their height in the 14th century:
“When the 14th century began, knights were as convinced as they had always been that they were the topmost warriors in the world, that they were invincible against other soldiers and were destined to remain so forever… To battle and win renown against other knights was regarded as the supreme knightly occupation” [Knights and the Age of Chivalry,1974]
And then something happened. Something that changed military combat for centuries to come: projectile weapons.
“During the fifteenth century the knight was more and more often confronted by disciplined and better equipped professional soldiers who were armed with a variety of weapons capable of piercing and crushing the best products of the armourer’s workshop: the Swiss with their halberds, the English with their bills and long-bows, the French with their glaives and the Flemings with their hand guns” [Arms and Armor of the Medieval Knight: An Illustrated History of Weaponry in the Middle Ages, 1988]
The development of the longsword had made the knight's attack more effective, but no degree of training or improved plate armour could stop the rise of projectile weapons:
“Armorers could certainly have made the breastplates thick enough to withstand arrows and bolts from longbows and crossbows, but the knights could not have carried such a weight around all day in the summer time without dying of heat stroke.”
And the final blow was the handguns:
“The use of hand guns provided the final factor in the inevitable process which would render armor obsolete” [Arms and Armor of the Medieval Knight: An Illustrated History of Weaponry in the Middle Ages, 1988]
And with the advent of arbalests, importance of lifelong training disappeared since “an inexperienced arbalestier could use one to kill a knight who had a lifetime of training”

Projectile weapons [image source]

Over the course of the century, knighthood gradually disappeared from the face of the earth.

A paradigm shift. A disruption.

*       *       *

After the big promise of web 1.0 was not delivered, resulting in the .com crash of 2000-2001, development of robust RPC technologies combined with better languages and tooling gradually rose to fulfill the same promise in web 2.0. On the enterprise front, the need for reducing cost by automating business processes led to the growth of IT departments in virtually any company that could have a chance to survive in the 2000s decade.

In small-to-medium enterprises, the solutions almost invariably involved some form of a database in the backend, storing CRUD operations performed on data entry forms. The need for reporting on those databases resulted in the creation of Business Intelligence functions employing more and more SQL experts.

With the rise of e-Commerce, there was a need for most companies to have an online presence and the ability to offer some form of shopping experience online. On the other hand, to reduce the cost of postage and paper, companies started offering account management online.

Whether SOA or not, these systems functioned pretty well for the limited functionality they were offering. The important skills the developers of these systems needed to have were a good command of the language used, object-oriented design principles (e.g. SOLID, etc), TDD and also knowledge of the agile principles and process. In terms of scalability and performance, these systems were rarely, if ever, pressed hard enough to break - even sticky sessions could work as long as you had enough servers (it was often said “we are not Google or Facebook”). Obviously availability suffered, but downtime was something businesses had got used to and it was accepted as the general failure of IT.

True, some of these systems were actually “lifted and shifted” to the cloud, but in reality not much had changed from the naive solutions of the early 2000s. And I call these systems The Simpleton Swamps.

Did you see what was lacking in all of above? Distributed Computing.

*       *       *

It is a fair question that we need to ask ourselves: what was it that we, as the .NET community, were doing during the last 10 years of innovations? The first wave of innovations was the introduction of the revolutionary papers on BigTable and Dynamo, which later resulted in the emergence of the NoSQL movement with Apache Cassandra, Riak and Redis (and later Elasticsearch). [During this time I guess we were busy with WPF and Silverlight. Where are they now?]

The second wave was the Big Data revolution with Apache Hadoop ecosystem (HDFS, Pig, Hive, Mahout, Flume, HBase). [I guess we were doing Windows Phone development building Metro UI back then. Where are they now?]

The third wave started with Kafka (and streaming solutions that followed), Grid Computing platforms with YARN and Mesos and also the extended Big Data family such as Spark, Storm, Impala, Drill, too many to name. In the meantime, Machine Learning became mainstream and the success of Deep Learning brought yet another dimension to the industry. [I guess we were rebuilding our web stack with Katana project. Where is it now?]

And finally we have the Docker family and extended Grid Computing (registry, discovery and orchestration) software such as DCOS, Kubernetes, Marathon, Consul, etcd… Also the logging/monitoring stacks such as Kibana, Grafana, InfluxDB, etc which had started along the way as an essential ingredient of any such serious venture. The point is neither the creators nor the consumers of these frameworks could do any of this without in-depth knowledge of Distributed Computing. These platforms are not built to shield you from it, but to merely empower you to make the right decisions without having to implement a consensus algorithm from scratch or dealing with the subtleties of building a gossip protocol.

And what is it that we have been doing recently? Well I guess we were rebuilding our stacks again with the #vNext aka #DNX aka #aspnetcore. Where are they now? Well actually a release is coming soon: 27th of June to be exact. But anyone who has been following the events closely knows that due to recent changes in direction, we are still - give or take - 9 to 18 months away from a stable platform that can be built upon.

So a big storm of paradigm shifts swept the whole industry and we have still been tinkering with our simpleton swamps. Please just have a look at this big list: only a single one of them is C#, Greg Young’s EventStore. And by looking at the list you see the same pattern, the same shifts in focus.

The .NET ecosystem is dangerously oblivious to distributed computing. True, we have recent exceptions such as Akka.NET (a JVM port) or Orleans, but distributed computing has not really penetrated and infused the ecosystem. If all we want to do is to simply build front-end APIs (akin to nodejs) or cross-platform native apps (using Xamarin studio), that is not a problem. But if we are not supposed to build the sizeable chunk of backend services, let’s make it clear here.

*       *       *

Actually there is a fair amount of distributed computing happening in .NET. Over the last 7 years Microsoft has built a significant number of services that are out to compete with the big list mentioned above: Azure Table Storage (arguably a BigTable implementation), Azure Blob Storage (Amazon Dynamo?) and EventHub (rubbing shoulders with Kafka). Also a highly-available RDBMS (SQL Azure), a Message Broker (Azure Service Bus) and a consensus implementation (Service Fabric). There is plenty of Machine Learning as well, and although slowly, Microsoft is picking up on Grid Computing - an alliance with Mesosphere and a DCOS offering on Azure.

But none of these have been open sourced. True, Amazon does not open source its bread-and-butter cloud either. But AWS has mainly been an IaaS offering, while Azure is banking on its PaaS capabilities, making Distributed Computing easy for its predominantly .NET consumers. It feels that Microsoft is saying, you know, let me deal with the really hard stuff, but for sure, I will leave a button in Visual Studio so you could deploy it to Azure.

At points it feels as if, Microsoft as the Lords of the .NET stack fiefdom, having discovered gunpowder, are charging us knights and peasant soldiers to attack with our lances, axes and swords while keeping the gunpowder weapons and its science safely locked for the protection of the castle. .NET community is to a degree contributing to the #dotnetcore while also waiting for the Silver Bullet that #dotnetcore has been promised to be, revolutionising and disrupting the entire stack. But ask yourself, when was the last time that better abstractions and tooling brought about disruption? The knight is dead, gunpowder has changed the horizon yet there seems to be no ears to hear.

Fiefdom of .NET stack
We cannot fault any business entity for keeping its trade secrets. But if the soldiers fall, ultimately the castle will fall too.

In fact, a single company is not able to pull the weight of re-inventing the emerging innovations. While the quantity of technologies that have emerged from Azure is astounding, quality has not always followed. After I complained to Microsoft about the performance of Azure Table Storage, others are finding the same and sometimes abandoning the Azure ship completely.

No single company is big enough to do it all by itself. Not even Microsoft.

*       *       *

I remember when we used to make fun of Java and Java developers (uninspiring, slow, Eclipse was a nightmare). They actually built most of the innovations of the last decade, from Hadoop to Elasticsearch to Storm to Kafka... In fact, looking at the top 100 Java repositories on GitHub (minus Android Java), you find 24 distributed computing projects, 4 machine learning repos and 2 languages. On C# you get only 3 with claims to distributed computing: ServiceStack, Orleans and Akka.NET.

But maybe it is fine, we have our jobs and we focus on solving different kinds of problems? Errrm... let's look at some data.

Market share of the IIS web server has been halved over the last 6 years - according to multiple independent sources [This source confirms the share was >20% in 2010].

IIS share of the market has almost halved in the last 6 years [source]

Now the market share of C# ASP.NET developers is halving too, from a peak of 4%:

Job trend for C# ASP.NET developer [source]
And if you do not believe that, see another comparison with other stacks from another source:

Comparing trend of C# (dark blue) and ASP.NET (red) jobs with that of Python (yellow), Scala (green) and nodejs (blue). C# and ASP.NET dropping while the rest growing [source]

OK, that was actually nothing, what I care more about is OSS. The Open Source revolution in .NET, which had a steadily growing pace since 2008-2009, almost reached a peak in 2012 with the ASP.NET Web API excitement and then grew at a slower pace (almost a plateau, visible on the 4M chart - see appendix). [by the way, I have had my share of these repos. 7 of those are mine]

OSS C# project creation in Github over the last 6 years (10 stars or more). Growth slowed since 2012 and there is a marked drop after March 2015 probably due to "vNext". [Source of the data: Github]

What is worse is that the data shows that with the announcement of #vNext aka #DNX aka #dotnetcore there was a sharp decline in new OSS C# projects - the community is in limbo waiting for the release - people find it pointless to create OSS projects on the current platform and the future platform is so much in flux that it is not stable enough for innovation. With the recent changes announced, practically it will take another 12-18 months for it to stabilise (some might argue 6-12 months, fair enough, take what you like). For me this is the most alarming of all.

So all is lost?

All is never lost. You still find good COBOL or FoxPro developers and since it is a niche market, they are usually paid very well. But the danger is losing relevance…

Practically, can Microsoft pull it off? Perhaps. I do not believe it is hopeless. I feel that with a radical change, by taking the steps below, Microsoft could materially reverse the decay:
  1. Your best community brains in the Distributed Computing and Machine Learning are in the F# community, they have already built many OSS projects on both - sadly remaining obscure and used by only few. Support and promote F# not just as a first class language but as THE preferred language of .NET stack (and by the way, wherever I said .NET stack, I meant C# and VB). Ask everyone to gradually move. I don’t know why you have not done it. I think someone somewhere in Redmond does not like it and he/she is your biggest enemy.
  2. Open Source a good part of the distributed services of Azure. Let the community help you to improve it. Believe me, you are behind the state of the art, frankly no one will look to copy it. Someone will copy from Azure Table Storage and not Cassandra?!
  3. Stop promoting deployment to Azure from Visual Studio with a click of a button, which makes Distributed Computing look trivial. Tell them the truth, tell them it is hard, tell them so few do succeed hence they need to go back and study, and forever forget about one-button click stuff. You are not doing a favour to them nor to yourself. No one should be acknowledged to deploy anything in distributed fashion without sound knowledge of Distributed Computing.

Last word

So when I am asked whether I am optimistic about the future of .NET or about the progress of dotnetcore, I usually keep silent: we seem to be missing the point on where we need to go with .NET - a paradigm shift has been ignored by our ecosystem. True, dotnetcore will be released on the 27th but, after all, it might not matter as much as we think it does. One of the reasons we are losing to other stacks is that we are losing our relevance. We do not have all the time in the world. Time is short...


Github Data

Gathering the data from GitHub is possible, but due to search results being limited to 1000 and to rate-limiting, it takes a while to process. The best approach I found was to list repos by update date and keep moving up. I used a python script to gather the data.
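The script itself is not reproduced here, but the "list by update date and keep moving up" idea can be sketched roughly as below. This is only a sketch under assumptions: `search` is a hypothetical helper that runs one search query and returns at most `page_cap` repos ordered by oldest update first, and the `pushed:>=` qualifier plus the `full_name` and `pushed_at` fields are borrowed from GitHub's search API rather than from the original script. Rate-limit handling (sleeping between calls) is left out.

```python
from typing import Callable, Dict, List

Repo = Dict[str, str]


def harvest(search: Callable[[str], List[Repo]], language: str, min_stars: int,
            start: str, page_cap: int = 1000) -> List[Repo]:
    """Collect every matching repo despite the per-query result cap.

    Each query only returns the oldest `page_cap` repos updated at or after
    `cursor`, so the cursor is moved up to the last update date seen and the
    query is repeated until a short (or fully duplicated) batch comes back.
    """
    seen: Dict[str, Repo] = {}
    cursor = start
    while True:
        query = f"language:{language} stars:>={min_stars} pushed:>={cursor}"
        batch = search(query)  # hypothetical helper, oldest update first
        fresh = [repo for repo in batch if repo["full_name"] not in seen]
        for repo in fresh:
            seen[repo["full_name"]] = repo
        # Stop when the window is exhausted or no progress can be made.
        if not fresh or len(batch) < page_cap:
            break
        cursor = batch[-1]["pushed_at"]  # ISO 8601 strings sort chronologically
    return list(seen.values())
```

Because the boundary date is queried again with `>=`, repos sitting exactly on the cursor come back twice, which is why results are keyed by `full_name`.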

It is sensible to use the number of stars as the bar for the quality and importance of GitHub projects. But choosing the threshold is not easy and there is usually a lag between the creation of a project and its gaining popularity. That is why the threshold has been chosen very low. But if you think the drop in creation of C# projects on GitHub was due to this lag, think again. Here is the chart of all C# projects regardless of their stars (0 stars and more):
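For anyone wanting to redo the charts with a different bar, the counting step is small. The following is a minimal sketch, assuming each repo record carries the `created_at` and `stargazers_count` fields as returned by the GitHub API; the downloadable data files may well be shaped differently.

```python
from collections import Counter
from typing import Dict, Iterable


def creations_per_month(repos: Iterable[dict], min_stars: int = 0) -> Dict[str, int]:
    """Count repos created in each month ("YYYY-MM") with at least `min_stars` stars."""
    counts: Counter = Counter()
    for repo in repos:
        if repo["stargazers_count"] >= min_stars:
            # "2015-03-14T09:26:53Z" -> "2015-03"
            counts[repo["created_at"][:7]] += 1
    return dict(sorted(counts.items()))
```

Running it once with `min_stars=10` and once with `min_stars=0` gives the two views compared here: if the drop appears in both, the popularity lag alone cannot explain it.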

All C# projects in github (0 stars and more) - marked drop in early 2015 and beyond

F# is showing healthy growth but the numbers of projects and stars are much lower than those of C#. Hence here we look at the projects with 3 stars and more:

OSS F# projects in Github - 3 stars or more
Projects with 0 stars and more (possibly showing people starting to pick it up and play with it) are looking very healthy:

All F# projects regardless of stars - steady rise.

Data is available for download: C# here and F# here

My previous predictions

This is actually my second post of this nature. I wrote one 2.5 years ago, raising alarm bells for the lack of innovation in .NET and predicting 4 things that would happen in 5 years (2.5 years from now):
  1. All Data problems will be Big Data problems
  2. Any server-side code that cannot be horizontally scaled is gonna die
  3. Data locality will still be an issue so technologies closer to data will prevail
  4. We need 10x or 100x more data scientists and AI specialists
Judge for yourself...

Deleted section

For the sake of brevity, I had to delete this section but this puts in context how we have many more hyperscale companies:

"In the 2000s, not many had the problem of scale. We had Google, Yahoo and Amazon, and later Facebook and Twitter. These companies had to solve serious computing problems in terms of scalability and availability that on one hand led to the Big Data innovations and on the other hand made Grid Computing more accessible.

By commoditising the hardware, the Cloud computing allowed companies to experiment with the scale problems and innovate for achieving high availability. The results have been completely re-platformed enterprises (such as Netflix) and emergence of a new breed of hyperscale startups such as LinkedIn, Spotify, Airbnb, Uber, Gilt and Etsy. Rise of companies building software to solve problems related to these architectures such as Hashicorp, Docker, Mesosphere, etc has added another dimension to all this.

And last but not least is the importance of a close relationship between academia and the industry, which seems to be happening after a long (and sad) hiatus. This has led to many academic lecturers acting as Chief Scientists, etc to influence the science underlying the disruptive changes.

There was a paradigm shift here. Did you see it?"

Friday, 13 May 2016

XML or JSON, and that is not the question

So in the last couple of days, our .NET community has shown some strong reactions to the announcements in the ASP.NET team stand-up. While the Ruby and more recently node communities are known for endless dramas on arguably petty issues, it felt that the .NET community was also capable of throwing tantrums. For those who are outside the .NET community or have not caught up with the news, the .NET/ASP.NET team have decided to revert the project.json (JSON) in #DotNetCore back to *.csproj/*.vbproj (XML) and resurrect msbuild. So was it petty in the end?

Some believed it was: they argued all that was changed was the format of the project file and the drama associated with it was excessive. They also pointed out that all the goodness of project.json would be ported to the familiar yet different *.csproj. I call this group the loyalists:

On the other hand, some were upset by the return of msbuild to the story of .NET development. This portion of the community were arguing that the 15+ year old msbuild does not have a place in modern development. They had been celebrating the death of this technology not knowing it was never really dead - I call them msbuild-antagonists. The first group (loyalists), on the other hand, were flagging that msbuild would be improved and the experience would be modernised.

Then there was another group of people who were frustrated that this decision had been made despite the community feedback and solely based on the feedback of “some customers” behind closed doors. I call them OSS-apologetics and their main issue was the seeming lack of weight of the community feedback when it comes to the internal decisions that Microsoft takes as a commercial enterprise - especially in the light of the fact that project.json was announced almost 2 years ago and it was very late to change it.

And there was yet another group that had invested time and effort (==money?) in building projects and tooling (some of which are commercial) and they felt that the rug had been pulled from underneath them and all those hours had gone to waste - for the lack of a better phrase I call them loss-bearers. And they were even more upset to see that their loss was accounted as a learning process:
Obviously there is not a great answer for them but it is usually said that it is a very minor part of the whole community who have been living on the bleeding edge and knew it could be coming any minute, as mentioned on the stand-up:

Where do I stand?

I stand somewhere in between. I cannot quite agree with the loyalists since it is not just a question of format. On the other hand, I do not bear any losses since I had decided a long time ago that I would skip the betas and pick it up when the train of changes slows down - something not yet in sight.

But I do not think any of the above captures the essence of what has been happening recently. I am of the belief that this decision, along with the previous disruptive ones, has been an important and shrewd business decision to save the day and contain losses for Microsoft as a commercial platform - and no one can blame Microsoft for doing that.

I had warned time and time again that the huge amount of change in the API and tooling and no easy migration path would result in dividing the community into a niche progressive #DotNetCore minority and the mainstream commercial majority who would stay on .NET Fx and need years (not months) to move on to #DotNetCore - if at all. And this potentially will create a Python-2-vs-3-like divide in the community.

The crossing from the old .NET to the new #DotNetCore (seemingly similar on the surface yet wildly different at heart) would not be dissimilar to the crossing from VB6 to .NET. And what makes it worse is that unlike then, there are many viable alternative OSS stacks (back then there was only Java and C/C++). This could have meant that the mainstream majority might in fact decide to try an altogether different platform.

So Microsoft as a business entity had to step in and albeit late, fix the few yet key mistakes made at the start and alongside the project during the last 2 years:
  • ASP.NET team making platform/language decisions and implementing features with clever tricks rather than the .NET Fx baking such features in the framework itself. An example was Assembly Neutral Interfaces.
  • Ignoring the importance of an upgrade path for the existing projects and customers
  • Inconsistent, confusing and ever changing layering of the stack
  • Poor and conflicting scheduling messages
  • Using Silverlight’s CoreCLR for ASP.NET resulting in dichotomy of the runtime, something that as far as I know has no parallel in any other language/platform. In the most recent slides I do not see CoreCLR being mentioned anymore yet it might be there. If it is, it will stay a technical debt to be paid later.
All in all it has been a rough ride both for the drivers and the passengers of this journey but I feel now the clarity and cohesion is back and long-standing issues have been addressed now.

Where could I be wrong?

My argument naturally brings these counterarguments:
  • Perhaps had ASP.NET team not pushed the envelope this far by single-handedly crusading to bring modern ideas and courageous undertakings such as cross-platform, we would be having .NET 5 now instead of #DotNetCore.
  • By carrying baggage from the past (msbuild), Microsoft is extending the lifespan of its stacks which in the short term will be beneficial to the corporation but, since it is not a clean break, in the long term results in dispersion of the community and a need for another redux.
Hard to answer these arguments since one is a hypothetical situation and the other looks well into the uncertainty of the future. I will leave it to the readers to weigh the arguments.

Last word

It is not possible to hide that none of this has been without casualties. Some confidence was lost, the community was at times upset and overall it has not been all rosy as far as Microsoft’s image in its OSS ventures goes. I did mention old and new Microsoft coming head-to-head, which might not be correct but as Satya Nadella said, culture does not change overnight.

Monday, 8 February 2016

Future of Information: the Good, Bad and Ugly of it

We are certainly at the cusp of a big revolution in the human civilisation - caused by the Information Technology and Machine Intelligence. There are golden moments in the history that have fundamentally changed us: late dark ages for the Astronomy, early renaissance for Physics, 1700-1800s for the Chemistry, late 1800s for the Microbiology, 1950s for the transistors… and the periods get more and more compressed. It looks like a labyrinth where it gets narrower when you get closer to its centre.

Without speculating on what the centre could look like, and considering this could be still a flat line of constant progress, we need to start thinking what the future could look like  - not because it is fun, but because an action could be warranted now. There is no shortage of speculation or commentary, one man can dream, and fathom a far future which might or might not be close to the distant reality. And that is not the point. The point is, as I will outline below, it could be getting late to do what we need to DO. Yes, this is not a sci-fi story…

On one hand, there is nothing new under the sun, and the cycle of change has always been with the mankind since the beginning. We always had the reluctant establishment fighting with the wind of change promoted by the new generation.

Figure 1 - Accelerating change [Source: Wikipedia]

On the other hand, this is the first time in history that the cycle of change has been reduced to less than a generation (a generation is normally considered 20-25 years). You see, the politicians of the past had time to grow up with the changes, feel the needs, brew new ideas and come up with the solutions. Likewise, the nations have had the time to assimilate and react to the changes in terms of aspirations, occupation and direction as the new changes would not be fully in effect during one person’s lifetime. What about now? Only a decade ago (pre-iPhone) looks like a century backwards. The cycle of change already looks to be around 5-10 years [see Figure 1]. And look at our politicians: it is not a coincidence someone like Trump can capture the imagination of a nation in the absence of visionary contenders. Politics as we know it has reached the end of life - IMHO due to lack of serious left-wing ideas - but that is not the topic of this post. The point I am trying to make is that politicians are no longer able to propose anything but the most trivial changes since their view of the world is limited by their lack of understanding of a whole new virtual world being created alongside this physical world whose rules do not exist in any books.

And it is not just the politics that is dropping far behind. Economics in the face of a fast cycle of change will be different too. First of all, today’s financial malaise experienced in many developed countries might still be around for years to come. In an age of Keynesian economy and central intervention characterised by low inflation, low growth and abundance of money printed by central banks, it seems the banks are no longer relevant. The current economy is sometimes referred to as Japanisation, which was spotted back in 2011 and 5 years on feels no different. And it is no coincidence that an IMF report finds decreasing efficiency of capital in the Japanese Economy - that can be applied elsewhere. Looking at the value of bank stocks provides the glaring fact that they are remnants of institutions from the past. True, they are probably still financing my mortgage and yours but their importance as the cornerstone of development during the previous centuries is gone. Why? Because the importance of capital in a world where there is so much of it around without finding a suitable investment is overrated. With 10-yr US Bonds at around 1.8% and yield on 2-yr German bund at -0.5% (!), an investment with 2% annual return is a bargain. In fact today’s banking is characterised by piling up losses year on year (for example this and this). Looking at the Citigroup or Bank of America’s 10 year chart is another witness to the same decline. In an environment when money is cheap (because of ZIRP), it cannot be the main driver in the business, as money (and hence banks) is not the scarce commodity anymore. See? We did not even have to mention bitcoin, blockchain or crowdfunding.

Figure 2 - Deutsche Bank Stock since year 2000 [Source: Yahoo Finance]
But beyond our myopic view of the economy focused on the current climate, there is a rhetoric looking at it from a different angle and far into the future, seeing the same pattern. In one interesting essay on Economics of the future, authors find an ever decreasing role for the capital. While mentioning the importance of the suitable labour (in terms of geek labour force, currently the scarcest resource resulting in companies not growing their full potential) could be helpful, it is evident that capital is no more an issue.

In essence what all this means is, if historically the banks as the institutions controlling capital had the upper hand, in the days to come it will be those controlling Information. The future of our civilisation will be surrounding the conflicts to control the Information, on one hand by the state, on the other by the institutions and finally by us and NGOs for the privacy.

The Good

Rate of data acquisition has been described as exponential. This has been mainly with regard to the virtual world and our surroundings but very soon it will be us. From our exact whereabouts to our blood pressure to various hormonal levels and perhaps our emotional status, all will be around very soon. A lot of this is already possible, also known as quantifying self. But it is only a matter of time for this to be for everyone.

It is not difficult to think what it can do to promote health and disease prevention. Even now those who suffer from heart arrhythmia carry devices that can defibrillate their heart if a deadly ventricular fibrillation occurs. The blood pressure, sugar level, various hormonal levels, and all sorts of measurable elements can be tracked. Cancerous cells can be detected in blood (and its source identified) well before it could grow and spread. The plaques in the blood vessels will be identified by the micro devices circulating and any serious stenosis can be identified. Clots in the heart or brain vessels (resulting in stroke) can be detected at the time of formation with a device releasing thrombolytic agents immediately alleviating the problem. Going for extra medical diagnosis could be very similar to how our cars are being serviced today: a device gets connected to the myriad of micro devices in your body and full picture of health status will be immediately visible to the medical staff. You could be walking on a road or in a car, witnessing a rare yet horrific accident (would there be accidents?) and the medical team would know whether you would suffer from PTSD and whether you would need certain therapies for it - they would know where you were and whether you witnessed the incident from your various measurements.

And of course, this is only the medical side. The way we work, entertain ourselves and interact with the outside world will be completely different. It is not very hard to imagine what it will be like: one cheesy way is to just take everything that you do at home and think of adding an automation/scheduling/verbal command to it. From making coffee, to checking information, to entertainment. But I will refrain from limiting your view by my imagination. What it is clear is that the presence and impact of the virtual world will be much more pronounced.

At the end of the day, it is all about the extra information coupled with machine intelligence.

The Bad

This section is not speculating on what it could look like. We can all go and read any of the dystopian books, many to choose from and could be like any or none.

But instead it is about simple reasoning: taking what we know, projecting the rate of change and looking at what we might get. It is very reasonable to think that machine intelligence will be at a point where it can reason very efficiently with a pretty good rate of success. And on the other hand, it is reasonable to think that there will be many many data points for every person. If we as humans can be represented as intelligent machines that turn data plus our characters into decisions, then once our characters (historical data) are known to the machines and the input perceived by us is already available via the many many agents present in and around us, it is not unreasonable to think that the systems can estimate your decisions. So when you think of advertising, this gets really frightening since you would know pretty well what the reaction will be if you have enough information. And it is about, how much, how much money do you have to decide on…

You see, the fight for your disposable income (that part of your income that you can choose how to spend) could not be more fierce: it can make or break companies in the future. The future of advertising and the fight for this disposable income is what makes Eric Schmidt come out and almost say there won’t be online privacy in the future:
"Some governments will consider it too risky to have thousands of anonymous, untraceable and unverified citizens - hidden people - they'll want to know who is associated with each online account... Within search results, information tied to verified online profiles will be ranked higher than content without... even the most fascinating content, if tied to an anonymous profile simply won't be seen because of its excessive low ranking." - The New Digital Age / page 33
And when you see how the top four companies have already moved into media industry, you get it. Your iPhone selects a handful of news items for you to see, Facebook controls your timeline, Amazon is a full-blown media company and Google controls youtube which has overtaken conventional media for the entertainment of the millennials. We must reiterate that none of these companies are by nature evil but when it is to choose between you and their income, it is natural that they will pick the latter. And guess what: they have what the state wants too.

Let’s revisit banking for a moment to clarify the point. Banks have what politicians need: capital to fund ever more expensive political campaigns. And the state has what the banks need: regulation or rather de-regulation which banks thought would help them prosper because they can enter the stock market’s casino with the high street bank deposits (which ironically has been the source of their losses). And above all, the state catches the banks if they fall, as it did in 2008. ECB uses its various funds (EFSF, ESM, etc) to keep the banks in Greece and Italy (and others, soon Germany?) afloat. And it catches the stock market when it falls, as it has constantly done by various QE measures, interest rate cuts, printing money, etc. In such a financial milieu, where there are cushions all along the path, there is no real risk anymore, leading to irresponsible behaviour by the banks. And the party should never end, no wonder Obama could not move an inch towards bringing back some regulation. Heads of the state’s financial institutions come from ex-CEOs of the likes of Goldman Sachs. This alliance of the state and banking has contributed to the growth of inequality (ultimately leading to modern slavery) and no wonder the state is not bothered: the state is made up of politicians in alliance with bankers, and of the bankers themselves.

And what does it have to do with the future of information? Exactly the same thing can happen in the future, only with the state and heads of companies owning the information. If capital no longer holds the power and it is the information, then the alliance of state and info bosses will lead to the modern slavery. States control the legislations and information companies own private data and control the media: each one having what the other needs. 

The Ugly

Why ugly? Because we are already there, almost. First of all, the states have started gathering and controlling information. NSA is just an example. The states have started requesting companies owning the data to provide it. Legislation is under way to prevent effective encryption. This could all look harmless when we are busy checking out our twitter and facebook timelines, but it has already started to freak me out: companies have already started thinking and acting in this area. As we saw, Google's Eric Schmidt is portraying a future where anonymity has little value: either you agree or otherwise you need to speak out.

Going back to the politics, we do not have politicians or lawyers that have a correct understanding of the technology and its implications, and it is not their fault: they were not prepared for it. But soon, very soon, we will have heads of companies turning politicians. Very much like CEOs of the Goldman Sachs, and I do not mean it necessarily in a bad way, why? Because the power will be in the hands of the geeks and by the same token, we need strong oppositions, we need politicians among us to rise to the occasion and lead us safely into the future where we have meaningful legislations protecting our privacy while allowing safe data sharing. Problem is, we had 2500 years or so to think about democracy and government in the physical world (from the Greek and Roman philosophers to now) but we are confronted with a virtual world where the ethics and philosophy are not well-defined and do not quite map to the physical world we live in yet every lawmaker is trying to shoehorn it to the only thing they know about. Enough is enough.

But where do we start from? My point in this post/essay has been to ask the questions, I do not claim to have the answers. We have not yet explored the problems well enough to come up with the right answers... we need the help of think tanks, many of which I see rising amongst us.

We are surrounded by questions whose answers (like all other aspects of our industry) are tangled with so much of our personal opinions. When it comes to the court of law what doesn't matter is your or my opinion. Is Edward Snowden a hero or a traitor? Was Julian Assange a visionary or an opportunist? What is ethical hacking, and how is it different from unethical, in fact could hacking ever be legitimate? Is Anonymous a bunch of criminals or a collection of selfless vigilantes working for the betterment of the virtual world in lack of a legal alternative? What is the right to privacy, and is there a right to be anonymous?

Needless to say, there could be some quick wins. I think defining privacy and data sharing is one of the key elements. One improvement could be turning the small-print legal mumbo jumbo of the terms and conditions into bullet-wise fact sheets. Similar to the “Key Fact Sheet” for mortgages where the APR and various fees are clearly defined, we can enforce a privacy fact sheet where the answers to questions such as “My anonymised/non-anonymised data might/might not be sold”, “I can ask for my records to be physically erased”, “My personal information can/cannot be given to third parties”, etc are clearly defined for non-technical consumers, as well as most of the rest of us who rarely read the terms and conditions.

Whatever the solutions, we need to start… now! And it could be already late.

Tuesday, 24 November 2015

Interactive DataViz: Rock albums by the genre since 1960

Interactive DataViz here:
Last week I presented a talk at #BuildStuffLT titled “From Power Chords to the Power of Models” which was a study of Rock Music by way of Data Mining, Mathematical Modelling and also Machine Learning. It is such a fun subject to explore, especially for me, since Rock Music has been one of my passions since I was a kid.

The slides from the talk are available and the videos will be available soon (although my performance during the talk was suboptimal due to lack of sleep, a problem which seems to be shared by many at the event). BuildStuffLT is a great event, highly recommended if you have never been. It is a software conference with known speakers such as Michael Feathers, Randy Shoup, Venkat Subramaniam, Pieter Hintjens and this year it hosted Melvin Conway (yeah, the visionary who came up with Conway’s law in 1968) with really mind stimulating talks. You also get a variety of other speakers with very interesting talks.

I will be presenting my talk at CodeMash 2016 so I cannot share all of the material yet but I think this interactive DataViz alone is many many slides in a single representation. I can see myself spending hours just looking at the trends and artist names and their album covers - yeah this is how much I love Rock Music and its history - but even for others this could be fun and also help you discover something new to listen to.


This is an interactive percentage-based stacked area chart of the top 10 genres in a year, since 1960, when Rock Music as we know it started to appear. That is a mouthful but basically for every year the top 10 genres are selected, so the dataset contains only those Rock (or related) genres that at some point were among the top 10 genres. You can access it here or simply clone the GitHub repo (see below) and host your own.

The data was collected from Wikipedia by capturing Rock Albums and then processing their genres, finding top 10 in every year and then presenting in a chart - I am using Highcharts which is really powerful and simple to use and has a non-commercial license too. The data itself I have shared so you can run your own DataViz if you want to. The license for the data is of course Wikipedia’s, which covers these purposes.
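For those who want to rebuild the series from the shared data, the processing step can be sketched roughly as below. This is not the code from the repo: the `year` and `genres` field names are assumptions, and so is the choice to compute each percentage over the albums counted in that year's top 10 only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def genre_shares(albums: Iterable[dict], top_n: int = 10) -> Dict[int, Dict[str, float]]:
    """Per-year percentage share of each genre that made a yearly top `top_n`."""
    per_year: Dict[int, Counter] = defaultdict(Counter)
    for album in albums:
        # An album counts once for every genre it is tagged with.
        for genre in set(album["genres"]):
            per_year[album["year"]][genre] += 1

    # Top genres of each year, and the union of them across all years.
    top = {year: [g for g, _ in counts.most_common(top_n)]
           for year, counts in per_year.items()}
    charted = sorted({g for genres in top.values() for g in genres})

    shares: Dict[int, Dict[str, float]] = {}
    for year in sorted(per_year):
        kept = {g: per_year[year][g] for g in top[year]}
        total = sum(kept.values())
        # A genre outside this year's top list is drawn as 0%.
        shares[year] = {g: 100.0 * kept.get(g, 0) / total for g in charted}
    return shares
```

Each year's values add up to 100, which is the shape a percentage-based stacked area chart needs, with one series per charted genre.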

I highly recommend you start with the Visualisation with “All Unselected” (Figure 2) and then select a genre and visualise its rise and fall in the history.

Then you can click on a point (year/genre) to list all albums of that genre for that year (Figure 3). Please note that even when the chart shows 0%, there could be some albums for that genre - these are from a year in which that genre was not among the top 10 genres.

Looking at the data in a different way

Here is the 50 years of Rock (starting from 1965) with the selected albums:

Things to bear in mind

  • The data has been captured by collecting all albums for all links found in documents traversed from the list of rock genres and then to the artist pages. As far as I know, the list includes all albums by the major (and minor) rock artists - according to Wikipedia. If you find a missing album (or artist), please let me know.
  • Every album will contribute all its genres to the list. This means if it has genres “Blues Rock” and “Rock”, then it will be counted once for each of its genres and you can find it if you look at either the Rock or the Blues Rock genre.
  • Data has some oddities: sometimes an album occurs more than once, mainly due to nuances of data in Wikipedia, where there are multiple entries (URLs) for the same document, etc. Data has already been cleansed through many processes and these oddities do not materially change the results. In the future however, there are things that can be done to remove these remaining oddities.
  • Again, it is highly recommended that you click the “Unselect All” button and click on the genres that you are interested in one by one and explore the names of the albums.
  • Clicking “Select All” or “Unselect All” takes a bit too much time. I am sure it has an easy solution (turn rendering off when changing the state) but have not been able to find it. Expect your PRs!
  • There are some genres in the list which are not really Rock genres. These genres would have been mentioned alongside a rock genre in the album cover or had been a not-so-much-rock album by an otherwise Rock artist.
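One way the remaining duplicate albums could be collapsed is sketched below. It is purely illustrative and not part of the published code: it assumes each record has `artist`, `title`, `year` and `genres` fields and treats two entries as the same album when artist, title and year match after trimming and case-folding.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_albums(albums: Iterable[dict]) -> List[dict]:
    """Merge repeated entries of the same album, keeping the union of their genres."""
    unique: Dict[Tuple[str, str, int], dict] = {}
    for album in albums:
        # Hypothetical identity rule: same artist, title and year means same album.
        key = (album["artist"].strip().casefold(),
               album["title"].strip().casefold(),
               album["year"])
        # First occurrence wins for all fields except genres, which are merged.
        merged = unique.setdefault(key, {**album, "genres": []})
        for genre in album["genres"]:
            if genre not in merged["genres"]:
                merged["genres"].append(genre)
    return list(unique.values())
```

The rule is deliberately crude: reissues in a different year stay separate, and two different albums sharing artist, title and year would be merged, so it is worth eyeballing what gets collapsed.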

Code and Data

All code and data are published on GitHub. Code uses Highchartsjs, knockoutjs and foundations UI framework. Have fun!