As some of you know, ipban pro had an outage on 2022-08-04. It lasted a good 8 hours.
I have been using Azure Functions as a proxy layer for all of ipban pro. At a basic level, the function app knows about all the other services and data stores needed to handle orders, licenses, API keys and the recent and naughty lists. For the last several years it has been running pretty smoothly.
Yesterday, starting at about 7:30 AM UTC, I was hit by a DDoS attack that lasted about an hour. There had been no code deploys or changes in the past few weeks.
During and after this attack, Azure Functions decided to melt down and return 503 codes or timeouts for all calls. I believe this was a bug in Azure Functions not releasing connections. My metrics showed 600 simultaneous connections, then a sudden drop to zero, repeating every few minutes. Clearly Azure didn’t like that and decided to kill my app. Even stopping it for a few minutes and starting it again failed to fix things.
I had considered just deleting the function app and re-deploying it, but in my mind the root problem needed to be discovered and addressed.
I spent a few hours poking around Application Insights and the troubleshooting section. There were references to a connection limit being hit. But even after I shut down my entire layer of services, I still received 503s and/or timeouts when calling the function app directly, so the problem wasn’t traffic from my own machines. I also have the functions locked down with IP address restrictions and app keys.
Azure charged me $29 just to be able to send an email to someone. By the time they responded four hours later, I had rewritten my function app as a regular server app and was hosting it myself. Azure still hasn’t provided a reason why the function app went belly up.
This is the first time I have had a cloud provider completely fail on me. I won’t be using Azure Functions again. This also left a very sour taste in my mouth for Azure itself. The other services Azure provides have been working well for me, but I will definitely keep a backup plan in my back pocket in case some of them have similar problems.
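For what it's worth, one shape a backup plan can take is a client-side fallback: try the primary endpoint, and if it answers with a 503 or times out, move on to a backup. The sketch below is only an illustration of that idea, not the code ipban pro runs. The names `call_with_fallback`, `UpstreamUnavailable` and `RETRYABLE_STATUSES` are made up for this example, and `fetch` is a placeholder for whatever HTTP call you already use.

```python
from typing import Callable, Sequence, Tuple


class UpstreamUnavailable(Exception):
    """Raised when every endpoint in the list failed (hypothetical name)."""


# Assumption: these statuses mean "this endpoint is down, try the next one".
RETRYABLE_STATUSES = {502, 503, 504}


def call_with_fallback(
    endpoints: Sequence[str],
    fetch: Callable[[str], Tuple[int, bytes]],
) -> Tuple[str, bytes]:
    """Try each endpoint in order and return (url, body) from the first that works.

    `fetch` takes a URL and returns (status_code, body). It is expected to
    raise TimeoutError or ConnectionError when the endpoint cannot be reached.
    """
    errors = []
    for url in endpoints:
        try:
            status, body = fetch(url)
        except (TimeoutError, ConnectionError) as exc:
            # Endpoint unreachable or too slow: remember why and move on.
            errors.append(f"{url}: {exc!r}")
            continue
        if status in RETRYABLE_STATUSES:
            # Endpoint answered, but with a "service unavailable" style error.
            errors.append(f"{url}: HTTP {status}")
            continue
        return url, body
    raise UpstreamUnavailable("; ".join(errors))
```

You would pass the primary URL first and the self-hosted backup second. Keeping `fetch` as a parameter means the fallback logic can be tested without any network access.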