
CrowdStrike update causes major IT outage, taking out banks, airlines and businesses globally


Recommended Posts

Posted

32-bit systems affected only... 64-bit OK, apparently.

Posted
16 minutes ago, petermik said:

32-bit systems affected only... 64-bit OK, apparently.

Would be very surprised if this was true. Got a link?

Posted

One C++ programmer is saying that the crash was caused by the CrowdStrike update trying to access a memory address that it wasn't supposed to, so Windows just said "fuck it, I'm gonna crash now." Beyond my level of understanding, but how did CrowdStrike not catch something like this in testing before pushing an update to thousands of customers around the world? 🤨

Thread by @Perpetualmaniac on Thread Reader App – Thread Reader App
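Roughly what "accessing a memory address it wasn't supposed to" means, as a minimal user-mode sketch (illustrative only, not CrowdStrike's actual code). In user mode Windows just terminates the offending process; the same class of fault inside a kernel-mode driver takes the whole machine down with a blue screen.

```cpp
// Illustrative only (not CrowdStrike's code): reading through an invalid
// pointer. In user mode the OS kills just this process with an access
// violation; the equivalent fault in a kernel-mode driver bug-checks the
// whole machine (the BSOD).
#include <cstdio>

int main() {
    const int* p = nullptr;       // stand-in for a pointer derived from bad data
    std::printf("about to read through an invalid pointer...\n");
    int value = *p;               // invalid read -> access violation / crash
    std::printf("%d\n", value);   // never reached
    return 0;
}
```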

  • Like 4


Posted
7 minutes ago, tokyojoe2010 said:

One C++ programmer is saying that the crash was caused by the CrowdStrike update trying to access a memory address that it wasn't supposed to, so Windows just said "fuck it, I'm gonna crash now." Beyond my level of understanding, but how did CrowdStrike not catch something like this in testing before pushing an update to thousands of customers around the world? 🤨

Thread by @Perpetualmaniac on Thread Reader App – Thread Reader App

Another question is, why don't they have a beta group of a smaller number of volunteer customers to push an update out to first?

  • Like 3
Posted

Hopefully we will get answers to these questions in the coming days. What has happened beggars belief.

Is this just further proof of the "competency crisis" the world is now suffering from?

Interesting times ahead, for sure.

  • Like 1
Posted

Local reaction to CrowdStrike computer outage: “Somebody dropped the ball”
...
Faulty code in the update files resulted in one of the most widespread tech outages in recent years for companies using Microsoft’s Windows operating system. The outage resulted in Microsoft users’ computers displaying what is commonly referred to as the “blue screen of death.”

“It’s being fixed now and it will be cleared up. But, it certainly wreaked havoc while it was happening,” Johnson said.

CrowdStrike provides antivirus software used on Microsoft Windows devices. Johnson said the company was attempting to perform what is usually a routine upgrade to its security system.

“(CrowdStrike) pushed out an update that had a glitch in it and the glitch actually kept the computer from booting up Windows,” Johnson said. “If your company used CrowdStrike and your computer used Microsoft, it caused the ‘blue screen of death’ to come up.”

Johnson said computer systems using Mac or Linux were not affected.

“It’s a CrowdStrike problem. It’s up to CrowdStrike to provide the fix,”

Johnson says it could be weeks before it’s known how the security upgrade became defective.

“Typically before you push out an update, or a patch as it’s called in the industry, it undergoes beta testing, so they test the heck out of it,” Johnson said.

“Somebody dropped the ball and let this bad file get out.”
...

Posted
14 minutes ago, tokyojoe2010 said:

how did CrowdStrike not catch something like this in testing before pushing an update to thousands of customers around the world? 🤨

DevOps is the way the world works now: developers write code, check it in to a code repository, and automation is supposed to take care of the testing and deployment. Streamline the process enough and there's very little human intervention after the code is written.

If the automated test cases are poorly written or lack coverage of all possible failure scenarios, then stuff gets through into production and can cause havoc.

Facebook, Google and Amazon instituted the "move fast and break things" approach that encouraged DevOps to get to this level. It works fine for them because they don't have to be 100% accurate all the time and have redundancy upon redundancy if things go wrong.

In the old days there were Quality Assurance teams that would thoroughly test each release before it got let out to customers; those are long gone in the DevOps era.
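To make that concrete, here is a hypothetical sketch of the kind of automated test coverage being described (the names parseContentFile and ContentFile are invented for illustration, they are not CrowdStrike's): feed the parser a deliberately malformed content blob and require a clean rejection rather than a crash.

```cpp
// Hypothetical test sketch: a defensive parser for a "content update" must
// reject malformed input instead of crashing. Names are illustrative only.
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct ContentFile { std::vector<uint32_t> fields; };

// Returns std::nullopt on anything unexpected rather than reading blindly.
std::optional<ContentFile> parseContentFile(const std::vector<uint8_t>& raw,
                                            size_t expectedFields) {
    if (raw.size() != expectedFields * sizeof(uint32_t)) return std::nullopt;
    ContentFile cf;
    for (size_t i = 0; i < expectedFields; ++i) {
        uint32_t v = 0;
        for (size_t b = 0; b < 4; ++b) v |= uint32_t(raw[i * 4 + b]) << (8 * b);
        cf.fields.push_back(v);
    }
    return cf;
}

int main() {
    // A well-formed update parses...
    std::vector<uint8_t> good(20 * sizeof(uint32_t), 0);
    assert(parseContentFile(good, 20).has_value());

    // ...and a truncated/garbage update must be rejected, not dereferenced.
    std::vector<uint8_t> bad(13, 0xAA);
    assert(!parseContentFile(bad, 20).has_value());
    return 0;
}
```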

Posted
1 hour ago, Love_to_eat_Thai said:

DevOps is the way the world works now: developers write code, check it in to a code repository, and automation is supposed to take care of the testing and deployment. Streamline the process enough and there's very little human intervention after the code is written.

If the automated test cases are poorly written or lack coverage of all possible failure scenarios, then stuff gets through into production and can cause havoc.

Facebook, Google and Amazon instituted the "move fast and break things" approach that encouraged DevOps to get to this level. It works fine for them because they don't have to be 100% accurate all the time and have redundancy upon redundancy if things go wrong.

In the old days there were Quality Assurance teams that would thoroughly test each release before it got let out to customers; those are long gone in the DevOps era.

Given the millions of computers affected by the coding error, I would hazard a guess that all they would have had to do was apply the patch to one sandbox system in house and they would have caught the error.

Posted

Businesses really need to think about next-level BCPs (business continuity plans). Fallback measures all cost money, for an occurrence that might happen once in a blue moon.

Posted (edited)
3 hours ago, Love_to_eat_Thai said:

If the automated test cases are poorly written or lack coverage of all possible failure scenarios, then stuff gets through into production and can cause havoc.

I just had a crazy thought. I don't own a tin foil hat and I'm not partial to entertaining conspiracy theories, but what if it could have been some kind of supply chain attack?

A supply chain attack is where someone uses an outside provider or vendor to gain access to data or systems, usually to steal information or install malware of some kind. But what if the goal was not to steal data, but to cause havoc?

Ironically, CrowdStrike have released information about this themselves:

https://www.crowdstrike.com/cybersecurity-101/cyberattacks/supply-chain-attacks/

3 hours ago, forcebwithu said:

“Somebody dropped the ball and let this bad file get out.”

A disgruntled employee or a "bad guy" of some sort might have caused the fault to be included in the update, in scenarios where it usually would not have gotten that far.

It seems awfully convenient to say that it was just a bug and someone fucked up and let it creep in.

1 hour ago, Yogi007 said:

Businesses really need to think about next-level BCPs (business continuity plans). Fallback measures all cost money, for an occurrence that might happen once in a blue moon.

This is a very important point. Certain things you just can't do if the computer won't turn on, but there are things that could be done without computers, that were actually done without computers once upon a time, so having a plan in place for when the computers don't work is a realistic and worthy endeavour if you ask me.

 

Edited by nourishing lotion
Posted

It wasn't a supply chain attack; they pushed a kernel update (can't really say "upgrade") that was faulty and admitted it was their action.

Posted
18 hours ago, Harry Brown said:

They can bandaid it to stop the patch loading but then that leaves them vulnerable to hacking.

 

I had a look on flight radar and planes seem to be landing and taking off in BKK.

Here in Perth there were a couple of delays, but it mostly looks normal now.

 

They are working on a fix.

 

 

Yeah, looks like it will take a while. It will be interesting to see the final direction companies and countries take in terms of rules and regulations with such a high level of concentration risk.

Posted

Some stuff about it being the company that erased or set up the secret servers for Hillary Clinton's incriminating emails too.

Posted

Clark airport was totally messed up yesterday; presume the others in the Philippines were in a similar state.

Posted (edited)
3 hours ago, Yogi007 said:

Businesses really need to think about next-level BCPs (business continuity plans). Fallback measures all cost money, for an occurrence that might happen once in a blue moon.

It's not an easy one, as companies are so reliant on CrowdStrike and it needs to install device drivers at a lower, more privileged level of the operating system to function.

The workaround is manual until they release a fix. If the patch could have been rolled back there and then, that would have prevented all the downtime and blue screens, but I'm unsure how easy that would be given the error.

But critical services definitely need some kind of BCP in place for this type of event.

 

Edited by googlehead
Posted (edited)
1 hour ago, ricktoronto said:

It wasn't a supply chain attack; they pushed a kernel update (can't really say "upgrade") that was faulty and admitted it was their action.

I was thinking more along the lines of a supply chain attack against CrowdStrike that allowed a faulty kernel update to get pushed out by CrowdStrike.

Maybe I'm mistaken about what a supply chain attack is.

My point is that it's convenient to identify it as a mistake of some sort, since if it was a successful attack or deliberate in any way, the consequences for the company would be more severe.

Edited by nourishing lotion
Posted
Quote

Around 8.5 million devices — less than 1 percent of Windows machines globally — were affected by the recent CrowdStrike outage, according to a Microsoft blog post by David Weston, the company’s vice president of enterprise and OS security.

https://techcrunch.com/2024/07/20/microsoft-says-8-5m-windows-devices-were-affected-by-crowdstrike-outage/


Posted

Well they seemed to ignore the basic rule of updates - don't push an update out on a Friday - fewer people are normally working on a Friday if there are problems, and a lot fewer the next couple of days. There's a reason Microsoft instituted 'Patch Tuesday'.

  • Like 1
Posted
4 hours ago, TravellingBrit said:

Well they seemed to ignore the basic rule of updates - don't push an update out on a Friday - fewer people are normally working on a Friday if there are problems, and a lot fewer the next couple of days. There's a reason Microsoft instituted 'Patch Tuesday'.

Worse than pushing an update out on a Friday is that they pushed it out to all 8.5 million computers instead of doing a staggered release to a small subset first.

Just watched this vid that came up in my YT recommend list. Thought he did a good job of explaining the reason behind the outage.

This comment from the vid highlights the insanity of what CrowdStrike did by doing a global release of the patch instead of a staged release.
[attached image: screenshot of a comment on the video]
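To show what a staged (canary) release can look like, here is a purely illustrative C++ sketch, not CrowdStrike's mechanism: each host hashes a stable ID and only receives the new content if it falls under the current rollout percentage, which is widened as telemetry comes back clean.

```cpp
// Illustrative staged-rollout gate (not CrowdStrike's mechanism): hash a
// stable host ID and only ship the update to hosts below the current
// rollout percentage.
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

// True if this host should receive the update at the given rollout stage.
bool inRolloutCohort(const std::string& hostId, int rolloutPercent) {
    const std::size_t h = std::hash<std::string>{}(hostId);
    return static_cast<int>(h % 100) < rolloutPercent;
}

int main() {
    // Stage 1: 1% canary; later stages widen to 10%, 50%, 100% if healthy.
    const int stagePercent = 1;
    for (const std::string host : {"host-a", "host-b", "host-c", "host-d"}) {
        std::cout << host
                  << (inRolloutCohort(host, stagePercent)
                          ? " -> gets the update now\n"
                          : " -> waits for a later stage\n");
    }
    return 0;
}
```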

 

  • Like 3
Posted
21 minutes ago, forcebwithu said:

Worse than pushing an update out on a Friday is that they pushed it out to all 8.5 million computers instead of doing a staggered release to a small subset first.

Just watched this vid that came up in my YT recommend list. Thought he did a good job of explaining the reason behind the outage.

This comment from the vid highlights the insanity of what CrowdStrike did by doing a global release of the patch instead of a staged release.
[attached image: screenshot of a comment on the video]

 

That was a good synopsis. And enjoyable to watch too. Was interesting to learn how the CS driver is made in such a way that it must be loaded at boot start time (11.35). I guess MS thought that the risk with this was minimal 🙂

 

  • Like 1
Posted (edited)
2 hours ago, tsec said:

That was a good synopsis. And enjoyable to watch too. Was interesting to learn how the CS driver is made in such a way that it must be loaded at boot start time (11.35). I guess MS thought that the risk with this was minimal 🙂

 

I agree, it was a good synopsis. Some actual details of the Windows boot process and kernel mode.

I am still unclear as to why a kernel mode bug forces an almost complete stop of the machine.

It's a choice made by the Windows designers, probably to prevent worse things from happening. OK. But the processor is still working and executes lots of code in order to change the screen colour to blue, list out the contents of registers and show all sorts of info.

So why can't the blue screen code do something even more useful in this situation? Such as list out all the kernel mode files that have changed since the last successful boot, with the option to rename or delete them. Or show the driver that was in the loading phase at the time of the BSOD, with the option to disable loading it on reboot.

All sorts of more useful options could be envisaged, rather than just a BSOD.

Also the Windows driver quality assurance certification needs reviewing by MS, since it failed to achieve its goal. MS are fully aware that kernel mode code is a risk and needs a reliable control process, but allowed a 3rd party to get through their certification process.

Edited by rog555
Posted
1 minute ago, rog555 said:

I agree, it was a good synopsis. Some actual details of the Windows boot process and kernel mode.

I am still unclear as to why a kernel mode bug forces an almost complete stop of the machine.

It's a choice made by the Windows designers, probably to prevent worse things from happening. OK. But the processor is still working and executes lots of code in order to change the screen colour to blue, list out the contents of registers and show all sorts of info.

So why can't the blue screen code do something even more useful in this situation? Such as list out all the kernel mode files that have changed since the last successful boot, with the option to rename or delete them. Or show the driver that was in the loading phase at the time of the BSOD, with the option to disable loading it on reboot.

All sorts of more useful options could be envisaged, rather than just a BSOD.

Also the Windows driver quality assurance certification needs reviewing by MS, since it failed to achieve its goal.

He did say that if something shits itself at kernel level, then the whole thing shits itself and, at a minimum, a reboot is required (watch out for BitLocker 😁).
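To put that in concrete terms: in user mode an access violation can even be caught with structured exception handling, or at worst it kills only that one process. A kernel-mode driver has no safe context to recover to, so Windows bug-checks the whole machine rather than risk corrupting data. A small MSVC-only sketch, purely illustrative:

```cpp
// MSVC-only, user-mode sketch: a bad write is contained by structured
// exception handling and the process carries on. The same fault inside a
// kernel-mode driver leaves the kernel in an unknown state, so Windows
// halts with a bug check (the BSOD) instead.
#include <windows.h>
#include <cstdio>

int main() {
    volatile int* p = nullptr;
    __try {
        *p = 42;  // access violation, raised as an SEH exception in user mode
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        std::printf("caught the access violation, process keeps running\n");
    }
    std::printf("still alive\n");
    return 0;
}
```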

But the God-like access this thing needed.. well.. check this....

 

 

[attached screenshot]

  • Like 2
Posted
Quote

CrowdStrike has published a post incident review (PIR) of the buggy update it published that took down 8.5 million Windows machines last week. The detailed post blames a bug in test software for not properly validating the content update that was pushed out to millions of machines on Friday. CrowdStrike is promising to more thoroughly test its content updates, improve its error handling, and implement a staggered deployment to avoid a repeat of this disaster.

https://www.theverge.com/2024/7/24/24205020/crowdstrike-test-software-bug-windows-bsod-issue
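On the "improve its error handling" point, here is an illustrative sketch (not CrowdStrike's code) of the defensive style being promised: validate any index that a content update supplies against what is actually loaded, and skip the bad entry instead of reading out of bounds.

```cpp
// Illustrative only: bounds-check indices supplied by a content update and
// degrade gracefully instead of reading past the end of a table.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Rule { int templateIndex; };   // hypothetical rule from a content file

int main() {
    std::vector<int> templates = {10, 20, 30};   // three templates loaded
    std::vector<Rule> rules = {{0}, {2}, {7}};   // rule {7} is malformed

    for (const Rule& r : rules) {
        if (r.templateIndex < 0 ||
            static_cast<std::size_t>(r.templateIndex) >= templates.size()) {
            std::printf("skipping rule with out-of-range template index %d\n",
                        r.templateIndex);
            continue;                            // handle the error, don't crash
        }
        std::printf("applying template %d\n", templates[r.templateIndex]);
    }
    return 0;
}
```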

  • Like 1


Posted (edited)

The image posted by @tsec above mentioned a recovery tool published by Microsoft, which is available here:  https://go.microsoft.com/fwlink/?linkid=2280386

Basically, you boot your computer using recovery media, e.g. Windows PE on a USB drive, then use that tool, which I guess deletes the faulty driver file.
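For reference, the widely reported manual workaround did much the same thing: boot into Safe Mode or a recovery environment and delete the CrowdStrike channel files whose names start with C-00000291. Here is a purely illustrative C++ sketch of what such a cleanup might look like (the path and filename pattern are as widely reported; this is not Microsoft's actual tool):

```cpp
// Illustrative cleanup sketch (not Microsoft's tool): remove the CrowdStrike
// channel files widely reported as the trigger (names matching
// C-00000291*.sys), as done from a recovery environment.
#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

int main() {
    // Adjust the drive letter to wherever the broken Windows install is
    // mounted when booted from recovery media.
    const fs::path dir = "C:/Windows/System32/drivers/CrowdStrike";

    std::error_code ec;
    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        const std::string name = entry.path().filename().string();
        if (name.rfind("C-00000291", 0) == 0 &&        // starts with the prefix
            entry.path().extension() == ".sys") {
            std::cout << "deleting " << entry.path() << "\n";
            fs::remove(entry.path(), ec);              // best effort, ignore errors
        }
    }
    return 0;
}
```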

The reason I point this out is that, in this day and age, it makes the most sense IMHO to either have a bootable Windows recovery environment available like Windows PE, or back up your C: drive onto a USB drive as a bootable disk image.  Windows machines are easy to boot from USB disks, and storage is basically free.

Personally I have 2 tools I use for this: 

To create the bootable copy of my C: drive I use EaseUS Partition Master.  This is an intuitively easy tool you can use to create bootable images of hard drives, as well as to create and resize Windows disk partitions.  My bootable backup is a 1TB SSD thumb drive (yep that's right!), so it's easy to carry when I take my lapper on holiday.  This software is available as a free download, but I paid for the full license.

To create up-to-date image backups of my C: and D: drives (I have some applications installed to run off the much-larger D: drive), I use Macrium Reflect.  I have this set up to make a full image of my C: and D: drives one night per week, and incremental backups nightly.  BTW, Macrium has a free business license that I also use at work, although I bought the full license for home.

So, basically the scheme is that if my C: drive takes a shit, I can boot off the bootable USB stick and be right back in business.  I tend not to install software too often, so booting off the USB stick shouldn't cause me to lose much if any capability.

I recommend a scheme like this for anyone who uses a laptop of any sort, including non-Windows machines.  If you don't have the chops to set it up yourself, get someone else to do it, but make sure he gives you some documentation on what he did, in case you have to have someone else help you get running after a failure.

PS:  I mentioned this elsewhere, but I have a compact PC I use as my media streamer as well.  I back this up less frequently, but in case something really ridiculous happens to my lapper I can switch to using the media PC as my "everyday driver" until it's fixed.

Edited by Bruce Mangosteen
  • Like 2
Posted (edited)

Just to add to what the gentleman very nicely articulated: Apple does not allow 3rd parties to access its kernel. Apple protects its OS like the crown jewels and knows its importance. Their machines are more expensive, but note that they don’t charge a penny for OS upgrades, unlike MS, and they handle all the security patches.

In 2008 I got so fed up with the extended family’s MS OS devices (all 3rd-party brands, as MS does not build its own devices, just the OS) being hit repeatedly with various viruses, malware etc. that I replaced every single laptop and PC with Apple. None of the anti-virus software, even the expensive ones, can stop a zero-day attack.

It has been 15 years and not a single incident.

People behind various malware compromises don’t waste their time with Apple as they have a massive target with MS OS, so why bother?

So in the end, the very software (CrowdStrike’s Falcon) that was supposed to protect the MS OS from malware ended up causing an incident that was far worse than any malware attack. What an irony.

I blame Bill, as it was under his watch that MS chose not to protect its OS and asked consumers to purchase 3rd-party anti-virus kits in promotional tie-ups. It took years before they finally introduced Windows Defender.

If you have vulnerable people in your family who do online banking, tell them to switch to Apple.

 

 

 

Edited by Joeleg
