Detection of Privilege Elevation by Malware on Linux

Tue, 04 Apr 2017

One of the hallmarks of targeted cyber attacks is the attempt, from an execution toehold on a host, to increase computational privileges in order to assert greater control of the system. Once attackers have attained this position, it may become tremendously difficult to detect them, especially if they act and persist through a kernel rootkit. Fortunately, the privilege elevation process tends to be noisy, and can be detected before it succeeds, if one looks for the proper clues. This article presents detection heuristics for privilege elevation on Linux systems.

There are multiple ways for a program to seek privilege elevation, but they are all based on one simple principle: high privileges are granted by other highly privileged programs. Such programs fall into two categories:

  1. the operating system kernel;
  2. highly privileged user-mode programs.

The signs of exploitation differ depending on whether the attacker is exploiting the kernel or a user-mode program. Let’s start with the latter.

Exploiting high-privilege user-mode programs

There are basically two categories of exploits at play here:

  (A) forking off of a running highly privileged program;
  (B) starting a highly privileged program from a lowly privileged program.

Case (B) is the easiest to monitor for malware detection. The typical scenario here is that the program just started is a set-UID or set-GID program. Any process under UNIX runs as a certain user and group, and its set of privileges derives from them. The usual privilege transfer protocol stipulates that the user and group of a process are inherited from the parent process that forks it off; the child process then executes a new program, which does not alter this set of privileges. However, set-UID and set-GID executables change this protocol: after the child process has been forked off, when it executes such a program, the user (set-UID) or group (set-GID) it runs as becomes that of the program's owner. Therefore, if the executable is owned by a user with a high level of privilege (typically the root user), the process runs with high privileges.
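To illustrate, the set-UID and set-GID bits can be read directly from a file's mode bits. The following Python sketch (ours, not part of SNOW) flags the executables in a directory that would trigger this privilege switch on execution:

```python
import os
import stat

def is_setuid_or_setgid(mode: int) -> bool:
    """True if a file mode carries the set-UID or set-GID bit."""
    return bool(mode & (stat.S_ISUID | stat.S_ISGID))

def find_privileged_executables(root: str = "/usr/bin"):
    """Enumerate set-UID/set-GID files under a directory, with the
    owning UID/GID -- the accounts a child process would run as."""
    hits = []
    for entry in os.scandir(root):
        try:
            st = entry.stat(follow_symlinks=False)
        except OSError:
            continue  # raced against deletion, or unreadable
        if is_setuid_or_setgid(st.st_mode):
            hits.append((entry.path, st.st_uid, st.st_gid))
    return hits

# /usr/bin/passwd is typically installed mode 4755 (rwsr-xr-x, root-owned):
assert is_setuid_or_setgid(0o104755)
assert not is_setuid_or_setgid(0o100755)
```

On a typical Linux system, `find_privileged_executables("/usr/bin")` will list programs such as passwd, su and sudo.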

Many common UNIX programs, such as su and sudo, necessarily leverage this privilege transfer rule. However, many systems administrators tolerate other set-UID or set-GID programs in order to facilitate the execution or delegation of certain tasks. This is a dangerous practice, especially if the set-UID/GID programs are relatively complex: many hackers know how to supply parameter sets to such programs in order to trick them into running arbitrary shell commands.

Case (A) is quite a bit harder to monitor. In this case, the exploit consists of submitting specially crafted input to a highly privileged program so as to either take advantage of a design flaw, or to exploit a bug. In both cases, the end result is that the attacker tricks the program into forking a process from which the exploit provides the means to run a chosen program. The child process naturally inherits its parent's high privileges. This case is harder to monitor because many service daemons run with a high level of privilege, and naturally spawn legitimate highly privileged child processes. A good example is the sshd daemon, which handles SSH connections to the machine: whenever the root user is permitted to log on to the machine through SSH, sshd legitimately forks off root-privileged interactive shells and commands.

In both cases, the solution for detecting malware activity is to raise an alert whenever a child process appears running with high privileges, that is, with a UID smaller than or equal to a threshold N. On many servers, N can be 0 (root); certain specialized Linux systems (e.g. Android) assign important privileges to non-root users up to UID = 9999. This effectively captures any set-UID program execution (we whitelist the common legitimate set-UID programs su and sudo), as well as any privileged process spawned from a privileged daemon. We advise IT administrators in customer organizations to minimize the number of uncommon set-UID programs and "root-spawns-root" processes, which result in false positive alerts. Our strategy of choice is to locally whitelist those processes that the customer insists on deploying on their servers.
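As a rough sketch of this heuristic (the threshold N, the whitelist contents, and matching on the command name alone are simplifications of ours, not SNOW's actual implementation):

```python
def should_alert(child_uid: int, child_gid: int, child_comm: str,
                 n: int = 0,
                 whitelist: frozenset = frozenset({"su", "sudo"})) -> bool:
    """Heuristic #1: flag any newly spawned child process that lands at
    or below the privilege threshold N, unless its program name is
    whitelisted. Real deployments would also locally whitelist known
    'root-spawns-root' daemons."""
    if child_comm in whitelist:
        return False
    return child_uid <= n or child_gid <= n

# A root shell spawned by some exploit trips the heuristic...
assert should_alert(0, 0, "sh")
# ...while a legitimate sudo invocation and an ordinary user shell do not.
assert not should_alert(0, 0, "sudo")
assert not should_alert(1000, 1000, "bash")
```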

Exploiting the operating system kernel

While many hacks leverage vulnerabilities of user-mode processes, others attack kernel bugs to gain execution therein. Since user-mode processes are a much easier place to implement common computing tasks, such as TCP/IP communications, kernel exploits often look to raise the privileges of a shell (or other program) already being run by the attacker. In the Linux kernel, this is rather simple, as the address of the list of task structures is made public to all user-mode processes. One simply has to walk this list and set the user and group of the attacker's shell to 0 (root).

However, such an exploit breaks an important rule of the process lifecycle: a process can set its own UID in order to lower its privileges, but never to raise them. By setting a process' UID or GID to a lower value than it already had, the attacker performs an uncommon action that is easy to look for. In this case, SNOW keeps a running snapshot of all processes and the user and group each is running as. It updates this snapshot at randomized time intervals, and when it does, it compares it with the previous snapshot: if any process bears a lower UID than it did previously, an alert is raised. Since the lowering of a process' UID is not a legitimate metadata shift, this detection heuristic generates no false positive alerts.
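The snapshot comparison itself reduces to a simple diff. Here is a minimal sketch in Python, assuming each snapshot maps PIDs to (UID, GID) pairs; a production implementation would gather these from /proc and also compare process start times to guard against PID reuse:

```python
def detect_uid_lowering(prev: dict, curr: dict) -> list:
    """Compare two {pid: (uid, gid)} snapshots and return the PIDs whose
    UID or GID dropped in between. Legitimately dropping privileges moves
    the UID value *up* (e.g. 0 -> 65534), so a numeric decrease is never
    a legitimate metadata shift."""
    suspicious = []
    for pid, (uid, gid) in curr.items():
        if pid in prev:
            old_uid, old_gid = prev[pid]
            if uid < old_uid or gid < old_gid:
                suspicious.append(pid)
    return suspicious

# A shell that jumped from an ordinary user to root between snapshots:
assert detect_uid_lowering({42: (1000, 1000)}, {42: (0, 0)}) == [42]
# A daemon that dropped privileges the normal way raises nothing:
assert detect_uid_lowering({42: (0, 0)}, {42: (1000, 1000)}) == []
```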

Summary

In essence, looking for privilege elevation requires tracking two things on a system:

  1. the appearance of any child process with UID/GID <= N;
  2. the lowering of the UID/GID of any running process.

Heuristic #1 is the only one of the two that can generate false positive alerts. Most of these may be eliminated through more security-conscious system configuration and administrative processes. The few instances that cannot be discarded are easily whitelisted, but should be clearly identified as known potential security holes.

How To Test Malware Detection Capabilities

Wed, 22 Mar 2017

From a software quality perspective comes the idea of verifying our system's detection capabilities. More specifically, we aim to verify that the overall system is able to detect malware and hacker activities, from the SNOW sensor installed on a host to the SNOWboard Command System used by the hunters. It's a simple idea, but not a trivial one.

First, a "detection capability" is not a feature of a single component, but an emergent property arising from the interaction of many components. This requires us to create an end-to-end infrastructure dedicated to testing. Second, there are "unpredictable" behaviours in some components, induced by deliberate randomness and by communication latencies between the components. This limits how far the testing environment can be fine-tuned.

Even without these concerns, how can we test that the system is truly able to detect malware and cyber attacks?

In fact, it is neither the malware itself nor the hacker's presence that is detected, but their behaviour. In simple cases, a behaviour is clearly suspicious: there are well-known malicious techniques, such as process hollowing, that are easily recognizable. In more complex cases, many clues must be combined to expose a malicious intent. In any case, to test our detection capability, we must recreate scenarios that we expect to stimulate our SNOW sensor and centralized detection algorithms.

To achieve this, we can use real malware samples, provided that we remain very cautious. The malware must be isolated so that it does not invade our testing infrastructure, and we must clean up the damage these samples wreak, using virtual machine snapshot capabilities, for example. Alternatively, we can use custom-made harmless malware. Since it is the behaviour that is detected, causing real damage is not mandatory. Such pseudo-malware can be self-cleaning, avoiding the need for snapshots. In practice, both real and harmless malware are valuable.

On the analytics side, we want to detect whether an alert of the expected type has been raised within a reasonable time range. An easy way to proceed would be to search the database directly. However, that does not guarantee end-to-end detection. Instead, we communicate with the SNOWboard RESTful API, retrieving the same data as used by the SNOWboard Command System.
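As an illustration, this end-to-end check boils down to polling an alerts endpoint until the expected alert appears or the detection deadline passes. The URL, the JSON schema, and the function below are hypothetical sketches of ours, not the actual SNOWboard API:

```python
import json
import time
import urllib.request

def wait_for_alert(api_url: str, expected_type: str,
                   timeout_s: float = 300.0, poll_s: float = 10.0,
                   fetch=None) -> bool:
    """Poll the alerts endpoint until an alert of the expected type shows
    up, or the maximum reasonable detection delay expires. `fetch` is
    injectable so the polling logic can be exercised without a server."""
    if fetch is None:
        fetch = lambda: json.load(urllib.request.urlopen(api_url))
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        alerts = fetch()  # expected: a JSON list of {"type": ...} objects
        if any(a.get("type") == expected_type for a in alerts):
            return True
        time.sleep(poll_s)
    return False
```

The automation platform would call this once per scenario in the pool, e.g. `wait_for_alert(url, "process_hollowing")`, and notify a human only on `False`.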

The whole process is automated. Having malware and pseudo-malware in a pool, our automation platform can launch them and check if the alerts have been raised without any human intervention. A notification is sent when a problem is detected.

In summary, in order to test our detection capabilities, we prepare scenarios that act suspiciously and execute them on a dedicated, isolated testing infrastructure. After the maximum reasonable detection delay, we parse the same data that hunters request to check if the expected alerts have been raised. That way, we manage the quality of our solution and we ensure hunters have the information they need to provide a high-quality service.

Defending From Endpoint Agent Disablement Cyber Attacks

Fri, 10 Mar 2017

When actively monitoring endpoints to detect signs of cyber attacks, preserving visibility through the endpoint sensor is crucial. A likely attack scheme is for malware to stop the sensor process, do its deeds, then restart the sensor process, or even leave it dead. However, losing connectivity with a sensor is also a likely event due to various system actions and outside circumstances. This article discusses ways to distinguish the various scenarios behind an endpoint sensor connectivity loss, and to figure out when to sound the red alert.

Houston, we’ve lost contact

The first step when trying to figure out why contact was lost is to examine all possibilities. Here are the likely scenarios:

    1. The agent process is attacked:
       a. it is killed;
       b. it is suspended.
    2. The agent process crashed.
    3. The machine shut down:
       a. the machine was properly turned off (using OS-specific tools);
       b. the machine had a software, hardware or power failure that brought it down;
       c. the machine was virtual, and was interrupted.
    4. Network failure…
       a. … affecting only this one host or subnet;
       b. … affecting the whole organization being monitored, or the Internet route between the organization and the remote analytics database where events are monitored from.

A few simple measures can help distinguish between most of these scenarios. Let’s start from the bottom of the list and go up.

Communication failures

Scenario 4b is easy to distinguish from all the others: connectivity is lost for all machines within the same organization at once. A phone call to the IT staff is in order, but sometimes the Internet is just that tough to deal with. So, it comes down to diagnosing a single machine going incommunicado.

Arcadia's endpoint agents will handle this case by staying in contact with one another across an ad hoc peer-to-peer network deployed on the customer's network. In other words, each SNOW-defended endpoint within a LAN maintains a TCP connection with a subgroup of others (the size of this subgroup depending on the machine's purpose and resource constraints). Should any of these TCP connections fail, its peers know it instantly and can report that fact back to the central analytics cloud. This set of peer-to-peer communication features is under development as of this writing.

In addition, the agent is aware of its host's connectivity problems. When the machine regains connectivity, it reports how long it was down, so that we can corroborate this information with what its peers reported. This way, if an attacker disguises an agent disablement as a network failure, there will be a gap between the outage observed by the peers and the offline interval reported by the agent: time during which the agent was not alive. This is enough information to warrant a deeper investigation, hence to raise an alert.
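A minimal sketch of this corroboration, under the assumption that both the peers' observation and the agent's own account can be reduced to (start, end) time intervals (the interval representation and the tolerance are illustrative choices of ours):

```python
def disablement_suspected(peers_outage, agent_outage,
                          tolerance_s: float = 5.0) -> bool:
    """peers_outage and agent_outage are (start, end) epoch intervals:
    the outage the peers observed versus the offline window the
    reconnecting agent reports. If the agent's account does not cover
    what its peers saw, the uncovered time is time during which the
    agent was not alive."""
    p_start, p_end = peers_outage
    a_start, a_end = agent_outage
    return a_start > p_start + tolerance_s or a_end < p_end - tolerance_s

# Agent's report covers the whole observed outage: plain network failure.
assert not disablement_suspected((100.0, 200.0), (99.0, 201.0))
# Agent claims it went offline 50 s *after* its peers lost it: suspicious.
assert disablement_suspected((100.0, 200.0), (150.0, 200.0))
```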

Machine failures

Obviously, machine failures get detected as communication failures. However, many such failure modes can be documented using supplemental clues. In scenario 3a, the agent gets a message that the machine is being shut down, and is typically given time to stop of its own volition. It can take advantage of that moment to report a final telemetry beacon indicating the situation.

In scenario 3b, the agent is forcibly terminated when the machine goes down, and the machine's uptime counter gets reset. Therefore, when the agent restarts, it reports the uptime counter: if it is low enough, the diagnostic is complete. Virtual machines being interrupted (scenario 3c) look like network failures until they are resumed. When this happens, there is a gap in the telemetry stream that corresponds to the duration of the interruption. So when the agent beacons again, with no indication of having been restarted and no telemetry buffered covering the gap, we understand the machine was interrupted. Further indications that the machine is virtual (such as the presence of VMware Tools or access to AWS local queries) support the diagnostic.
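The triage described above can be sketched as a small decision function. The signal names and the freshness threshold below are illustrative assumptions of ours, not the actual SNOW telemetry schema:

```python
def classify_reconnect(agent_restarted: bool, machine_uptime_s: float,
                       buffered_telemetry_s: float, offline_s: float,
                       fresh_boot_s: float = 180.0) -> str:
    """Rough triage of an agent coming back online, per the scenarios above:
      'reboot'          : agent restarted and the uptime counter is low (3b)
      'agent_restarted' : agent restarted on a long-running machine
                          (crash or kill: investigate)
      'vm_interrupted'  : agent never restarted, yet buffered no telemetry
                          covering the offline gap (3c)
      'network'         : agent kept running and buffered telemetry for
                          the whole gap (4a)"""
    if agent_restarted:
        if machine_uptime_s < fresh_boot_s:
            return "reboot"
        return "agent_restarted"
    if buffered_telemetry_s < offline_s:
        return "vm_interrupted"
    return "network"
```

For example, an agent that reconnects after a 5-minute gap, was not restarted, and carries a full 5 minutes of buffered telemetry is classified as a network failure.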

Agent crashes and attack scenarios

The SNOW endpoint agent is programmed by mere humans, so yes, certain situations get it to crash. There! I said it. Two measures facilitate the detection and remediation of crashes.

First, agent processes are registered with the operating system so as to be restarted as soon as they go down. This way, following a crash, the restarted agent reports immediately to the central analytics cloud, carrying the forensics information it accumulated before the crash event.

Second, the endpoint agent is actually composed of two processes (both registered with the OS, as described above) tracking each other's life cycle. If either of these processes goes down, the other reports the fact to the cloud, and repeatedly attempts to bring it back up. Effectively disabling the agent therefore requires bringing down both processes. For any one of these processes to crash is odd, but possible; for both of them to crash simultaneously is highly unlikely, enough to raise an alert regarding the possibility of deliberate termination.
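A minimal sketch of one watchdog iteration, assuming POSIX semantics (signal 0 probes a PID without delivering anything); the `respawn` and `report` callbacks stand in for the real recovery and cloud-reporting machinery:

```python
import os

def pid_alive(pid: int) -> bool:
    """Probe a process with signal 0: nothing is delivered, but the
    kernel tells us whether the process exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True   # exists, but owned by another user
    return True

def watchdog_step(partner_pid: int, respawn, report,
                  alive=pid_alive) -> bool:
    """One iteration of the mutual watchdog: if the partner process is
    gone, report the fact to the cloud and attempt to bring it back up.
    `alive` is injectable for testing."""
    if not alive(partner_pid):
        report("partner %d is down" % partner_pid)
        respawn()
        return False
    return True
```

Each of the two agent processes would run this loop against the other's PID at short intervals.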

Attack scenarios

The last point makes it clear that scenario 1a cannot go undetected: the agent is restarted right away by the OS or by its watchdog, so either the malware's actions get recorded despite the attacker's intent, or the attacker generates so much noise that he attracts the attention of the hunters. Scenario 1b is much more pernicious, as its fingerprint is very similar to that of scenario 3c. That said, some heuristics can work in our favor:

      • Non-virtual machines cannot be interrupted and resumed (a machine being put to sleep sends a message when it goes down and when it wakes up).
      • When a VM is interrupted, it is typically for a rather long period of time. If the attacker suspends the agent processes for mere seconds, this should raise an alert as an abnormal VM interruption pattern.
      • Peers within the local network, as discussed for mitigating communication failures, exchange ping-pongs at randomized regular intervals, and keep statistics of how quickly the responses come back. A suspended agent process would not respond within the expected sub-second delay, unless the local network is extremely busy (an atypical condition).

If any of these heuristics fails to check out, an alert should be raised and a full investigation conducted on the target host.
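The third heuristic, keeping per-peer response-time statistics, can be sketched as follows; the history window and the sigma threshold are illustrative choices of ours:

```python
import statistics

class PeerMonitor:
    """Track ping-pong round-trip times for one peer and flag responses
    that fall far outside the norm: a suspended agent process cannot
    answer within the usual sub-second delay."""

    def __init__(self, window: int = 50, threshold_sigma: float = 4.0):
        self.rtts = []
        self.window = window
        self.threshold_sigma = threshold_sigma

    def record(self, rtt_s: float) -> None:
        """Remember a round-trip time, keeping only the recent window."""
        self.rtts.append(rtt_s)
        self.rtts = self.rtts[-self.window:]

    def is_anomalous(self, rtt_s: float) -> bool:
        """True if this round-trip time is far above the peer's norm."""
        if len(self.rtts) < 10:
            return False   # not enough history yet
        mean = statistics.mean(self.rtts)
        stdev = statistics.pstdev(self.rtts) or 1e-6
        return (rtt_s - mean) / stdev > self.threshold_sigma
```

A response arriving several standard deviations late (say, seconds instead of the usual ~10 ms) would then trigger the full investigation described above.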

Authentication of agents

The last frontier in agent disablement attacks is full agent communications spoofing: an attacker reverse-engineers the communications protocol between the agent, its peers and the central analytics cloud, then responds correctly to neighbor requests and plays dumb with the cloud. This is not an easy attack to pull off, as it takes lots of resources to perform the reverse engineering and then communicate without tripping any behavior normality heuristic. However, it underscores a fundamental weakness of endpoint protection: up to now, it has been assumed that agents would communicate without any external injection of trust. Agents can authenticate the cloud, but agents cannot authenticate each other, nor can the cloud authenticate an agent. In other words, the cloud is never sure whether it is really speaking to an agent, or to a dog. We Arcadians are hard at work on this problem as this is written… stay tuned.
