android -- WatchDog Analysis

Published 2019-04-15 16:17




So which services are monitored? Their sources live under frameworks/base/services/java/com/android/server/am and frameworks/base/services/java/com/android/server.


In run():
      Slog.i(TAG, "Init Watchdog");
      Watchdog.getInstance().init(context, battery, power, alarm,
              ActivityManagerService.self());

    public static Watchdog getInstance() {
        if (sWatchdog == null) {
            sWatchdog = new Watchdog();
        }
        return sWatchdog;
    }

    public void init(Context context, BatteryService battery,
            PowerManagerService power, AlarmManagerService alarm,
            ActivityManagerService activity) {
        // Cache the context and the service references
        mResolver = context.getContentResolver();
        mBattery = battery;
        mPower = power;
        mAlarm = alarm;
        mActivity = activity;

        // Register RebootReceiver to receive the reboot broadcast
        context.registerReceiver(new RebootReceiver(),
                new IntentFilter(REBOOT_ACTION));
        // Record the system boot time
        mBootTime = System.currentTimeMillis();
    }


Later in run(), the call Watchdog.getInstance().start(); starts the watchdog.

First, look at the Watchdog class definition:
/** This class calls its monitor every minute. Killing this process if they don't return **/
public class Watchdog extends Thread {

Watchdog extends Thread, so it runs on its own thread: calling thread.start() causes its run() method to be invoked on that new thread.
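The difference between start() and a direct run() call is easy to get wrong, so here is a minimal plain-Java sketch of it. The class and method names (StartVersusRun, Recorder, startAndReport, runDirectlyAndReport) are made up for this illustration and are not part of the Android sources.

```java
class StartVersusRun {
    /** Minimal Thread subclass that records which thread executed run(). */
    static class Recorder extends Thread {
        volatile String ranOn;

        Recorder() {
            super("recorder-thread");
        }

        @Override
        public void run() {
            ranOn = Thread.currentThread().getName();
        }
    }

    /** start() schedules run() on the new thread; returns that thread's name. */
    static String startAndReport() {
        Recorder r = new Recorder();
        r.start();
        try {
            r.join();  // wait until run() has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r.ranOn;
    }

    /** Calling run() directly executes it on the caller's thread, like any method. */
    static String runDirectlyAndReport() {
        Recorder r = new Recorder();
        r.run();
        return r.ranOn;
    }

    public static void main(String[] args) {
        System.out.println("start(): ran on " + startAndReport());
        System.out.println("run():   ran on " + runDirectlyAndReport());
    }
}
```

Only start() gives the watchdog its own thread; a direct run() call would block the caller inside the infinite loop.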

    public void run() {
        boolean waitedHalf = false;

        while (true) {
            mCompleted = false;
            // 1. Send a MONITOR message to mHandler, asking it to check that the services are healthy
            mHandler.sendEmptyMessage(MONITOR);

            synchronized (this) {
                // 2. wait() for up to timeout before deciding whether to leave the loop
                long timeout = TIME_TO_WAIT;
                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0 && !mForceKillSystem) {
                    try {
                        wait(timeout);  // notifyAll() is called when mForceKillSystem is set
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    timeout = TIME_TO_WAIT - (SystemClock.uptimeMillis() - start);
                }

                // 3. If mCompleted is true, every service is healthy (more on this later)
                if (mCompleted && !mForceKillSystem) {
                    // The monitors have returned.
                    waitedHalf = false;
                    continue;
                }

                // 4. A possible deadlock has been detected; dump the stack traces with dumpStackTraces
                if (!waitedHalf) {
                    // We've waited half the deadlock-detection interval.  Pull a stack
                    // trace and wait another half.
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null);
                    waitedHalf = true;
                    continue; // but check one more time before giving up
                }
            }

            // 5. Dump the kernel stack traces
            // Pull our own kernel thread stacks as well if we're configured for that
            if (RECORD_KERNEL_THREADS) {
                dumpKernelStackTraces();
            }

            // 6. The system is in trouble: some Service appears deadlocked, so kill the SystemServer process
            // Only kill the process if the debugger is not attached.
            if (!Debug.isDebuggerConnected()) {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + name);
                Process.killProcess(Process.myPid());
                System.exit(10);
            } else {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            }

            waitedHalf = false;
        }
    }

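Main logic: at regular intervals the watchdog thread sends a MONITOR message to a second thread, and that second thread checks whether each Service is running normally. The watchdog keeps checking and waiting for the result, and kills SystemServer if the check fails.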
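The two-thread arrangement described above can be sketched in plain Java without any Android classes. This is only an illustration under stated assumptions: MiniWatchdog, Entry, addMonitor(String, Runnable) and checkOnce are hypothetical names, a single-thread executor stands in for the Handler/Looper thread, and System.nanoTime() stands in for SystemClock.uptimeMillis(). Unlike the listing above, this sketch wakes up as soon as the checks finish instead of always waiting the full interval, and it reports the result instead of killing the process.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MiniWatchdog {
    /** A named health check; check.run() should return quickly when all is well. */
    private static final class Entry {
        final String name;
        final Runnable check;

        Entry(String name, Runnable check) {
            this.name = name;
            this.check = check;
        }
    }

    private final List<Entry> monitors = new CopyOnWriteArrayList<>();

    // One daemon worker thread stands in for the handler thread that runs the checks.
    private final ExecutorService checker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "checker");
        t.setDaemon(true);
        return t;
    });

    private boolean completed;  // guarded by this
    private String current;     // name of the check in progress, guarded by this

    void addMonitor(String name, Runnable check) {
        monitors.add(new Entry(name, check));
    }

    /**
     * Runs one round of checks. Returns null if every check returned within
     * timeoutMillis, otherwise the name of the check that was still running.
     */
    synchronized String checkOnce(long timeoutMillis) {
        completed = false;
        checker.execute(this::runChecks);

        long start = System.nanoTime();
        long remaining = timeoutMillis;
        while (!completed && remaining > 0) {
            try {
                wait(remaining);  // releases the lock so the worker can report back
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            // Recompute from a monotonic clock: wait() may return early.
            remaining = timeoutMillis - (System.nanoTime() - start) / 1_000_000L;
        }
        return completed ? null : String.valueOf(current);
    }

    private void runChecks() {
        for (Entry e : monitors) {
            synchronized (this) {
                current = e.name;
            }
            e.check.run();  // never returns if the checked component is stuck
        }
        synchronized (this) {
            completed = true;
            current = null;
            notifyAll();
        }
    }

    public static void main(String[] args) {
        MiniWatchdog wd = new MiniWatchdog();
        Object healthyLock = new Object();
        wd.addMonitor("healthy", () -> { synchronized (healthyLock) { } });
        System.out.println("round 1 stuck on: " + wd.checkOnce(500));

        wd.addMonitor("stuck", () -> {
            try {
                Thread.sleep(60_000);  // simulates a check that cannot get its lock
            } catch (InterruptedException ignored) {
            }
        });
        System.out.println("round 2 stuck on: " + wd.checkOnce(500));
    }
}
```

Recording the name of the check in progress is what lets the watchdog say which component hung, much as mCurrentMonitor does in the real code.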

3. The Service-checking thread

    /** Used for scheduling monitor callbacks and checking memory usage. */
    final class HeartbeatHandler extends Handler {
        public void handleMessage(Message msg) {  // Looper message callback
            switch (msg.what) {
                case MONITOR: {
                    // Check each service in turn by calling its monitor() method
                    final int size = mMonitors.size();
                    for (int i = 0 ; i < size ; i++) {
                        mCurrentMonitor = mMonitors.get(i);
                        mCurrentMonitor.monitor();
                    }

                    // Every check returned, so set mCompleted to true
                    synchronized (Watchdog.this) {
                        mCompleted = true;
                        mCurrentMonitor = null;
                    }
                } break;
            }
        }
    }

Next, how does each Service confirm that it is running properly? Take ActivityManagerService as an example:

    private ActivityManagerService() {
        // Add ourself to the Watchdog monitors.
        Watchdog.getInstance().addMonitor(this);
    }

It then implements the monitor() method:
    /** In this method we try to acquire our lock to make sure that we have not deadlocked */
    public void monitor() {
        synchronized (this) { }
    }

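So the check simply tests whether the Service has deadlocked, and when it has, the only remedy is to kill SystemServer. Deadlocks have many possible causes, but one case deserves attention: a Java-level lock may be held across a call into a native function, and if that native function talks to hardware and takes too long to return, the lock stays held for a long time and the watchdog reports a problem.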
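To see why an empty synchronized block works as a health check, here is a small plain-Java sketch. All names in it (LockProbe, probe, holdLock) are hypothetical, and Thread.sleep stands in for the slow native or hardware call mentioned above. The probe tries to enter the lock on a helper thread and reports whether it got in before the deadline.

```java
import java.util.concurrent.CountDownLatch;

class LockProbe {
    /** Returns true if the monitor of lock could be entered within timeoutMillis (must be > 0). */
    static boolean probe(final Object lock, long timeoutMillis) {
        Thread prober = new Thread(() -> {
            synchronized (lock) {
                // Empty on purpose: getting in at all proves the lock is not stuck.
            }
        }, "lock-prober");
        prober.setDaemon(true);
        prober.start();
        try {
            prober.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !prober.isAlive();
    }

    /** Starts a daemon thread that holds lock for holdMillis; returns once the lock is held. */
    static void holdLock(final Object lock, final long holdMillis) {
        final CountDownLatch acquired = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                acquired.countDown();
                try {
                    Thread.sleep(holdMillis);  // stands in for a slow native/hardware call
                } catch (InterruptedException ignored) {
                }
            }
        }, "lock-holder");
        holder.setDaemon(true);
        holder.start();
        try {
            acquired.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Object service = new Object();
        System.out.println("free lock acquired in time: " + probe(service, 200));
        holdLock(service, 5_000);
        System.out.println("held lock acquired in time: " + probe(service, 200));
    }
}
```

The probe cannot tell a true deadlock from a lock that is merely held for a long time, which is why a slow native call under a lock looks the same to the watchdog as a deadlock.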

    final class GlobalPssCollected implements Runnable {
        public void run() {
            mHandler.sendEmptyMessage(GLOBAL_PSS);
        }
    }

    final class HeartbeatHandler extends Handler {
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case GLOBAL_PSS: {
                    if (mHaveGlobalPss) {
                        // During the last pass we collected pss information, so
                        // now it is time to report it.
                        mHaveGlobalPss = false;
                        if (localLOGV) Slog.v(TAG, "Received global pss, logging.");
                        logGlobalMemory();
                    }
                } break;
            }
        }
    }

    void logGlobalMemory() {
        Process.readProcLines("/proc/meminfo", mMemInfoFields, mMemInfoSizes);
        Process.readProcLines("/proc/vmstat", mVMStatFields, mVMStatSizes);
    }
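Process.readProcLines is an Android-internal helper, so the following is only a rough plain-Java sketch of the same idea: pick selected fields out of /proc/meminfo-style or /proc/vmstat-style text. MemInfoParser and parse are hypothetical names, and the sample text is invented for the demonstration. Each requested field is given as a line prefix that includes its delimiter (a colon for meminfo, a space for vmstat), so that one field name cannot match a longer one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MemInfoParser {
    /**
     * Returns the first number following each requested prefix.
     * Fields that do not appear in the text are left out of the result.
     */
    static Map<String, Long> parse(String text, String... fields) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            for (String field : fields) {
                if (!line.startsWith(field)) {
                    continue;
                }
                String rest = line.substring(field.length()).trim();
                String number = rest.split("\\s+")[0];  // drop a trailing unit such as "kB"
                if (!number.isEmpty()) {
                    out.put(field, Long.parseLong(number));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample in the two formats; on Linux the text could instead be
        // read from /proc/meminfo and /proc/vmstat.
        String sample = "MemTotal:        2048 kB\n"
                + "MemFree:          512 kB\n"
                + "pgfree 77\n";
        System.out.println(parse(sample, "MemFree:", "pgfree "));
    }
}
```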