Android Study Notes: WatchDog Analysis

Published 2019-04-15 17:06

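       Watchdog: in early microcomputer systems built around a microcontroller, external electromagnetic interference could often disturb the microcontroller, sending the program astray until it fell into an endless loop. Normal execution was interrupted, the system controlled by the microcontroller could no longer work, and the whole system stalled with unpredictable consequences. To monitor the microcontroller's running state in real time, a dedicated chip for monitoring program execution was created: the watchdog.

       So the original watchdog monitored hardware. If you assume the Android WatchDog does the same, you would be wrong. The Android WatchDog monitors software, mainly whether ActivityManagerService, WindowManagerService, and PowerManagerService have deadlocked. If it detects a deadlock, it restarts the system (as the code below shows, by killing the SystemServer process). Programs are like people: they make mistakes and they get sick. WatchDog is like a doctor who gives regular checkups and, on finding you sick, issues a restart.

       The SystemServer process hosts many services, so why monitor only these three? That shows how important they are. If another service fails, the device does not necessarily become unusable; only part of its functionality is lost. If one of these three fails, the problem is severe and recovery requires a restart.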
      Let's first look at where it is started. The object is created in the initAndLoop method of the ServerThread object, which is invoked from main in SystemServer: Watchdog.getInstance();
Let's take a look:
    public static Watchdog getInstance() {
        if (sWatchdog == null) {
            sWatchdog = new Watchdog();
        }
        return sWatchdog;
    }

This only creates a Watchdog object. Now look at the constructor: it simply creates a Handler object. Let's look at how this handler is implemented:

    final class HeartbeatHandler extends Handler {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case MONITOR: {
                    // See if we should force a reboot.
                    int rebootInterval = mReqRebootInterval >= 0
                            ? mReqRebootInterval : REBOOT_DEFAULT_INTERVAL;
                    if (mRebootInterval != rebootInterval) {
                        mRebootInterval = rebootInterval;
                        // We have been running long enough that a reboot can
                        // be considered...
                        checkReboot(false);
                    }
                    final int size = mMonitors.size();
                    for (int i = 0 ; i < size ; i++) {
                        mCurrentMonitor = mMonitors.get(i);
                        mCurrentMonitor.monitor();
                    }
                    synchronized (Watchdog.this) {
                        mCompleted = true;
                        mCurrentMonitor = null;
                    }
                } break;
            }
        }
    }

The handler mainly processes the MONITOR message: it calls monitor() on each registered monitor object in turn and sets mCompleted to true only after all of them have returned.

SystemServer also calls another Watchdog method during initialization:

    Watchdog.getInstance().init(context, battery, power, alarm, ActivityManagerService.self());

Let's look at the init method:

        mResolver = context.getContentResolver();
        mBattery = battery;
        mPower = power;
        mAlarm = alarm;
        mActivity = activity;
        context.registerReceiver(new RebootReceiver(),
                new IntentFilter(REBOOT_ACTION));
        mRebootIntent = PendingIntent.getBroadcast(context,
                0, new Intent(REBOOT_ACTION), 0);
        context.registerReceiver(new RebootRequestReceiver(),
                new IntentFilter(Intent.ACTION_REBOOT),
                android.Manifest.permission.REBOOT, null);
        mBootTime = System.currentTimeMillis();

This mainly registers two reboot broadcast receivers. The first is registered without extra arguments; the second is registered with extra arguments, including a permission check (android.Manifest.permission.REBOOT).

Next comes the start of the WatchDog thread, which also happens in SystemServer:

    Watchdog.getInstance().start()

Now let's look at WatchDog's run method. It is a bit long, but what it does is simple.
public void run() {
        boolean waitedHalf = false;
        while (true) {
            mCompleted = false;
            mHandler.sendEmptyMessage(MONITOR);

            synchronized (this) {
                long timeout = TIME_TO_WAIT;

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0 && !mForceKillSystem) {
                    try {
                        wait(timeout);  // notifyAll() is called when mForceKillSystem is set
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    timeout = TIME_TO_WAIT - (SystemClock.uptimeMillis() - start);
                }

                if (mCompleted && !mForceKillSystem) {
                    // The monitors have returned.
                    waitedHalf = false;
                    continue;
                }

                if (!waitedHalf) {
                    // We've waited half the deadlock-detection interval.  Pull a stack
                    // trace and wait another half.
                    ArrayList<Integer> pids = new ArrayList<Integer>();

                    /// M: WDT debug enhancement:
                    /// it's better to dump all running processes backtraces @{
                    // pids.add(Process.myPid());
                    mActivity.getRunningProcessPids(pids);
                    /// @}

                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                            NATIVE_STACKS_OF_INTEREST);
                    waitedHalf = true;
                    continue;
                }
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.

            final String name = (mCurrentMonitor != null) ?
                    mCurrentMonitor.getClass().getName() : "null";
            EventLog.writeEvent(EventLogTags.WATCHDOG, name);

            ArrayList<Integer> pids = new ArrayList<Integer>();

            /// M: WDT debug enhancement
            /// it's better to dump all running processes backtraces
            /// and integrate with AEE @{
            /*
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);
            */
            mActivity.getRunningProcessPids(pids);
            final File stack = dumpAllBackTraces(pids);
            /// @}

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Pull our own kernel thread stacks as well if we're configured for that
            if (RECORD_KERNEL_THREADS) {
                dumpKernelStackTraces();
            }

            /// M: WDT debug enhancement
            /// need to wait the AEE dumps all info, then kill system server @{
            /*
            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                name, null, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}
            */
            Slog.v(TAG, "** save all info before killnig system server **");
            mActivity.addErrorToDropBox("watchdog", null, "system_server", null, null, name, null, null, null);
            SystemClock.sleep(25000);
            /// @}

            // Only kill the process if the debugger is not attached.
            if (!Debug.isDebuggerConnected()) {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + name);
                Process.killProcess(Process.myPid());
                System.exit(10);
            } else {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            }

            waitedHalf = false;
        }
    }
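
Whether mCompleted ever becomes true in the loop above depends on every monitor() call returning. The listing does not show what a monitor looks like, so here is a minimal plain-Java sketch under stated assumptions: a monitored service implements monitor() by simply acquiring its own lock, so the call returns only if the lock is obtainable. The names Monitor, LockedService, and MonitorRegistry are hypothetical stand-ins for illustration, not the framework classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the callback the watchdog invokes on each monitored object.
interface Monitor {
    void monitor();
}

// Hypothetical service that does all of its work under one lock.
class LockedService implements Monitor {
    private final Object lock = new Object();
    private int requests = 0;

    // Normal work runs while holding the service lock.
    void handleRequest() {
        synchronized (lock) {
            requests++;
        }
    }

    int requestCount() {
        synchronized (lock) {
            return requests;
        }
    }

    // Health check: only tries to take the same lock. If another thread
    // holds the lock forever, this call never returns.
    @Override
    public void monitor() {
        synchronized (lock) {
            // Nothing to do; acquiring the lock is the whole check.
        }
    }
}

// Hypothetical registry that mirrors the loop in the MONITOR case.
class MonitorRegistry {
    private final List<Monitor> monitors = new ArrayList<>();
    private volatile Monitor current;

    synchronized void addMonitor(Monitor m) {
        monitors.add(m);
    }

    // Returns true only after every monitor() call has returned.
    boolean checkAll() {
        List<Monitor> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(monitors);
        }
        for (Monitor m : snapshot) {
            current = m;   // remembered so a hang can be attributed to this monitor
            m.monitor();
        }
        current = null;
        return true;
    }

    Monitor current() {
        return current;
    }
}
```

In this sketch, if a thread deadlocks while holding the service lock, checkAll() blocks inside monitor() and current() keeps pointing at the stuck object, which is the role mCurrentMonitor plays when run() builds the name it logs.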

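  The while (true) infinite loop keeps monitoring. Each pass first sends a message to start a check: mHandler.sendEmptyMessage(MONITOR); The second while loop then puts the thread into the wait state (for 30 s) until the timeout expires, after which it checks whether the monitored thread is "locked up". If it is not, continue ends this round of checking. If it is, the watchdog collects the PIDs of the running processes (including its own) and calls AMS's dumpStackTraces method to dump their stack traces, then continues waiting for another 30 s. If the thread is still locked up, it dumps the current system state:

    mActivity.getRunningProcessPids(pids);
    final File stack = dumpAllBackTraces(pids);

and then restarts the SystemServer process:

    Process.killProcess(Process.myPid());
    System.exit(10);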
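
The two-stage wait described above can be tried outside Android with a small plain-Java sketch. It is an illustration under stated assumptions, not the framework code: MiniWatchdog, runCycle, and blockOn are hypothetical names, a single-thread executor stands in for the Handler thread, and System.nanoTime() stands in for SystemClock.uptimeMillis(). Unlike run(), the sketch returns as soon as the check completes instead of always waiting out the full interval, and it reports HUNG instead of killing the process.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only; MiniWatchdog and its members are hypothetical names.
class MiniWatchdog {
    enum Result { HEALTHY, HUNG }

    private final Object lock = new Object();
    private boolean completed; // guarded by lock

    // Runs one detection cycle made of two half-intervals, like waitedHalf in run().
    Result runCycle(ExecutorService worker, Runnable check, long halfIntervalMs) {
        boolean waitedHalf = false;
        while (true) {
            synchronized (lock) {
                completed = false;
            }
            // Stand-in for mHandler.sendEmptyMessage(MONITOR): queue the check
            // on the worker thread. A stuck earlier check blocks this one too.
            worker.execute(() -> {
                check.run();
                synchronized (lock) {
                    completed = true;
                    lock.notifyAll();
                }
            });

            synchronized (lock) {
                long start = System.nanoTime();
                long remaining = halfIntervalMs;
                // Recompute the remaining time after every wakeup so that an
                // early or spurious wakeup does not shorten the interval.
                while (remaining > 0 && !completed) {
                    try {
                        lock.wait(remaining);
                    } catch (InterruptedException e) {
                        throw new IllegalStateException("interrupted while waiting", e);
                    }
                    long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                    remaining = halfIntervalMs - elapsedMs;
                }
                if (completed) {
                    return Result.HEALTHY;
                }
            }

            if (!waitedHalf) {
                // First timeout: the real code dumps stack traces here, then retries.
                waitedHalf = true;
                continue;
            }
            // Second timeout in a row: the real code kills the process here.
            return Result.HUNG;
        }
    }

    // Helper for the demo: a check that blocks until the latch is released.
    static Runnable blockOn(CountDownLatch latch) {
        return () -> {
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        MiniWatchdog watchdog = new MiniWatchdog();
        System.out.println(watchdog.runCycle(worker, () -> { }, 200));
        CountDownLatch stuck = new CountDownLatch(1);
        System.out.println(watchdog.runCycle(worker, blockOn(stuck), 50));
        stuck.countDown(); // release the blocked checks so the worker can exit
        worker.shutdown();
    }
}
```

The first call uses a check that returns immediately, so the cycle should report the worker as healthy; the second uses a check that never returns within two half-intervals, so the cycle should report it as hung. Splitting the timeout in two is what lets the real watchdog capture one set of stack traces halfway through and a second set just before it kills the process.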