This is part 2 of a two-part blog series.
Part 1: Time for a Virtual Domain Controller and vMotion gotchas educational trip
Part 2: DC gone wild
In Part 1 I explained how I screwed up the time on my domain controller, and with it the time for the entire domain. In this part I will go through how to fix it, but first let me introduce the cast so we won’t get confused.
DC1 – Bad virtual domain controller. Holds the PDC emulator FSMO role.
DC2 – Good domain controller.
Once we realized that we had a time issue, we manually changed the time on the two domain controllers. After the manual time change we started getting phone calls from users whose passwords showed as expired, and the DC logs confirmed machine and user account issues. Some users could work and some could not. The users who could work all seemed to be authenticating to DC2, which I confirmed by typing “set” at a command prompt and looking for “LOGONSERVER=\\DC2”. At this point I knew that DC2 was working and DC1 was not. I also knew that the two were not replicating, because a bogus file I added to the NETLOGON share on DC1 never showed up in the NETLOGON share on DC2. The logs also confirmed a Kerberos machine account issue on DC1.
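If you would rather script the LOGONSERVER check than read through the output of “set”, here is a minimal sketch in Python. Python is my choice for illustration and was not part of the original troubleshooting. The function name logon_server is hypothetical. It reads the same LOGONSERVER environment variable that “set” prints.

```python
import os


def logon_server(env=None):
    """Return the name of the DC that authenticated this session, or None.

    Reads the LOGONSERVER variable (the same one "set" displays) and
    strips the leading backslashes, so "\\\\DC2" becomes "DC2".
    """
    env = os.environ if env is None else env
    value = env.get("LOGONSERVER")
    if not value:
        return None
    return value.lstrip("\\")


if __name__ == "__main__":
    # On a domain-joined Windows session this prints the authenticating DC.
    print(logon_server())
```

Run on a few affected and unaffected workstations, this should show quickly whether the working users are all landing on the same DC.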
At this point we could fix this in two ways (I’m sure there are other ways).
Option 1, the easiest, is to decommission DC1 and seize all of its roles on DC2, and the problem is solved. If circumstances permit, this is the easiest fix, but there are a number of gotchas that I was honestly not prepared to tackle, so I kept this option as a last resort at the back of my head.
Option 2 is to just fix it. If you have already spent a lot of time on that DC, patching it and getting it all locked down, or you use PKI and that DC has a working certificate for PKI authentication, then fixing it is really the only viable option.
To fix it, I used a combination of burflags and netdom. If this were just a regular machine I would simply rejoin it to the domain and the problem would be solved. This domain controller needed to reset its machine password and convey the new password to DC2, and at the same time I wanted to make sure that DC1 would grab and sync new changes from DC2 and not the other way around.
Burflags – to address the sync issue I used the burflags technique. I have used this on a handful of occasions when my SYSVOL or NETLOGON directory got out of sync. There is a surplus of information on the web about this, so there is no need to explain it in depth. What I did was open regedit on the bad DC1, navigate to “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup” and change “BurFlags” to “D2” (hexadecimal). “D2” tells DC1 to be nonauthoritative and to pull its File Replication Service content (SYSVOL, which includes NETLOGON) from its brother DC2.
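For anyone who prefers the command line to regedit, here is a rough Python sketch of the same registry change. I did it by hand in regedit, so treat this as an illustration only. The helper name burflags_commands and the restart_frs switch are hypothetical. The commands it builds (“reg add”, “net stop ntfrs”, “net start ntfrs”) are standard Windows commands. Restarting the File Replication Service instead of rebooting is an assumption based on the commonly documented procedure, not what I did. The value is written in decimal, where 210 equals hex D2.

```python
# Registry location of the FRS BurFlags value, as described in the text.
BURFLAGS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs"
    r"\Parameters\Backup/Restore\Process at Startup"
)
NONAUTHORITATIVE = 0xD2  # "D2": pull content from a partner DC


def burflags_commands(value=NONAUTHORITATIVE, restart_frs=False):
    """Build the commands (as argument lists) that set BurFlags.

    With restart_frs=False only the registry write is returned, which
    matches setting the value and rebooting later. With restart_frs=True
    the write is wrapped in a stop/start of the NtFrs service
    (an assumption, see the lead-in).
    """
    set_flag = [
        "reg", "add", BURFLAGS_KEY,
        "/v", "BurFlags", "/t", "REG_DWORD",
        "/d", str(value),  # decimal 210 == hex D2
        "/f",
    ]
    if not restart_frs:
        return [set_flag]
    return [["net", "stop", "ntfrs"], set_flag, ["net", "start", "ntfrs"]]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in burflags_commands():
        print(cmd)
```

The sketch only builds and prints the commands. Run them from an elevated prompt on the bad DC, never on the good one.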
Burflags will take care of the SYSVOL and NETLOGON inconsistency, but we still have the Kerberos machine account issue to take care of. For that we need the help of NETDOM.
Netdom – I am using 2008 R2, so Netdom is included; you might need to download the Support Tools if you are using Server 2003. We will reset the DC1 machine account password and let DC2 know about the change. Log in to DC1, stop the “Kerberos Key Distribution Center” service and set it to Manual. Next, open a command prompt (make sure you run it as administrator) and issue the command “netdom.exe resetpwd /s:DC2 /ud:MYDOMAIN\Administrator /pd:MyAdminPassword”. Make sure the output says the reset was successful; otherwise check your syntax. Also take note that I am running NETDOM from the bad DC1 and that the target in “resetpwd /s:DC2” is DC2, the good server. Once it succeeds, set the Kerberos Key Distribution Center service back to Automatic and reboot DC1.
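The sequence above can be sketched as a short Python helper that lays out the commands in order. This is a dry-run illustration rather than the exact procedure I ran (I used the Services console for the KDC steps). The function names reset_commands and show are hypothetical. The sketch assumes that “kdc” is the service name behind Kerberos Key Distribution Center and uses “/pd:*”, which makes netdom prompt for the password so it does not end up in your command history.

```python
def reset_commands(good_dc, domain, admin="Administrator"):
    """Return the ordered commands, as argument lists, to run on the BAD DC.

    good_dc is the healthy domain controller (DC2 in this post).
    """
    return [
        ["net", "stop", "kdc"],                       # stop the KDC service
        ["sc", "config", "kdc", "start=", "demand"],  # startup type: Manual
        ["netdom", "resetpwd", f"/s:{good_dc}",
         f"/ud:{domain}\\{admin}", "/pd:*"],          # prompt for the password
        ["sc", "config", "kdc", "start=", "auto"],    # startup type: Automatic
        ["shutdown", "/r", "/t", "0"],                # reboot the bad DC
    ]


def show(commands):
    """Render each argument list as a single command line for review."""
    return [" ".join(cmd) for cmd in commands]


if __name__ == "__main__":
    # Dry run only: print the commands for DC2 / MYDOMAIN, do not execute.
    for line in show(reset_commands("DC2", "MYDOMAIN")):
        print(line)
```

Check that netdom reported success before you run the last two commands. If the reset failed, rebooting will not fix anything.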
At this point the machine should be back to a normal working state. It is normal for the SYSVOL and NETLOGON shares not to exist for the first few minutes, depending on how big they are. Remember that the burflags “D2” entry causes DC1 to delete the contents of these shares and sync them from DC2. In my case it took between 5 and 10 minutes.
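If you do not feel like refreshing Explorer while you wait, a small polling loop can watch for the shares to come back. This is a minimal Python sketch. The function name wait_for_shares and the default timings are hypothetical, and the UNC paths in the example assume the bad DC is reachable as DC1.

```python
import os
import time


def wait_for_shares(paths, timeout=900, interval=30):
    """Poll until every path exists as a directory, or until timeout.

    Returns True if all paths appeared, False if the timeout (in seconds)
    expired first. The paths are checked at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = [p for p in paths if not os.path.isdir(p)]
        if not missing:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical share paths for the rebuilt DC; adjust to your names.
    shares = [r"\\DC1\SYSVOL", r"\\DC1\NETLOGON"]
    print(wait_for_shares(shares, timeout=0, interval=0))
```

With the defaults it checks every 30 seconds for up to 15 minutes, a bit longer than the 5 to 10 minutes it took in my case.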
That is all, folks. I hope you enjoyed this two-part series.