(*If you don't feel like reading everything, skip to the bottom two paragraphs for my questions)
I've had a Premier Support call open with MS since August. This week I had a Microsoft technician on site. Though we eliminated some possibilities, we're not really any closer to a cause or a solution.
Every time we work with an expert, I get a different explanation of what we are seeing.
Quick summary of the issue: We've been using Group Policy to manage most Windows XP and 7 settings for years, but starting in the middle of last year, we began having clients with machines where some or all group policies would fail to apply.
These could be long-assigned policies, new policies, or changes to policies. It never affects everyone, or even a majority, at once, and the resolution is never the same. Sometimes a GPUPDATE /FORCE fixes it, sometimes it is fixed automagically the next day,
and sometimes (but very rarely) it takes longer.
Troubleshooting History:
In early troubleshooting we found that these machines had errors in Event Viewer for Netlogon, Time-Sync, and Group Policy. The other issue we noticed was that our GPRESULT /H reports were missing security groups, and the denied section was
nothing but SIDs. The first issue pointed me to:
Event ID 5719 and event ID 1129 may be logged when a non-Microsoft DHCP Relay Agent is used
I installed these hotfixes. There was no change to any of the errors in Event Viewer, or to our Group Policy problems.
Initial work with Premier Support found that Netlogon, Time-Sync, and Group Policy were failing before the network stack had loaded. The suggestion was to apply the group policy setting "Always wait for the network at computer startup and
logon". At the time, this seemed not to work. The policy was set on a test bed of laptops and desktops, and no change in behavior was seen after 3 days.
Windows 7 Clients intermittently fail to apply group policy at startup
For some time after this, we were collecting GPSVC and NetTrace logs for Premier Support, trying to document and troubleshoot the problem. Eventually we got fed up and asked our TAM to call in a pro to get this resolved. We were sent an engineer
for 3 days. For three days we banged away on this issue. We verified AD and replication health, and we tried numerous fixes and workarounds. I learned 3 different descriptions of how Group Policy works, and in the end we thought we had a workaround
using "Always wait for the network at computer startup and logon" because of a single success late in the day. On day 3 we tried replicating this fix, and quickly realized that the same issue that was preventing other GPOs from applying
was also preventing our "fix" GPO from applying. So we went the route of using a registry entry. I also still had the problem that, even though the setting made the process more consistent, it was still taking 3 reboots for a Computer Policy, assigned
to a computer object via Security Group, to fully take effect on a computer.
I used the registry methods in the above article. It didn't work; there was no sign it was having the same effect the GPO had had.
Our support engineer claimed this was the proper method, but the path we were given wasn't even close to anything in a Windows 7 SP1 registry, and after I created all the keys that were not present, it still didn't work.
Always wait for the network at computer startup and logon - AzureWeb
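For anyone who wants to compare notes on the registry route, here is a minimal sketch, in Python, that builds the .reg file I would expect to be equivalent to that policy. It assumes the policy maps to a REG_DWORD named SyncForegroundPolicy under HKLM\SOFTWARE\Policies\Microsoft\Windows NT\CurrentVersion\Winlogon. That mapping is my assumption and is not necessarily the path we were given. The build_reg_file helper is just a name made up for illustration.

```python
# Sketch only: builds the text of a .reg file for the value that the
# "Always wait for the network at computer startup and logon" policy
# is assumed to write. The key path and value name are assumptions.

POLICY_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft"
    r"\Windows NT\CurrentVersion\Winlogon"
)
VALUE_NAME = "SyncForegroundPolicy"  # assumed value name (REG_DWORD)


def build_reg_file(enabled: bool = True) -> str:
    """Return .reg file text that sets the assumed policy value to 1 or 0."""
    data = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{POLICY_KEY}]",
        f'"{VALUE_NAME}"=dword:{data:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Saving the returned text to a .reg file and importing it on a test machine would set the value. Whether that behaves the same as the GPO is exactly what I could not confirm.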
We ran out of time, and our engineer returned home.
I can understand how these errors indicate a problem applying Group Policy at boot. But to me that doesn't explain why it doesn't correct itself post boot, or after a GPUPDATE /FORCE and a reboot.
It also doesn't explain why we were working fine for years, and then all of a sudden DHCP is being outrun by background services. (By the way, logging showed DHCP wasn't significantly delayed; our boot process was actually excellent, health-wise.)
Why, all of a sudden, is this not behaving optimally? There have been no changes to network design or function, and no changes to the domain since 2008 R2 was installed in 2011.
Today I read through all these KBs and articles again, and took some time to read:
[Forum FAQ] Common steps to start troubleshooting Group Policy
application and its links below.
We ran through all of that before and during the 3-day onsite. It's not getting us any closer to the cause or a solution.
I also found this link today and began some deep reading. It has some additional information I will try to use next week:
Group Policy Basics - Part 3: How Clients Process GPOs
The one unanswered question I have is this: How is group policy supposed to apply to a computer, when that policy is applied to an AD Security Group of which the computer object is a member?
Before we began having this problem, we would assign a computer GPO, then ask the user to reboot. If it were a user GPO, we'd ask the user to log off, or reboot. Either way, if we allowed a few minutes for AD and FRS replication, the user would
log back in with that new policy in effect. A newly imaged machine would boot with all the GPOs linked to that domain and assigned to "Authenticated Users" already in effect. Admin groups would be present in Administrators, proxy settings
would be set in Internet Explorer, etc.
Now Premier Support and Microsoft engineers are asking me to believe this was never the case: that those policies require the equivalent of a "GPUPDATE /FORCE" executed by the Local_System account, and that 3 reboots may
be necessary for a group policy to be applied. One for the AD Security Group to be applied. One for the Computer Policy to be applied. And a final one for the policy in the GPO to be applied to Windows.
Can someone confirm or correct this information please? It's imperative to my troubleshooting.
There's no place like 127.0.0.1