Why would Copy target not copy a file running as MSA Group Principal? #10973
Comments
To debug filesystem virtualization, I'd try Process Monitor. If that doesn't show anything special going on, then try to change the altitude of its driver (as shown by the |
Thanks. I asked ChatGPT to explain your answer a bit, as this is not an area I have any experience with, and got the gist of how minifilters work. But I am a little lost on what you mean by the other side of luafv. Why would the relative ordering matter? The build server is configured as:
If I understand you correctly, only MsSecFlt and WdFilter are running on this box before I checked to see if C:\users\msaTeamCity$\AppData\Local\VirtualStore exists as administrator, and it does not exist. This would make me think nothing is getting redirected? Is that an incorrect conclusion to draw? The recommendations from ChatGPT were to focus on:
|
I meant, I'm not sure how luafv virtualises files. If it returns STATUS_REPARSE to ask the caller to redo the open with a different file name, and the minifilter of Process Monitor is above luafv, I think Process Monitor will log this reparse. However, if luafv instead passes the open request down with a new file name but pretends to upper drivers that the file was opened with the original name, then Process Monitor above luafv won't see which file was actually opened. So if the procmon log does not show STATUS_REPARSE for the file and procmon is above luafv, then you could try to move procmon below luafv, in order to let procmon log which name luafv passes down to the file system. |
Got it. |
Anti-malware drivers like WdFilter / MsSecFlt seem unlikely to be the cause here — if one of them didn't like the copy operation, I think it'd just return an error and MSBuild would report that. |
OK. This next question is stupid/crazy, but hear me out. Is it possible there is a bug in the MSBuild Copy task? It looks like it was optimized 6 years ago to support multi-threaded copy, so a rational person would expect someone to have reported an issue before now if there was a real problem; cf. https://github.com/dotnet/msbuild/blame/4a6306491b49be676ded2a43c1e4557785772517/src/Tasks/Copy.cs#L513
I tried to isolate the problem further by writing a TeamCity PowerShell command step that runs as a ps1 file. This also runs as the same MSA Group Principal msaTeamCity$ and successfully copies the files.
D:\BuildAgent\temp\buildTmp\powershell10788514782847466008.ps1
$dest = "d:\logs\TeamCity"
$source = Join-Path $env:USERPROFILE -ChildPath ".nuget" | Join-Path -ChildPath "packages" | Join-Path -ChildPath "microsoft.data.sqlclient.sni" | Join-Path -ChildPath "5.2.0"
Write-Host $source
if (Test-Path $source) {
    # Copy all files from the source directory to the destination directory
    $copyErrors = $null
    Copy-Item -Path "$source\*" -Destination $dest -Recurse -Force -Verbose -ErrorVariable +copyErrors
    Write-Host "Files have been copied from '$source' to '$dest'."
    Write-Host "Errors: $copyErrors"
}
Outputs:
Visual Proof it successfully copied the whole folder: |
We plan to further rule out MSA Group Principals next week by changing the authentication mechanism back to NT AUTHORITY\System or a password-based account, just to briefly test. I don't know if MSBuild has any official recommendations on user access control privileges? |
Team triage: Have you been able to verify that this behaviour is the same in other authentication mechanisms? In a failing case, what exactly have you seen in Process Monitor trace? Was there an IO failure somewhere? |
Your screenshot shows the "Copying file from" message (msbuild/src/Tasks/Resources/Strings.resx, lines 245 to 248 in 8f6b8ad), which I think is logged from lines 340 to 348 in 4a63064, where it is immediately followed by a File.Copy call. That looks pretty foolproof.
That's Windows PowerShell on .NET Framework. Do you get the same result with PowerShell Core? (So that it uses .NET Runtime like Which version of Windows are you using, and are the files on NTFS or ReFS? If ReFS, then this might be a bug in Copy-on-Write. |
I'm going to follow up tomorrow. Internal deadline today. Excited to find the root cause. |
The problem appears to be a bug in MSBuild. I do not know why, but Process Monitor shows no file operation being attempted. It also shows the activity under cmd.exe, which is confusing because it is really dotnet msbuild that is running, but that is likely due to TeamCity launching dotnet through a cmd wrapper batch file. Separately, I moved my PowerShell script inside a build.targets file and it also worked when run directly inside msbuild, so it does not seem to be permissions related from the MSA Group Principal, although I don't understand why it works when I run it over PS Remoting to the build server. One thought I had is that PS Remoting is messing with the parallelism options in the Copy task, although I don't understand how that could be. |
I forgot to add, we are using NTFS, not ReFS. Definitely seems like an MSBuild bug unless I am misreading the logs somehow? Happy to repeat the test with further instruction on fine tuning using MSBuild and ProcessMonitor, but MSBuild verbosity is already set to diagnostic, and I captured the output in binlog. |
That's certainly strange. What kind of call stack does Process Monitor show for this access? |
I don't know what you are asking. The screenshot clearly shows the native dlls are not even being copied. It looks as if MSBuild is somehow just dropping the copy operation altogether. What stack trace would exist for something that does not occur? [Edit: If you look at the screenshot closely, the yellow arrow highlights where you would expect to see log messages in Process Monitor for the affected dlls, but there are none. Sorry if that was not clear.] One thing to note: in searching the dotnet/SqlClient issues on GitHub, at least one other user reported what I would call "MSBuild phantom writes", where the target did not actually copy anything. I need to find the exact issue, as there are at least a dozen involving SNI.dll issues. It does make me think that some underlying problem has been going on for years. |
I mean, it's difficult for me to believe Process Monitor would show I don't think it's the same problem. There, it says the item types had no items. In your case, the |
You are right. I took the wrong screenshot.
TL;DR: It appears dotnet.exe is deleting the file after it copies it. The culprit appears to be the IncrementalClean task, which does not add these files to the _CleanRemainingFileWritesAfterIncrementalClean item collection. I do not know much about how this works, but it looks like, since these files are not included in TaskRunner.csproj.FileListAbsolute, they get deleted. How do I make this stop? Is it simply that the official guidance on how to package .NET Framework apps being followed by the Microsoft.Data.SqlClient team is incorrect for Microsoft.Data.SqlClient.SNI? I did point out to them that the grpc.core package I use in the same project does not have this problem and uses a different design pattern, but they told me they're using the one guided by MSBuild.
Long details below:
I re-ran the process and pulled the timestamp from MSBuild:
Process Monitor, filtered only on dotnet.exe, starting at 5:51:01.908233, just before MSBuild purports to have invoked the CopySNIFiles target. Here is a screenshot showing the open result was Created. It appears it was then deleted:
In trying to pin down where this could occur, it appears there is an IncrementalClean task that runs after everything else. I honestly don't know what this task is and never knew it existed. I am also running msbuild with explicit --no-incremental, so I don't understand why this IncrementalClean thing is deleting my files. I would expect --no-incremental to tell it "assume this is a clean checkout." |
I actually think it is the same problem, if you squint. I think there is some flaw in how IncrementalClean works on clean checkouts that causes issues for the guidance given to the Microsoft.Data.SqlClient team on how to package this. I think if they follow the guidance for ".NET Core", this problem likely goes away. The documentation is vague on why this approach exists uniquely for .NET Framework, so I don't know why there is separate guidance for NETFX. The reason I was able to build this logged in as local administrator is that I no longer had a clean checkout. TeamCity was checking out the code for me. That's the key difference I missed and assumed was due to the MSA Group Principal. |
IIRC, the incremental cleaning works via the FileWrites item type. Targets should add output files to that. If the previous build added a file to FileWrites, but the current build does not, then the file is assumed to be no longer needed and is deleted. |
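To illustrate the FileWrites mechanism described above, here is a minimal sketch of a custom copy target that registers its outputs. The item name `ExampleNativeFile`, the target name `ExampleCopyNativeFiles`, and the source path are hypothetical placeholders, and hooking the target after `CopyFilesToOutputDirectory` is an assumption about where such a copy would need to sit so that it runs before IncrementalClean. This is not the actual content of Microsoft.Data.SqlClient.SNI.targets.

```xml
<Project>
  <!-- Hypothetical item name; the Include path is a placeholder, not a real package layout. -->
  <ItemGroup>
    <ExampleNativeFile Include="path\to\native\*.dll" />
  </ItemGroup>

  <!-- Hypothetical target name. The AfterTargets hook is an assumption:
       the copy should run, and register its outputs, before IncrementalClean. -->
  <Target Name="ExampleCopyNativeFiles" AfterTargets="CopyFilesToOutputDirectory">
    <Copy SourceFiles="@(ExampleNativeFile)" DestinationFolder="$(OutDir)">
      <!-- Add every destination path to FileWrites so the incremental clean
           treats the copied files as outputs of the current build. -->
      <Output TaskParameter="DestinationFiles" ItemName="FileWrites" />
    </Copy>
  </Target>
</Project>
```

If the copied files are listed in FileWrites for the current build, the incremental clean should have no reason to treat them as stale leftovers from a previous build.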
I'm trying to figure out the root cause of why, when running a dotnet build in TeamCity under an MSA Group Principal, the Copy task claims it was successful according to the MSBuild binlog output, but the file does not actually get copied.
This works correctly when I log in as myself and run the same command on the build server. I am a local administrator on the build machine in question.
I use -bl:d:\logs\msbuild.binlog both as MSA Group Principal and as myself.
One wildcard in the mix is that we are setting AppendRuntimeIdentifierToOutputPath to false and AppendTargetFrameworkToOutputPath to false as well. I was surprised to see these are defined in Microsoft.NET.DefaultOutputPaths.targets and not in Microsoft.Common.CurrentVersion.targets, where $(OutDir) is defined. As the Copy task is being imported from https://nuget.info/packages/Microsoft.Data.SqlClient.SNI/5.2.0 's buildTransitive\net462\Microsoft.Data.SqlClient.SNI.targets, I think the problem may somehow be related to phases in MSBuild's evaluation order, but I have not been able to figure it out.
A long time ago, Nick Guerrera (the architect of the SDK targets) mentioned to me that sometimes things can fail due to process virtualization, but he did not get into specifics of how to debug it, nor did he directly explain how to resolve the issue inductively. I just took his hint and realized that I should work around the problem and try something different. At the time, he said the primary way this could occur is when running as NT AUTHORITY\System, so we are instead running as msaTeamCity$, an MSA Group Principal (passwordless login that authenticates through Active Directory membership).
What's weird is, even after I was able to somehow get it to use the right OutDir value, Copy still... didn't copy anything.
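For reference, the two output path settings mentioned above would typically be expressed as follows. This is only a sketch: the post does not show the actual project file, so placing the properties in the project file (as opposed to, say, a Directory.Build.props) is an assumption.

```xml
<Project>
  <PropertyGroup>
    <!-- Both values are taken from the description above; where they are set is assumed. -->
    <AppendRuntimeIdentifierToOutputPath>false</AppendRuntimeIdentifierToOutputPath>
    <AppendTargetFrameworkToOutputPath>false</AppendTargetFrameworkToOutputPath>
  </PropertyGroup>
</Project>
```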