Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why would Copy target not copy a file running as MSA Group Principal? #10973

Open
jzabroski opened this issue Nov 13, 2024 · 19 comments
Open

Why would Copy target not copy a file running as MSA Group Principal? #10973

jzabroski opened this issue Nov 13, 2024 · 19 comments
Labels
author-responded Author responded, needs response from dev team.

Comments

@jzabroski
Copy link

jzabroski commented Nov 13, 2024

I'm trying to figure out the root cause to why, when running a dotnet build in TeamCity, under a MSA Group Principal, the Copy task claims it was successful according to the MSBuild binlog output, but the file does not actually get copied.

This works correctly when I log in as myself and run the same command on the build server. I am a local administrator on the build machine in question.

I use -bl:d:\logs\msbuild.binlog both as MSA Group Principal and as myself.

One wildcard in the mix is that we are setting AppendRuntimeIdentifierToOutputPath to false and AppendTargetFrameworkToOutputPath to false as well. - I was surprised to see these are defined in Microsoft.NET.DefaultOutputPaths.targets and not part of Microsoft.Common.CurrentVersion.targets where $(OutDir) is defined. As the Copy task is being imported from https://nuget.info/packages/Microsoft.Data.SqlClient.SNI/5.2.0 's buildTransitive\net462\Microsoft.Data.SqlClient.SNI.targets, I think the problem may somehow be related to phases in MSBuild's evaluation order, but I have not been able to figure it out.

A long time ago, Nick Geurrera (the architect of the SDK targets) mentioned to me that sometimes things can fail due to process virtualization, but he did not get into specifics of how to debug it, nor did he directly explain how to resolve the issue inductively - I just took his hint and realized that I should workaround the problem and try something different. At the time, he said the primary way this could occur is if running as NT AUTHORITY\System - so we are instead running as msaTeamCity$, a MSA Group Principal (passwordless login that authenticates through active directory membership).

What's weird is, even after I was able to somehow get it to use the right OutDir value, Copy still... didn't copy anything.

Image

@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Nov 13, 2024

To debug filesystem virtualization, I'd try Process Monitor. If that doesn't show anything special going on, then try to change the altitude of its driver (as shown by the fltmc command) to the other side of luafv.

@jzabroski
Copy link
Author

Thanks, I asked ChatGPT to explain a bit your answer, as this is not an area I have any experience with, and got the gist of how minifilters work. But, I am a little lost on what you mean by the other side of luafv? Why would the relative ordering matter?

The build server is configured as:

fltmc filters

Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
MsSecFlt                                5       385600         0
WdFilter                                3       328010         0
storqosflt                              0       244000         0
wcifs                                   0       189900         0
CldFlt                                  0       180451         0
FileCrypt                               0       141100         0
luafv                                   1       135000         0
npsvctrig                               1        46000         0
Wof                                     1        40700         0

If I understand you correctly, only MsSecFlt and WdFilter are running on this box before luafv, unless fltmc filters is something that needs to be run per-user.

I checked to see if C:\users\msaTeamCity$\AppData\Local\VirtualStore exists as administrator, and it does not exist. This would make me think nothing is getting redirected? Is that an incorrect conclusion to draw?

The recommendations from ChatGPT were to focus on:

WdFilter (Windows Defender Filter) - This minifilter is associated with Windows Defender. It scans files for malware during operations like opening, closing, and writing, which can potentially slow down file operations or even block them if a threat is detected. If the antivirus configurations are too aggressive, or if there are false positives, it might interfere with file copying.
MsSecFlt (Microsoft Security Filter) - Though less commonly discussed in documentation, given its altitude and naming, this filter might also be related to security operations, potentially scanning or modifying file operations, which could impact performance or block certain actions if it detects what it considers security risks.

@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Nov 13, 2024

I meant, I'm not sure how luafv virtualises files. If it returns STATUS_REPARSE to ask the caller to redo the open with a different file name, and the minifilter of Process Monitor is above luafv, I think Process Monitor will log this reparse. However, if luafv instead passes the open request down with a new file name but pretends to upper drivers that the file was opened with the original name, then Process Monitor above luafv won't see which file was actually opened. So if the procmon log does not show STATUS_REPARSE for the file and procmon is above luafv, then you could try to move procmon below luafv, in order to let procmon log which name luafv passes down to the file system.

@jzabroski
Copy link
Author

Got it.

@KalleOlaviNiemitalo
Copy link

Anti-malware drivers like WdFilter / MsSecFlt seem unlikely to be the cause here — if one of them didn't like the copy operation, I think it'd just return an error and MSBuild would report that.

@jzabroski
Copy link
Author

jzabroski commented Nov 14, 2024

OK. This next question is stupid/crazy but, hear me out. Is it possible there is a bug in MSBuild Copy target? It looks like it was optimized 6 years ago to support multi-threaded copy, so the rational person would expect someone to report an issue with this before now if there was a real problem cf https://github.com/dotnet/msbuild/blame/4a6306491b49be676ded2a43c1e4557785772517/src/Tasks/Copy.cs#L513

I tried to isolate the problem further by writing a TeamCity Powershell command step that runs as a ps1 file. This also runs as the same MSA Group Principal msaTeamCity$ and successfully copies the files.

D:\BuildAgent\temp\buildTmp\powershell10788514782847466008.ps1

$dest = "d:\logs\TeamCity"
$source = Join-Path $env:USERPROFILE -ChildPath ".nuget" | Join-Path -ChildPath "packages" | Join-Path -ChildPath "microsoft.data.sqlclient.sni" | Join-Path -ChildPath "5.2.0"
Write-Host $source
if (Test-Path $source) {
  # Copy all files from the source directory to the destination directory  
  $copyErrors = $null
  Copy-Item -Path "$source\*" -Destination $dest -Recurse -Force -Verbose -ErrorVariable +copyErrors
  Write-Host "Files have been copied from '$source' to '$dest'."
  Write-Host "Errors: $copyErrors"
}

Outputs:

15:14:41  PowerShell Executable: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
15:14:41   Working directory: D:\BuildAgent\work\77094892e16807fb
15:14:41   Command: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
15:14:41   PowerShell arguments: -NoProfile, -NonInteractive, -ExecutionPolicy, ByPass, -File, D:\BuildAgent\temp\buildTmp\powershell10788514782847466008.ps1
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0
15:14:41   VERBOSE: Performing the operation "Copy Directory" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build Destination: D:\logs\TeamCity\build".
15:14:41   VERBOSE: Performing the operation "Create Directory" on target "Destination: D:\logs\TeamCity\build".
15:14:41   VERBOSE: Performing the operation "Copy Directory" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build\net462 Destination:
15:14:41   D:\logs\TeamCity\build\net462".
15:14:41   VERBOSE: Performing the operation "Create Directory" on target "Destination: D:\logs\TeamCity\build\net462".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build\net462\Microsoft.Data.SqlClient.SNI.arm6
15:14:41   4.dll Destination: D:\logs\TeamCity\build\net462\Microsoft.Data.SqlClient.SNI.arm64.dll".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build\net462\Microsoft.Data.SqlClient.SNI.arm6
15:14:41   4.pdb Destination: D:\logs\TeamCity\build\net462\Microsoft.Data.SqlClient.SNI.arm64.pdb".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build\net462\Microsoft.Data.SqlClient.SNI.targ
15:14:41   ets Destination: D:\logs\TeamCity\build\net462\Microsoft.Data.SqlClient.SNI.targets".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build\net462\Microsoft.Data.SqlClient.SNI.x64.
15:14:41   dll Destination: D:\logs\TeamCity\build\net462\Microsoft.Data.SqlClient.SNI.x64.dll".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build\net462\Microsoft.Data.SqlClient.SNI.x64.
15:14:41   pdb Destination: D:\logs\TeamCity\build\net462\Microsoft.Data.SqlClient.SNI.x64.pdb".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build\net462\Microsoft.Data.SqlClient.SNI.x86.
15:14:41   dll Destination: D:\logs\TeamCity\build\net462\Microsoft.Data.SqlClient.SNI.x86.dll".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\build\net462\Microsoft.Data.SqlClient.SNI.x86.
15:14:41   pdb Destination: D:\logs\TeamCity\build\net462\Microsoft.Data.SqlClient.SNI.x86.pdb".
15:14:41   VERBOSE: Performing the operation "Copy Directory" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive Destination:
15:14:41   D:\logs\TeamCity\buildTransitive".
15:14:41   VERBOSE: Performing the operation "Create Directory" on target "Destination: D:\logs\TeamCity\buildTransitive".
15:14:41   VERBOSE: Performing the operation "Copy Directory" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive\net462 Destination:
15:14:41   D:\logs\TeamCity\buildTransitive\net462".
15:14:41   VERBOSE: Performing the operation "Create Directory" on target "Destination: D:\logs\TeamCity\buildTransitive\net462".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive\net462\Microsoft.Data.SqlClien
15:14:41   t.SNI.arm64.dll Destination: D:\logs\TeamCity\buildTransitive\net462\Microsoft.Data.SqlClient.SNI.arm64.dll".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive\net462\Microsoft.Data.SqlClien
15:14:41   t.SNI.arm64.pdb Destination: D:\logs\TeamCity\buildTransitive\net462\Microsoft.Data.SqlClient.SNI.arm64.pdb".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive\net462\Microsoft.Data.SqlClien
15:14:41   t.SNI.targets Destination: D:\logs\TeamCity\buildTransitive\net462\Microsoft.Data.SqlClient.SNI.targets".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive\net462\Microsoft.Data.SqlClien
15:14:41   t.SNI.x64.dll Destination: D:\logs\TeamCity\buildTransitive\net462\Microsoft.Data.SqlClient.SNI.x64.dll".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive\net462\Microsoft.Data.SqlClien
15:14:41   t.SNI.x64.pdb Destination: D:\logs\TeamCity\buildTransitive\net462\Microsoft.Data.SqlClient.SNI.x64.pdb".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive\net462\Microsoft.Data.SqlClien
15:14:41   t.SNI.x86.dll Destination: D:\logs\TeamCity\buildTransitive\net462\Microsoft.Data.SqlClient.SNI.x86.dll".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\buildTransitive\net462\Microsoft.Data.SqlClien
15:14:41   t.SNI.x86.pdb Destination: D:\logs\TeamCity\buildTransitive\net462\Microsoft.Data.SqlClient.SNI.x86.pdb".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\.nupkg.metadata Destination:
15:14:41   D:\logs\TeamCity\.nupkg.metadata".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\.signature.p7s Destination:
15:14:41   D:\logs\TeamCity\.signature.p7s".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\dotnet.png Destination:
15:14:41   D:\logs\TeamCity\dotnet.png".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\LICENSE.txt Destination:
15:14:41   D:\logs\TeamCity\LICENSE.txt".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\microsoft.data.sqlclient.sni.5.2.0.nupkg
15:14:41   Destination: D:\logs\TeamCity\microsoft.data.sqlclient.sni.5.2.0.nupkg".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\microsoft.data.sqlclient.sni.5.2.0.nupkg.sha51
15:14:41   2 Destination: D:\logs\TeamCity\microsoft.data.sqlclient.sni.5.2.0.nupkg.sha512".
15:14:41   VERBOSE: Performing the operation "Copy File" on target "Item:
15:14:41   C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0\microsoft.data.sqlclient.sni.nuspec
15:14:41   Destination: D:\logs\TeamCity\microsoft.data.sqlclient.sni.nuspec".
15:14:41   Files have been copied from 'C:\Users\msaTeamCity$\.nuget\packages\microsoft.data.sqlclient.sni\5.2.0' to 'd:\logs\TeamCity'.
15:14:41   Errors:
15:14:41   Process exited with code 0

Visual Proof it successfully copied the whole folder:

Image

Image

@jzabroski
Copy link
Author

We plan to further rule out MSA Group Principals next week by changing the authentication mechanism back to NT AUTHORTY\System or a password-based account, just to briefly test. I don't know if MSBuild has any official recommendations on user access control privileges?

@maridematte
Copy link
Contributor

Team triage: Have you been able to verify that this behaviour is the same in other authentication mechanisms? In a failing case, what exactly have you seen in Process Monitor trace? Was there an IO failure somewhere?

@maridematte maridematte added the needs-more-info Issues that need more info to continue investigation. label Nov 19, 2024
@KalleOlaviNiemitalo
Copy link

Your screen shot shows the "Copying file from" message

<data name="Copy.FileComment">
<value>Copying file from "{0}" to "{1}".</value>
<comment>LOCALIZATION: {0} and {1} are paths.</comment>
</data>

which I think is logged from

msbuild/src/Tasks/Copy.cs

Lines 340 to 348 in 4a63064

// If the link was not created (either because the user didn't want one, or because it couldn't be created)
// then let's copy the file
if (!hardLinkCreated && !symbolicLinkCreated)
{
// Do not log a fake command line as well, as it's superfluous, and also potentially expensive
Log.LogMessage(MessageImportance.Normal, FileComment, sourceFileState.FileNameFullPath, destinationFileState.FileNameFullPath);
File.Copy(sourceFileState.Name, destinationFileState.Name, true);
}

where it is immediately followed by a File.Copy call. That looks pretty foolproof.

15:14:41 PowerShell Executable: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe

That's Windows PowerShell on .NET Framework. Do you get the same result with PowerShell Core? (So that it uses .NET Runtime like dotnet build.)

Which version of Windows are you using, and are the files on NTFS or ReFS? If ReFS, then this might be a bug in Copy-on-Write.

@jzabroski
Copy link
Author

I'm going to follow up tomorrow. Internal deadline today. Excited to find the root cause.

@dotnet-policy-service dotnet-policy-service bot added author-responded Author responded, needs response from dev team. and removed triaged needs-more-info Issues that need more info to continue investigation. labels Nov 19, 2024
@jzabroski
Copy link
Author

jzabroski commented Nov 20, 2024

The problem appears to be a bug in MSBuild. I do not know why but ProcessMonitor shows no file attempt occurring. It also shows it under cmd.exe which is confusing as it is really dotnet msbuild that is running, but that is likely due to TeamCity launching dotnet through a cmd wrapper batch file.

Separately, I moved my PowerShell script to inside a build.targets file and it also worked when run directly inside msbuild, so it does not seem to be permissions related from the MSA Group Principal, although I dont understand why it works when I run it over PS Remoting to the build server. One thought I had is that PS Remoting is messing with the parallelism options in the Copy task, although I dont understand how that could be.

BinLog:
Image

Process Monitor:
Image

@jzabroski
Copy link
Author

I forgot to add, we are using NTFS, not ReFS. Definitely seems like an MSBuild bug unless I am misreading the logs somehow? Happy to repeat the test with further instruction on fine tuning using MSBuild and ProcessMonitor, but MSBuild verbosity is already set to diagnostic, and I captured the output in binlog.

@KalleOlaviNiemitalo
Copy link

It also shows it under cmd.exe which is confusing as it is really dotnet msbuild that is running, but that is likely due to TeamCity launching dotnet through a cmd wrapper batch file.

That's certainly strange. What kind of call stack does Process Monitor show for this access?

@jzabroski
Copy link
Author

jzabroski commented Nov 21, 2024

I don't know what you are asking. The screenshot clearly shows the native dlls are not even being copied. It looks as if somehow MSBuild is just dropping the copy operation altogether. What stack trace would exist for something that does not occur?

[Edit: If you look at the screenshot closely, where the yellow arrow is, I am highlighting where you would expect to see log messages in Process Monitor but do not see any for the affected dlls, Sorry if that was not clear]

One thing to note, in searching the dotnet/SqlClient issues on GitHub, at least one other user reported what I would call "MSBuild phantom writes" where the target did not actually copy anything. I need to find the exact issue as there are at least a dozen involving SNI.dll issues. It does make me think that some underlying problem has been going on for years.

@jzabroski
Copy link
Author

See: dotnet/SqlClient#2049 (comment) CC @Luxed

@KalleOlaviNiemitalo
Copy link

I mean, it's difficult for me to believe Process Monitor would show dotnet build file accesses as coming from cmd.exe. If you double-click one of those cmd.exe rows in Process Monitor, it should show the details of the event in a dialog box. From there, one of the tabs should show the call stack from which the I/O came. In the user-mode part of the stack trace, perhaps it'll show a function name that indicates what kind of command cmd.exe was running; if it's a DIR command, then that could explain why it's opening those files but neither reading nor writing them.

See: dotnet/SqlClient#2049 (comment)

I don't think it's the same problem. There, it says the item types had no items. In your case, the Copy task logged file names, so MSBuild must have given it items.

@jzabroski
Copy link
Author

jzabroski commented Nov 21, 2024

You are right. I took the wrong screnshot.

tldr; It appears dotnet.exe is deleting the file after it copies it. The culprit appears to be IncrementalClean task that does not add these files to the _CleanRemainingFileWritesAfterIncrementalClean item collection. I do not know much about how this works, but it looks like since these are not included in TaskRunner.csproj.FileListAbsolute, they get deleted.

How do I make this stop? Is it simply that the official guidance on how to package .NET Framework apps being followed by the Microsoft.Data.SqlClient team is incorrect for Microsoft.Data.SqlClient.SNI? I did point out to them that grpc.core package I use does not have this problem and is used in the same project, and it used a different design pattern, but they told me they're using the one guided by MSBuild.

long details below

I re-ran the process, pulled the timestamp from MSBuild:

Image

ProcessMonitor, filtered only on dotnet.exe, starting at 5:51:01.908233, just before MSBuild purports to have invoked CopySNIFiles target.

Image

Here is a screenshot showing the openresult was Created.

Image

It appears it was then deleted:

Image

In trying to pin down where this could occur, it appears there is an IncrementalClean task that runs after everything else. I honestly don't know what this task is and never knew it existed. I am also running msbuild with explicit --no-incremental, so I don't understand why this IncrementalClean thing is deleting my files. I would expect --no-incremental would tell it "assume this is a clean checkout."

@jzabroski
Copy link
Author

jzabroski commented Nov 21, 2024

I don't think it's the same problem.

I actually think it is the same problem, if you squint. I think there is some flaw in how IncrementalClean works on clean checkouts that causes issues for the guidance given to the Microsoft.Data.SqlClient team on how to package this. I think if they follow the guidance for ".NET Core", this problem likely goes away. The documentation is vague on why this approach exists uniquely for .NET Framework, so I dont know why there is separate guidance for NETFX.

The reason I was able to build this logged in as local administrator is I no longer had a clean checkout. TeamCity was checking out the code for me. Thats the key difference I missed and assumed was due to MSA Group Principal.

@KalleOlaviNiemitalo
Copy link

IIRC, the incremental cleaning works via the FileWrites item type. Targets should add output files to that. If the previous build added a file to FileWrites, but the current build does not, then the file is assumed to be no longer needed and is deleted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author-responded Author responded, needs response from dev team.
Projects
None yet
Development

No branches or pull requests

3 participants