[microsoft/playwright][BUG] Test runner hangs during regression run in v1.35

System info

Playwright Version: [v1.35.1]
Operating System: [macOS 13.4]
Browser: [All]
VSCode: 1.79.2
Node: 16.15.1

Source code

Config file

import { defineConfig, devices } from "@playwright/test";
import dotenv from "dotenv";

/**
 * See https://playwright.dev/docs/test-configuration.
 */

/**
 * Read environment variables from file.
 * https://github.com/motdotla/dotenv
 */
dotenv.config();

export default defineConfig({
  testDir: "./tests",
  projects: [
    { name: "setup", testMatch: ["**/*.setup.ts"], teardown: "teardown" },
    { name: "teardown", testMatch: ["**/*.setup.ts"] },
    {
      name: "chromium",
      use: {
        ...devices["Desktop Chrome"],
        viewport: { width: 1280, height: 768 }
      },
      testMatch: ["**/*.spec.ts"],
      dependencies: ["setup"]
    },
    {
      name: "firefox",
      use: {
        ...devices["Desktop Firefox"],
        viewport: { width: 1280, height: 768 }
      },
      testMatch: ["**/*.spec.ts"],
      dependencies: ["setup"]
    },
    {
      name: "webkit",
      use: {
        ...devices["Desktop Safari"],
        viewport: { width: 1280, height: 768 }
      },
      testMatch: ["**/*.spec.ts"],
      dependencies: ["setup"]
    }
  ],
  /* Shared settings for all projects. See https://playwright.dev/docs/api/class-testoptions. */
  use: {
    /* Maximum time each action such as `click()` can take. Defaults to 0 (no limit). */
    actionTimeout: 0,
    /* Base URL to use in actions like `await page.goto('/')`. */
    baseURL: process.env.BASE_URL,
    trace: "retain-on-failure",
    headless: true
    /*launchOptions: {
      slowMo: 100 //Slow test execution down by 100ms, helpful for debugging
    }*/
  },
  /* Maximum time one test can run for. */
  timeout: 2 * 60 * 1000,
  globalTimeout: 25 * 60 * 1000,
  expect: {
    /**
     * Maximum time expect() should wait for the condition to be met.
     * For example in `await expect(locator).toHaveText();`
     */
    timeout: 5000
  },
  maxFailures: process.env.CI ? 16 : undefined,
  /* Run tests in files in parallel */
  fullyParallel: false,
  /* Fail the build on CI if you accidentally left test.only in the source code. */
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 3 : 1,
  /* Use 3 workers locally. Use workers equal to 50% of available cores on CI - UNTESTED */
  workers: process.env.CI ? "50%" : 3,
  /* Reporters to use. See https://playwright.dev/docs/test-reporters */
  reporter: [
    ["html", { open: "never" }],
    ["./reporters/resultChecker"], //This outputs a file with the current branch and overall run result, to be consumed by GitHub Actions
    ["./reporters/failed-test-reporter.ts"], //This outputs a log of which tests have failed the last time they were executed, so just the failures can be rerun while debugging
    ["./reporters/better-dot-reporter.ts"]    
  ]
});

Steps

Upgrade to Playwright 1.35 (previously on 1.34)
Run our full regression suite from the VSCode terminal with pnpx playwright test --project chromium (ours is a monorepo managed with Rush, so we don't use npx)

Expected: the test run completes as normal

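Actual: the test run stops partway through, usually near the end, and never completes. We previously had no globalTimeout value set in the config, so I added one to help diagnose this. With it set, the runner aborts once the value is exceeded (25 minutes, as shown in the config above; a successful run usually takes about 10 minutes). Without a timeout configured, I have to cancel the run manually with Ctrl+C in the VSCode terminal once it's clear the runner is no longer responding.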

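I know this will be hard to reproduce, as it seems to be something specific to our suite. Unfortunately, I can't share the whole repository because it contains proprietary information about our application. At least two team members (myself and one of our developers) have hit the issue, so it seems to be common when using 1.35.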

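I've done a successful run on 1.34 and a hanging/aborted run on 1.35, both with DEBUG=pw:api logging output saved to files. These files contain bearer tokens, so for security reasons I'd rather not attach them here, but I'm happy to email them directly if they might help pin down the cause.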
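If DEBUG=pw:api logs need to be shared, credentials can be masked first. The following is a minimal sketch, assuming secrets appear either as `Bearer <token>` values or on `Cookie:` / `Set-Cookie:` header lines, which is how pw:api prints request headers (one per line, ending in a `+Nms` timing suffix, as in the log later in this thread). `redactSecrets` is a hypothetical helper; other secret formats would need their own patterns.

```typescript
/**
 * Hypothetical helper: mask bearer tokens and cookie values in a pw:api debug log.
 * Assumes one header per line, optionally followed by a "+Nms" / "+Ns" timing suffix.
 */
function redactSecrets(log: string): string {
  return (
    log
      // Mask the token after "Bearer", e.g. on an "authorization: Bearer <token>" line.
      .replace(/\b(Bearer[ \t]+)\S+/gi, "$1[REDACTED]")
      // Mask the whole value of Cookie / Set-Cookie lines, keeping the timing suffix.
      // If the suffix group does not match, "$3" is replaced with an empty string.
      .replace(/^(.*\b(?:set-)?cookie:[ \t]*)(.*?)([ \t]+\+\d+m?s)?$/gim, "$1[REDACTED]$3")
  );
}
```

The function works on the whole file contents as a string, so it can be applied between Node's `fs.readFileSync` and `fs.writeFileSync`. A quick manual review of the output is still worthwhile, since a regex only catches the formats it was written for.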

Dot reporter output from the most recent run, which was aborted by the global timeout:

Running 158 tests using 3 workers
··········································································°·····
···················°······°°······································°··°×T
Timed out waiting 1500s for the test suite to run

Timed out waiting 1500s for the teardown for start workers to run

  1) [chromium] › 04_managedentities.spec.ts:894:9 › Managed Entities › Create & manage new Legal Entity › ME15 - Deleting a business entity with ownership stake in another business entity @notvisual 

    "afterAll" hook timeout of 120000ms exceeded.

      969 |     });
      970 |
    > 971 |     test.afterAll(async () => {
          |          ^
      972 |       await deleteAllLegalEntities();
      973 |     });
      974 |   });

        at /Users/sarahwoodhouse/Documents/repos/core-platform/apps/gelt-frontend-autotests/tests/04_managedentities.spec.ts:971:10

    Retry #1 ───────────────────────────────────────────────────────────────────────────────────────

    "afterAll" hook timeout of 120000ms exceeded.

      969 |     });
      970 |
    > 971 |     test.afterAll(async () => {
          |          ^
      972 |       await deleteAllLegalEntities();
      973 |     });
      974 |   });

        at /Users/sarahwoodhouse/Documents/repos/core-platform/apps/gelt-frontend-autotests/tests/04_managedentities.spec.ts:971:10

  Slow test file: [chromium] › 01_taxprofile.spec.ts (4.7m)
  Slow test file: [chromium] › 03_docvault.spec.ts (3.7m)
  Slow test file: [chromium] › 05_assets.cryptoaccount.spec.ts (2.2m)
  Slow test file: [chromium] › 02_assets.privateinvestment.spec.ts (2.0m)
  Slow test file: [chromium] › 04_managedentities.spec.ts (1.4m)
  Consider splitting slow test files to speed up parallel execution
  1 failed
    [chromium] › 04_managedentities.spec.ts:894:9 › Managed Entities › Create & manage new Legal Entity › ME15 - Deleting a business entity with ownership stake in another business entity @notvisual 
  13 skipped
  144 passed (25.0m)
  2 errors were not a part of any test, see above for details

sarah-gelt

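It's quite possible that the newer browsers shipped with 1.35 are failing your tests. Can you narrow down the affected tests and calls? Try recording a trace to see what's happening in the browser.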
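One way to follow the trace suggestion above is to record a trace for every test and keep it even when the test passes. The original config uses `trace: "retain-on-failure"`, which records traces but discards them for passing tests, so a test whose steps all pass (as in the report further down) would leave no trace behind. A minimal config sketch, showing only the relevant option; the resulting `trace.zip` files are written under the test output directory (`test-results` by default) and can be opened with `npx playwright show-trace <path-to-trace.zip>` (or `pnpx` in this monorepo).

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    // Record and keep a trace for every test, including passing ones.
    trace: "on",
  },
});
```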

pavelfeldman

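I've been able to identify four tests in one class that seem to trigger this issue. When it happens, macOS Activity Monitor shows the node process used by the test runner at around 100% CPU, and it stays stuck there. After the run is aborted by the global timeout (which I've reduced to 2 minutes), I have to force-quit the process. I can now reproduce this with a single test.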

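The problem actually seems to occur at the end of the afterAll hook. I've put console logs in the hook, including on the last line of the block, and all of them print to the terminal. It seems to be something internal to the runner as it tries to close out the hook before moving on to global teardown: the last line of my afterAll code (a console.log()) runs, but the runner never reaches the first line of code in the teardown.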
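The diagnostic described above (logging at the start and end of the hook body to tell whether the hang is inside user code or after it returns) can be wrapped in a small reusable helper. This is a minimal sketch; `timed` and its log format are hypothetical, not part of Playwright.

```typescript
type Logger = (message: string) => void;

/**
 * Hypothetical helper: log when an async block starts and when it finishes or throws,
 * with the elapsed time. If "finished" is printed but the run still hangs, the stall
 * happens after user code has returned, i.e. inside the runner.
 */
async function timed<T>(
  label: string,
  fn: () => Promise<T>,
  log: Logger = console.log,
): Promise<T> {
  const startedAt = Date.now();
  log(`[${label}] started`);
  try {
    const result = await fn();
    log(`[${label}] finished after ${Date.now() - startedAt}ms`);
    return result;
  } catch (err) {
    log(`[${label}] threw after ${Date.now() - startedAt}ms`);
    throw err;
  }
}
```

In a spec file this could be used as `test.afterAll(() => timed("afterAll", deleteAllLegalEntities));`, where `deleteAllLegalEntities` is the suite's own cleanup function seen in the output above.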

Runner output:


Running 3 tests using 1 worker
·
Timed out waiting 120s for the test suite to run

Timed out waiting 120s for the teardown for start workers to run

  2 skipped
  1 passed (2.0m)
  Timed out waiting 120s for the entire test run
  2 errors were not a part of any test, see above for details

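The report supports this: it shows the test as failing at the very end, yet every one of its steps is shown as passed.

I've attached the report, as it's too long to easily screenshot all of the afterAll steps: Playwright Test Report.html.zip

What's especially odd is that most tests in this class pass, so the afterAll hook completes correctly for 10 of the 14 tests in the class.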

sarah-gelt

I've again recorded the DEBUG=pw:api output for this specific test, with the run hanging on 1.35 and succeeding on 1.34. I'm happy to email them directly if they're useful.

sarah-gelt

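System info
Playwright Version: [v1.35.1]
Operating System: [Ubuntu 22.04]
Browsers: [Chromium, None]
VSCode: 1.79.2
Node: 16.19.0

I have a similar issue with Playwright 1.35.1: it hangs indefinitely, particularly with chromium when Playwright is run without specifying a project, even though the tests are mostly API-based and/or go through the ssh2 lib.

What I've noticed is that Playwright hangs at some point after an ssh2 exec command completes, even though the call is awaited. The behavior is very noticeable when the tests run in the Playwright Docker image.

Apart from this (the hang point), I can't get Playwright to give any trace or error message: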

message: '\x1BWorker teardown timeout of 30000ms exceeded while tearing down "trace recording".\x1B\n' +

[17:58:30 ](...
      '\x1BFailed worker ran 4 tests:\x1B\n' +

(The issue does not occur in Playwright 1.34.3)

playwright.config.ts

import { devices, PlaywrightTestConfig } from '@playwright/test';

const config: PlaywrightTestConfig = {
  testDir: './tests',
  /* Maximum time one test can run for. */
  timeout: 30 * 1000,
  expect: {
    timeout: 50000
  },
  /* Run tests in files in parallel */
  fullyParallel: true,
  /* Fail the build on CI if you accidentally left test.only in the source code. */
  forbidOnly: !!process.env.CI,
  /* Retry on CI only */
  retries: process.env.CI ? 2 : 0,
  /* Opt out of parallel tests on CI. */
  workers: process.env.CI ? 1 : undefined,
  /* Reporter to use. See https://playwright.dev/docs/test-reporters */
  reporter: process.env.CI
    ? [
        [
          'playwright-teamcity-reporter',
          { testMetadataArtifacts: 'test-results', logConfig: false }
        ],
        ['html', { open: 'never' }]
      ]
    : [
        ['html', { open: 'never' }],
        ['list', { printSteps: true }]
      ],
  /* Shared settings for all the projects below.
    See https://playwright.dev/docs/api/class-testoptions. */
  globalSetup: require.resolve('./tests/global-setup'),
  globalTeardown: require.resolve('./tests/global-teardown'),
  use: {
    storageState: COOKIE_PATH,
    headless: true,
    actionTimeout: 0,
    trace: 'retain-on-failure',
    ignoreHTTPSErrors: true,
    baseURL: ...,
    extraHTTPHeaders: { Cookie: `id=${...}` },
    launchOptions: {
      slowMo: parseInt(process.env.PLAYWRIGHT_SLOW_MO || '0')
    }
  },

  projects: [
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome']
      }
    },
    {
      name: 'api',
      testDir: './tests/api',
      testMatch: /.\/tests\/api\/.*.spec.ts/,
      use: {
        headless: true,
        contextOptions: {
          ignoreHTTPSErrors: true
        }
      }
    },
    {
      name: 'ui',
      testDir: './tests/ui',
      testMatch: /.\/tests\/ui\/.*.spec.ts/,
      use: {
        ...devices['Desktop Chrome'],
      }
    },
  ],

  webServer:
    project !== 'api'
      ? [
          {
            command: 'yarn start',
            url: '...',
            ignoreHTTPSErrors: true,
            reuseExistingServer: true
          }
        ]
      : []
};

export default config;

alexanderdevm

I've enabled debugging on both the ssh lib and the Playwright API; this is the point where Playwright hangs indefinitely:

Inbound: Received USERAUTH_FAILURE (publickey,password)
Client: none auth failed
Outbound: Sending USERAUTH_REQUEST (password)
Inbound: Received USERAUTH_SUCCESS
Outbound: Sending CHANNEL_OPEN (r:0, session)
Inbound: GLOBAL_REQUEST (hostkeys-00@openssh.com)
Outbound: Sending GLOBAL_REQUEST (hostkeys-prove-00@openssh.com)
Inbound: CHANNEL_OPEN_CONFIRMATION (r:0, s:0)
Outbound: Sending CHANNEL_REQUEST (r:0, exec: sudo -S -p "...")
Inbound: REQUEST_SUCCESS
Inbound: CHANNEL_WINDOW_ADJUST (r:0, 2097152)
Inbound: CHANNEL_SUCCESS (r:0)
Inbound: CHANNEL_EXTENDED_DATA (r:0, 28)
Outbound: Sending CHANNEL_DATA (r:0, 8)
Inbound: CHANNEL_EOF (r:0)
Inbound: CHANNEL_REQUEST (r:0, exit-status: 0)
Inbound: CHANNEL_CLOSE (r:0)
Outbound: Sending CHANNEL_CLOSE (r:0)
Outbound: Sending DISCONNECT (11)
Outbound: Sending DISCONNECT (11)
Socket ended
Socket closed
Socket ended
Socket closed
  pw:api => apiRequestContext.storageState started +17s
  pw:api <= apiRequestContext.storageState succeeded +2ms
  pw:api => apiRequestContext.get started +1ms
  pw:api → GET https://.../v2/inputs +4ms
  pw:api   user-agent: Playwright/1.35.1 (x64; ubuntu 22.04) node/16.19 +0ms
  pw:api   accept: application/json +0ms
  pw:api   accept-encoding: gzip,deflate,br +0ms
  pw:api   Cookie: id=... +0ms
  pw:api ← 200 OK +10ms
  pw:api   server: nginx +0ms
  pw:api   date: Sat, 08 Jul 2023 05:25:05 GMT +0ms
  pw:api   content-type: application/json +0ms
  pw:api   transfer-encoding: chunked +0ms
  pw:api   connection: close +0ms
  pw:api   vary: Accept-Encoding +0ms
  pw:api   cache-control: no-cache, no-store, must-revalidate +0ms
  pw:api   pragma: no-cache +0ms
  pw:api   expires: 0 +0ms
  pw:api   strict-transport-security: max-age=31536000; includeSubDomains +0ms
  pw:api   x-content-type-options: nosniff +0ms
  pw:api   x-frame-options: SAMEORIGIN +0ms
  pw:api   x-xss-protection: 1; mode=block +0ms
  pw:api   content-encoding: gzip +0ms
  pw:api <= apiRequestContext.get succeeded +3ms
  pw:api => apiResponse.text started +2ms
  pw:api <= apiResponse.text succeeded +1ms
  pw:api => apiResponse.json started +3ms
  pw:api <= apiResponse.json succeeded +1ms
  pw:api => tracing.stopChunk started +19ms
  pw:api => tracing.stopChunk started +1ms
  pw:api <= tracing.stopChunk succeeded +0ms
Worker teardown timeout of 30000ms exceeded while tearing down "trace recording".

alexanderdevm

I have exactly the same situation. The run gets stuck even when the afterAll hook is empty (sic!). After downgrading to Playwright 1.32.2, the issue doesn't seem to occur.

agataciesielska

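Hey folks! Unfortunately, we need a clear repro to act on this. Since this issue most likely involves a bunch of different internal parts, logs aren't enough for us to solve the puzzle.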

Could you share a test suite? If you've already narrowed it down to, e.g., a single test file, you can also share it with me privately: max schmitt @ microsoft.com. Thanks!

mxschmitt

@mxschmitt The code can be shared, but do you need the test suite in a running state? The environment it runs against is local only.

alexanderdevm

Yes, the test suite needs to be in a running state. Ideally, we run npx playwright test and it hangs.

mxschmitt

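@mxschmitt The code can be shared, but do you need the test suite in a running state? The environment it runs against is local only, which could make that challenging.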

alexanderdevm

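Yes, it needs to be in a running state. You could also try reducing it to a smaller test case that reproduces your issue, so you don't need to share or set up the whole environment.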

mxschmitt

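Unfortunately, this isn't actionable without a good, clear repro. I'll close this issue for now, but if you have a good repro, please don't hesitate to open a new issue.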

Thanks!

aslushnikov

@aslushnikov, I don't have a sample repo to share as such, but we're facing this issue in our design system, which is relatively simple and built with SvelteKit.

I just opened a PR that reproduces the issue.

You can find the CI results there: https://github.com/dfinity/gix-components/actions/runs/5645052302/job/15290064832?pr=257

As you can see, CI hangs forever. Let me know if this helps identify the issue.

peterpeterparker

@peterpeterparker I can reproduce; this has been fixed on tip-of-tree via https://github.com/microsoft/playwright/pull/24242

The fix will ship in Playwright 1.37 (tentatively sometime next week). In the meantime, you can try @playwright/test@next and see if it works for you (it works for me!).

aslushnikov

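Great news, thanks! I tried to recreate a smaller repo to demonstrate the issue, but couldn't find the magic recipe. Unfortunately, for company reasons, I can't share our whole repo.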

I'll try @next later today and see.

sarah-gelt

Thanks @aslushnikov! I can confirm the fix. I upgraded to the next version in my branch and CI completes (see the GitHub Actions run).

peterpeterparker

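@aslushnikov Unfortunately, no good news on my end. I've tried @next (1.37.0-alpha-aug-3-2023), but the test runner still hangs. Same issue as before: if I comment out the afterAll in the affected class, the runner completes. If I leave it in (even as an empty block with all its contents commented out), the runner hangs before the run completes.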

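Unfortunately, for company reasons I can't share my whole repo, and I'm really struggling to create a basic repo from scratch that demonstrates the issue. Is there any further debugging I can do to try to pin down the problem?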

sarah-gelt

> Unfortunately, no good news on my end.

@sarah-gelt Sorry to hear that.

> Is there any further debugging I can do to try to pin down the problem?

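The best approach is to gradually remove code from your tests and fixtures until you figure out what exactly is causing the test runner to hang.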

aslushnikov

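@aslushnikov After a long afternoon of pulling things apart, I think I've found the cause. It seems to be related to the number of API calls the suite makes. I now call .dispose() on every APIRequestContext instance once I'm finished with it, and that has fixed the issue for me. Perhaps there are too many responses held in memory to be cleared at the end of the run? This particular class creates and deletes records in the system via the API to support the frontend verification, so it seems to be particularly affected.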
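The workaround described above (always calling `dispose()` on an `APIRequestContext` once it's no longer needed) is easy to forget on error paths, so a try/finally wrapper can enforce it. A minimal sketch: `withDisposable` and `HasDispose` are hypothetical names, not Playwright APIs; the only Playwright detail relied on is that `APIRequestContext` has an async `dispose()` method.

```typescript
/** Anything with an async dispose() method, e.g. Playwright's APIRequestContext. */
interface HasDispose {
  dispose(): Promise<void>;
}

/**
 * Hypothetical helper: create a resource, run `fn` with it, and always dispose it
 * afterwards, whether `fn` resolves or throws.
 */
async function withDisposable<R extends HasDispose, T>(
  create: () => Promise<R>,
  fn: (resource: R) => Promise<T>,
): Promise<T> {
  const resource = await create();
  try {
    return await fn(resource);
  } finally {
    await resource.dispose();
  }
}
```

In a spec, `create` could be `() => request.newContext({ baseURL: process.env.BASE_URL })` using `request` from `@playwright/test`, with the API calls made inside `fn`; the context is then disposed even if one of those calls fails.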

sarah-gelt
