[microsoft/playwright][BUG] Test runner hangs during regression runs on v1.35

2024-04-09 46 views
8
System info
  • Playwright Version: [v1.35.1]
  • Operating System: [macOS 13.4]
  • Browser: [All]
  • VS Code: 1.79.2
  • Node: 16.15.1
Source code

Config file

import { defineConfig, devices } from "@playwright/test";
import dotenv from "dotenv";

/**
 * See https://playwright.dev/docs/test-configuration.
 */

/**
 * Read environment variables from file.
 * https://github.com/motdotla/dotenv
 */
dotenv.config();

export default defineConfig({
  testDir: "./tests",
  projects: [
    { name: "setup", testMatch: ["**/*.setup.ts"], teardown: "teardown" },
    { name: "teardown", testMatch: ["**/*.setup.ts"] },
    {
      name: "chromium",
      use: {
        ...devices["Desktop Chrome"],
        viewport: { width: 1280, height: 768 }
      },
      testMatch: ["**/*.spec.ts"],
      dependencies: ["setup"]
    },
    {
      name: "firefox",
      use: {
        ...devices["Desktop Firefox"],
        viewport: { width: 1280, height: 768 }
      },
      testMatch: ["**/*.spec.ts"],
      dependencies: ["setup"]
    },
    {
      name: "webkit",
      use: {
        ...devices["Desktop Safari"],
        viewport: { width: 1280, height: 768 }
      },
      testMatch: ["**/*.spec.ts"],
      dependencies: ["setup"]
    }
  ],
  /* Shared settings for all projects. See https://playwright.dev/docs/api/class-testoptions. */
  use: {
    /* Maximum time each action such as `click()` can take. Defaults to 0 (no limit). */
    actionTimeout: 0,
    /* Base URL to use in actions like `await page.goto('/')`. */
    baseURL: process.env.BASE_URL,
    trace: "retain-on-failure",
    headless: true
    /*launchOptions: {
      slowMo: 100 //Slow test execution down by 100ms, helpful for debugging
    }*/
  },
  /* Maximum time one test can run for. */
  timeout: 2 * 60 * 1000,
  globalTimeout: 25 * 60 * 1000,
  expect: {
    /**
     * Maximum time expect() should wait for the condition to be met.
     * For example in `await expect(locator).toHaveText();`
     */
    timeout: 5000
  },
  maxFailures: process.env.CI ? 16 : undefined,
  /* Run tests in files in parallel */
  fullyParallel: false,
  /* Fail the build on CI if you accidentally left test.only in the source code. */
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 3 : 1,
  /* Use 3 workers locally. Use workers equal to 50% of available cores on CI - UNTESTED */
  workers: process.env.CI ? "50%" : 3,
  /* Reporters to use. See https://playwright.dev/docs/test-reporters */
  reporter: [
    ["html", { open: "never" }],
    ["./reporters/resultChecker"], //This outputs a file with the current branch and overall run result, to be consumed by GitHub Actions
    ["./reporters/failed-test-reporter.ts"], //This outputs a log of which tests have failed the last time they were executed, so just the failures can be rerun while debugging
    ["./reporters/better-dot-reporter.ts"]    
  ]
});

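Steps

  • Upgrade to Playwright 1.35 (previously on 1.34)
  • Run our full regression suite from the VSCode terminal with pnpx playwright test --project chromium (ours is a monorepo managed with Rush, so we don't use npx)

Expected: the test run completes as usual.

Actual: the test run stalls partway through, usually near the end, and never finishes. We previously had no globalTimeout value set in the config, so I added one to help diagnose this. With it set, the runner aborts once that value is exceeded (set to 25 minutes as shown in the config above; a successful run usually takes around 10 minutes). Without a configured timeout, I have to cancel manually with ctrl+C in the VSCode terminal once it becomes clear that the runner is no longer responding.

I know this will be a hard one to reproduce, as it seems to be caused by something specific in our suite. Unfortunately, I can't provide the whole repository because it contains proprietary information about our application. The issue has occurred for at least two team members (myself and one of our developers), so it seems to be common when using 1.35.

I have done a successful run on 1.34 and a hung/aborted run on 1.35, both with DEBUG=pw:api log output saved to a file. The files contain bearer tokens, so for security reasons I'd rather not attach them here, but I'm happy to email them directly if they might help determine the cause.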

Dot reporter output from the most recent run, which was aborted by the global timeout:

Running 158 tests using 3 workers
··········································································°·····
···················°······°°······································°··°×T
Timed out waiting 1500s for the test suite to run

Timed out waiting 1500s for the teardown for start workers to run

  1) [chromium] › 04_managedentities.spec.ts:894:9 › Managed Entities › Create & manage new Legal Entity › ME15 - Deleting a business entity with ownership stake in another business entity @notvisual 

    "afterAll" hook timeout of 120000ms exceeded.

      969 |     });
      970 |
    > 971 |     test.afterAll(async () => {
          |          ^
      972 |       await deleteAllLegalEntities();
      973 |     });
      974 |   });

        at /Users/sarahwoodhouse/Documents/repos/core-platform/apps/gelt-frontend-autotests/tests/04_managedentities.spec.ts:971:10

    Retry #1 ───────────────────────────────────────────────────────────────────────────────────────

    "afterAll" hook timeout of 120000ms exceeded.

      969 |     });
      970 |
    > 971 |     test.afterAll(async () => {
          |          ^
      972 |       await deleteAllLegalEntities();
      973 |     });
      974 |   });

        at /Users/sarahwoodhouse/Documents/repos/core-platform/apps/gelt-frontend-autotests/tests/04_managedentities.spec.ts:971:10

  Slow test file: [chromium] › 01_taxprofile.spec.ts (4.7m)
  Slow test file: [chromium] › 03_docvault.spec.ts (3.7m)
  Slow test file: [chromium] › 05_assets.cryptoaccount.spec.ts (2.2m)
  Slow test file: [chromium] › 02_assets.privateinvestment.spec.ts (2.0m)
  Slow test file: [chromium] › 04_managedentities.spec.ts (1.4m)
  Consider splitting slow test files to speed up parallel execution
  1 failed
    [chromium] › 04_managedentities.spec.ts:894:9 › Managed Entities › Create & manage new Legal Entity › ME15 - Deleting a business entity with ownership stake in another business entity @notvisual 
  13 skipped
  144 passed (25.0m)
  2 errors were not a part of any test, see above for details

Answers

2

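Most likely the newer browsers shipped with 1.35 are failing on your tests. Could you narrow down the affected tests and calls? Try recording a trace to see what is happening in the browser.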
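
As a rough illustration of the trace suggestion above, the fragment below is a minimal sketch of a config that records a trace for every test while investigating. It assumes only the documented trace option of @playwright/test; the config posted in the report uses "retain-on-failure" instead, and everything else here is left at its defaults.

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    // Record a trace for every test during the investigation,
    // rather than keeping traces only for failed tests.
    trace: "on",
  },
});
```

Recording every trace adds overhead, so this setting is probably best reverted once the affected tests have been identified.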

6

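I have been able to identify four tests in one class that appear to trigger the issue. When it happens, according to macOS Activity Monitor, the node process used by the test runner hits around 100% CPU usage and gets stuck. After the test run is aborted by the global timeout, I have to force quit the process. I have reduced the global timeout to 2 minutes, and I can now reproduce this with a single test.

The problem actually seems to occur at the end of the afterAll hook. I have put console logs in the hook, including as the last line of the block, and all of them are output to the terminal. It appears to be something internal to the runner, happening when it tries to close out the hook before moving on to global teardown. The last line of my code in afterAll (a console.log()) runs, but the runner never reaches the first line of code in the teardown.

Runner output: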


Running 3 tests using 1 worker
·
Timed out waiting 120s for the test suite to run

Timed out waiting 120s for the teardown for start workers to run

  2 skipped
  1 passed (2.0m)
  Timed out waiting 120s for the entire test run
  2 errors were not a part of any test, see above for details

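The report backs this up: it indicates that the end of the test is where the failure occurred, yet it shows every step of the test as passed.

[image]

Report attached, since it is too long to easily screenshot all the afterAll steps: Playwright Test Report.html.zip

What is particularly odd is that most of the tests in this class pass, so the afterAll hook completes correctly for 10 of the 14 tests in the class.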

9

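I have again captured DEBUG=pw:api output for this specific test, from a run that hangs on 1.35 and a run that succeeds on 1.34. I'm happy to provide the logs directly by email if that would be useful.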

6

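System info
  • Playwright Version: [v1.35.1]
  • Operating System: [Ubuntu 22.04]
  • Browser: [Chromium, none]
  • VSCode: 1.79.2
  • Node: 16.19.0

I have a similar problem with Playwright 1.35.1: it hangs indefinitely, particularly when Playwright is run without specifying a project so that chromium is used, even though the tests are mostly API tests and/or go through the ssh2 lib.

What I have noticed is that at some point after an ssh2.exec command finishes, Playwright hangs even though the await has completed. The behavior is very noticeable when the tests run in Playwright Docker.

I can't get Playwright to give any trace or error message other than this (at the hang point):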

message: '\x1BWorker teardown timeout of 30000ms exceeded while tearing down "trace recording".\x1B\n' +

[17:58:30 ](...
      '\x1BFailed worker ran 4 tests:\x1B\n' +

(The issue does not occur in Playwright 1.34.3)

playwright.config.ts

import { devices, PlaywrightTestConfig } from '@playwright/test';

const config: PlaywrightTestConfig = {
  testDir: './tests',
  /* Maximum time one test can run for. */
  timeout: 30 * 1000,
  expect: {
    timeout: 50000
  },
  /* Run tests in files in parallel */
  fullyParallel: true,
  /* Fail the build on CI if you accidentally left test.only in the source code. */
  forbidOnly: !!process.env.CI,
  /* Retry on CI only */
  retries: process.env.CI ? 2 : 0,
  /* Opt out of parallel tests on CI. */
  workers: process.env.CI ? 1 : undefined,
  /* Reporter to use. See https://playwright.dev/docs/test-reporters */
  reporter: process.env.CI
    ? [
        [
          'playwright-teamcity-reporter',
          { testMetadataArtifacts: 'test-results', logConfig: false }
        ],
        ['html', { open: 'never' }]
      ]
    : [
        ['html', { open: 'never' }],
        ['list', { printSteps: true }]
      ],
  /* Shared settings for all the projects below.
    See https://playwright.dev/docs/api/class-testoptions. */
  globalSetup: require.resolve('./tests/global-setup'),
  globalTeardown: require.resolve('./tests/global-teardown'),
  use: {
    storageState: COOKIE_PATH,
    headless: true,
    actionTimeout: 0,
    trace: 'retain-on-failure',
    ignoreHTTPSErrors: true,
    baseURL: ...,
    extraHTTPHeaders: { Cookie: `id=${...}` },
    launchOptions: {
      slowMo: parseInt(process.env.PLAYWRIGHT_SLOW_MO || '0')
    }
  },

  projects: [
    {
      name: 'chromium',
      use: {
        ...devices['Desktop Chrome']
      }
    },
    {
      name: 'api',
      testDir: './tests/api',
      testMatch: /.\/tests\/api\/.*.spec.ts/,
      use: {
        headless: true,
        contextOptions: {
          ignoreHTTPSErrors: true
        }
      }
    },
    {
      name: 'ui',
      testDir: './tests/ui',
      testMatch: /.\/tests\/ui\/.*.spec.ts/,
      use: {
        ...devices['Desktop Chrome'],
      }
    }
  ],

  webServer:
    project !== 'api'
      ? [
          {
            command: 'yarn start',
            url: '...',
            ignoreHTTPSErrors: true,
            reuseExistingServer: true
          }
        ]
      : []
};

export default config;
7

I have enabled debug output on both the ssh lib and the Playwright API, and this is the point at which Playwright hangs indefinitely.

Inbound: Received USERAUTH_FAILURE (publickey,password)
Client: none auth failed
Outbound: Sending USERAUTH_REQUEST (password)
Inbound: Received USERAUTH_SUCCESS
Outbound: Sending CHANNEL_OPEN (r:0, session)
Inbound: GLOBAL_REQUEST (hostkeys-00@openssh.com)
Outbound: Sending GLOBAL_REQUEST (hostkeys-prove-00@openssh.com)
Inbound: CHANNEL_OPEN_CONFIRMATION (r:0, s:0)
Outbound: Sending CHANNEL_REQUEST (r:0, exec: sudo -S -p "...")
Inbound: REQUEST_SUCCESS
Inbound: CHANNEL_WINDOW_ADJUST (r:0, 2097152)
Inbound: CHANNEL_SUCCESS (r:0)
Inbound: CHANNEL_EXTENDED_DATA (r:0, 28)
Outbound: Sending CHANNEL_DATA (r:0, 8)
Inbound: CHANNEL_EOF (r:0)
Inbound: CHANNEL_REQUEST (r:0, exit-status: 0)
Inbound: CHANNEL_CLOSE (r:0)
Outbound: Sending CHANNEL_CLOSE (r:0)
Outbound: Sending DISCONNECT (11)
Outbound: Sending DISCONNECT (11)
Socket ended
Socket closed
Socket ended
Socket closed
  pw:api => apiRequestContext.storageState started +17s
  pw:api <= apiRequestContext.storageState succeeded +2ms
  pw:api => apiRequestContext.get started +1ms
  pw:api → GET https://.../v2/inputs +4ms
  pw:api   user-agent: Playwright/1.35.1 (x64; ubuntu 22.04) node/16.19 +0ms
  pw:api   accept: application/json +0ms
  pw:api   accept-encoding: gzip,deflate,br +0ms
  pw:api   Cookie: id=... +0ms
  pw:api ← 200 OK +10ms
  pw:api   server: nginx +0ms
  pw:api   date: Sat, 08 Jul 2023 05:25:05 GMT +0ms
  pw:api   content-type: application/json +0ms
  pw:api   transfer-encoding: chunked +0ms
  pw:api   connection: close +0ms
  pw:api   vary: Accept-Encoding +0ms
  pw:api   cache-control: no-cache, no-store, must-revalidate +0ms
  pw:api   pragma: no-cache +0ms
  pw:api   expires: 0 +0ms
  pw:api   strict-transport-security: max-age=31536000; includeSubDomains +0ms
  pw:api   x-content-type-options: nosniff +0ms
  pw:api   x-frame-options: SAMEORIGIN +0ms
  pw:api   x-xss-protection: 1; mode=block +0ms
  pw:api   content-encoding: gzip +0ms
  pw:api <= apiRequestContext.get succeeded +3ms
  pw:api => apiResponse.text started +2ms
  pw:api <= apiResponse.text succeeded +1ms
  pw:api => apiResponse.json started +3ms
  pw:api <= apiResponse.json succeeded +1ms
  pw:api => tracing.stopChunk started +19ms
  pw:api => tracing.stopChunk started +1ms
  pw:api <= tracing.stopChunk succeeded +0ms
Worker teardown timeout of 30000ms exceeded while tearing down "trace recording".
7

I have exactly the same situation. The run gets stuck even when the afterAll hook is empty (sic!). [image] Downgrading to Playwright 1.32.2 seems to avoid the problem.

2

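Hey folks! Unfortunately, we need a clear repro to be able to act on this. Since the issue most likely involves a bunch of different internal parts, logs are not enough for us to solve this puzzle.

Could you share a test suite? If you have already narrowed it down to, for example, a single test file, you can also share it with me privately: max schmitt @ microsoft.com. Thanks!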

3

@mxschmitt The code could be shared, but do you need the test suite to be in a running state? The environment it runs against is local only.

1

Yes, the test suite needs to be in a running state. Ideally, we run npx playwright test and it hangs.

4

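@mxschmitt The code could be shared, but do you need the test suite to be in a running state? The environment it runs against is local only, which could be challenging.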

2

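Yes, it needs to be in a running state. You could also try reducing the tests to a smaller test case that reproduces your issue, so that you don't need to share or set up the environment.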

4

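Unfortunately, this isn't actionable unless we have a good, clear repro. I'll close this issue for now, but if you have a good reproduction, please don't hesitate to open a new issue.

Thanks!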

5

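@peterpeterparker I was able to reproduce it; this has been fixed on tip-of-tree by https://github.com/microsoft/playwright/pull/24242

The fix will ship in Playwright 1.37 (tentatively sometime next week). In the meantime, you can try @playwright/test@next and see if it works for you (it worked for me!).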

9

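Great news, thanks! I tried to recreate a smaller repository to demonstrate the issue, but couldn't find the magic recipe. Unfortunately, for company reasons, I can't share our whole repository.

I'll try @next later today and see.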

5

Thanks @aslushnikov! I can confirm the fix. I upgraded to the next version in my branch and CI completed (ref. GitHub Actions).

5

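@aslushnikov Unfortunately, no good news on my end. I tried @next 1.37.0-alpha-aug-3-2023, but the test runner still hangs. It is the same issue as before: if I comment out the afterAll in the affected class, the runner finishes. If I leave it in (even as an empty block with all of its contents commented out), the runner hangs before the run completes.

Unfortunately, for company reasons, I can't share my whole repository, and I'm really struggling to create a basic repository from scratch that demonstrates the problem. Is there any further debugging etc. I can do to try to pin down the issue?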

1

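"Unfortunately, no good news on my end."

@sarah-gelt Sorry to hear that.

"Is there any further debugging etc. I can do to try to pin down the issue?"

The best approach is to gradually remove code from the tests and fixtures until you figure out exactly what causes the test runner to hang.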

6

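@aslushnikov After a long afternoon of taking things apart, I think I've found the cause. It seems to be related to the number of API calls the suite makes. I now call .dispose() whenever I'm finished with an APIRequestContext instance, and that has resolved the problem for me. Perhaps there were too many responses held in memory to be cleared at the end of the run? This particular class creates and deletes records in the system via the API to support the front-end validation, so it seems to have been particularly affected.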
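
For readers who want to apply the same workaround, here is a minimal sketch of one way to make sure every context is disposed, even when the code using it throws. The names withDisposable and DisposableContext are hypothetical and not part of Playwright; the only assumption about Playwright is that APIRequestContext exposes an async dispose() method, as used in the comment above.

```typescript
// Minimal shape of a resource that must be released explicitly.
// Playwright's APIRequestContext satisfies this via its dispose() method.
interface DisposableContext {
  dispose(): Promise<void>;
}

// Hypothetical helper: create a context, run `body` with it,
// and always dispose of the context afterwards.
async function withDisposable<C extends DisposableContext, R>(
  create: () => Promise<C>,
  body: (context: C) => Promise<R>,
): Promise<R> {
  const context = await create();
  try {
    return await body(context);
  } finally {
    // Runs whether `body` resolved or threw, so the context
    // never outlives this call.
    await context.dispose();
  }
}
```

In a test helper this might be called as withDisposable(() => request.newContext(), async (api) => { ... }), with request imported from @playwright/test. Anything needed from a response should be read inside the callback, since disposing the context also discards the responses it returned.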