[SPARK-56407][BUILD][TESTS] Remove pre-built class files and JARs used in artifact transfer tests #55272

Closed
sarutak wants to merge 1 commit into apache:master from sarutak:remove-test-jars-d

@sarutak
Member

@sarutak sarutak commented Apr 9, 2026

What changes were proposed in this pull request?

This PR is part of SPARK-56352 and covers the artifact transfer test files. It replaces pre-built .class files, JAR files, CRC files, and serialized binaries with artifacts generated dynamically at test time, which removes 20 binary/text files from the repository.

Changes:

  • Update ArtifactManagerSuite (sql/core) to dynamically compile Java and Scala source files into .class files and JARs at test time using createJarWithJavaSources() and createJarWithScalaSources().
  • Update ArtifactSuite (sql/connect/client) to generate test artifacts (smallClassFile.class, smallJar.jar, largeJar.jar, etc.) in a temp directory and compute CRC values dynamically using java.util.zip.CRC32.
  • Update ClassFinderSuite (sql/connect/client) to generate dummy .class files in a temp directory instead of reading from pre-built resources.
  • Update AddArtifactsHandlerSuite (sql/connect/server) to generate test artifacts and CRC files dynamically in a temp directory.
  • Update SparkConnectClientSuite (sql/connect/client) to use a dynamically generated temp file instead of referencing the deleted artifact-tests directory.
  • Update test_artifact.py (PySpark) to generate test JAR files in a temp directory and compute CRC values dynamically using zlib.crc32.
  • Empty dev/test-jars.txt and dev/test-classes.txt as all listed files are now removed.
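
The generate-then-checksum pattern described for test_artifact.py can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names write_dummy_jar and file_crc32, the entry name Dummy.class, and the block size are assumptions made for the example. Only zipfile, tempfile, and zlib.crc32 from the Python standard library are used.

```python
import os
import tempfile
import zipfile
import zlib


def write_dummy_jar(path, entries):
    """Write a JAR (a ZIP archive) whose entries hold arbitrary bytes.

    Hypothetical helper for illustration; not taken from the PR.
    """
    with zipfile.ZipFile(path, "w", zipfile.ZIP_STORED) as jar:
        for name, payload in entries.items():
            jar.writestr(name, payload)


def file_crc32(path, block_size=64 * 1024):
    """Compute the CRC-32 of a file incrementally with zlib.crc32."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            # Passing the running value lets zlib continue the checksum.
            crc = zlib.crc32(block, crc)
    return crc


with tempfile.TemporaryDirectory() as tmp:
    small_jar = os.path.join(tmp, "smallJar.jar")
    write_dummy_jar(small_jar, {"Dummy.class": b"\x00" * 128})

    # The expected CRC is derived from the file that was just written,
    # so no checked-in CRC text file is needed.
    expected_crc = file_crc32(small_jar)
    with open(small_jar, "rb") as f:
        assert expected_crc == zlib.crc32(f.read())
```

Because the archive embeds entry timestamps, its bytes (and therefore its CRC) can differ between runs, which is one reason to compute the expected value at test time rather than hard-code it.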

Note on test artifacts: The artifact transfer tests (ArtifactSuite, AddArtifactsHandlerSuite, test_artifact.py) only verify the byte-level transfer protocol (chunking, CRC), so the generated .class files and JAR entries contain arbitrary bytes rather than valid Java class files. In contrast, ArtifactManagerSuite requires valid class files for classloader testing and uses createJarWithJavaSources()/createJarWithScalaSources() accordingly.
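
To illustrate why arbitrary bytes are enough for a chunking and CRC check, here is a small sketch in Python. It is not the Spark Connect protocol implementation: split_into_chunks, reassemble, and the 4096-byte chunk size are hypothetical names and values chosen for the example. The point is only that the check depends on the bytes themselves, not on whether they form a valid class file.

```python
import os
import zlib


def split_into_chunks(data, chunk_size):
    """Yield (chunk, crc) pairs covering data in order (hypothetical helper)."""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield chunk, zlib.crc32(chunk)


def reassemble(chunks):
    """Verify each chunk against its CRC and concatenate the payload."""
    out = bytearray()
    for index, (chunk, crc) in enumerate(chunks):
        if zlib.crc32(chunk) != crc:
            raise ValueError(f"CRC mismatch in chunk {index}")
        out += chunk
    return bytes(out)


# Arbitrary bytes stand in for a .class file; nothing here parses them.
payload = os.urandom(10_000)
chunks = list(split_into_chunks(payload, chunk_size=4096))

assert len(chunks) == 3  # 4096 + 4096 + 1808 bytes
assert reassemble(chunks) == payload
```

A test that needs the classloader to load the result, as ArtifactManagerSuite does, cannot use this shortcut and has to compile real sources.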

Files removed:

  • data/artifact-tests/junitLargeJar.jar
  • data/artifact-tests/smallJar.jar
  • data/artifact-tests/crc/ (3 files: README.md, junitLargeJar.txt, smallJar.txt)
  • sql/connect/common/src/test/resources/artifact-tests/ (11 files: Hello.class, smallClassFile.class, smallClassFileDup.class, smallJar.jar, junitLargeJar.jar, crc/*.txt)
  • sql/core/src/test/resources/artifact-tests/ (4 .class files: Hello.class, HelloWithPackage.class, IntSumUdf.class, smallClassFile.class)

Why are the changes needed?

As noted in the PR discussion (#50378):

the ultimate goal is to refactor the tests to automatically build the jars instead of using pre-built ones

This PR completes that goal by removing all remaining pre-built test artifacts. After this change, no binary artifacts remain in the source tree for test purposes, and the release-time workaround (SPARK-51318) becomes fully unnecessary.

Note: dev/test-jars.txt and dev/test-classes.txt are left as empty files because dev/create-release/release-tag.sh reads them with rm $(<dev/test-jars.txt), which would fail if the files were deleted. A follow-up PR will update the release script and remove these files.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • ArtifactManagerSuite (sql/core)
  • ArtifactSuite, ClassFinderSuite, SparkConnectClientSuite (sql/connect/client)
  • AddArtifactsHandlerSuite (sql/connect/server). Note: this suite fails identically on origin/master when run standalone via SBT, due to a pre-existing session initialization issue unrelated to this change.
  • test_artifact.py (PySpark)

Was this patch authored or co-authored using generative AI tooling?

Kiro CLI / Opus 4.6

Remove 2 pre-built JAR files (junitLargeJar.jar, smallJar.jar), 7 class files
(Hello.class, HelloWithPackage.class, IntSumUdf.class, smallClassFile.class,
smallClassFileDup.class), udf_noA.jar, and CRC files by dynamically generating
them at test time.

ArtifactManagerSuite generates real compiled classes (Hello, IntSumUdf via Scala,
HelloWithPackage via Java). AddArtifactsHandlerSuite generates dummy byte arrays
and computes CRC dynamically.
@HyukjinKwon
Member

Merged to master.

@HyukjinKwon
Member

Just a quick note in case it gets missed: once this is done, we should also revert e4eded8 and 15eaa52.
