Commit Graph

297 Commits (afff3d41cb3b7adc06d14ac5322392bcec16afa9)

Author SHA1 Message Date
Claude f9f71c9f11 src: only prune if buildkitd was spun up 2025-06-17 14:36:14 -04:00
Claude a037e6f634 src: use BLACKSMITH prefixed VM ID env var 2025-06-16 16:12:28 -04:00
Claude 616bee01ad test: fix platform test to work on both ARM and AMD runners
The test was hardcoded to expect arm64 platform, causing failures
on AMD runners. Now checks actual host architecture dynamically.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-11 13:54:31 -04:00
Claude 28c244705c *: allow users to pass in a buildx version 2025-05-30 12:31:51 -04:00
Claude 9dbab7fbd2 src: add a retry with backoff to combat 429s when downloading buildkit 2025-05-18 16:22:27 -04:00
Claude 1868624b97 src: add ping before get stickydisk 2025-05-16 13:41:46 -04:00
Claude e84bc1a88e src: more debug logs 2025-05-14 14:04:56 -04:00
Claude 41a36ac067 src: print the port bpa is trying to hit 2025-05-14 13:43:57 -04:00
Claude 296109dd1e src: only commit stickydisk in post step if in setup-only
Firstly this was a bug where we were trying to commit in the post
step even if we had already committed at the end of the main step in
a non-setup-only invocation.

Secondly, if the action is canceled before the exposeID is set in the main
process, we don't want to send a commit request with an empty exposeID.
2025-04-29 17:01:42 -04:00
Claude c80185915d src: move buildkit prune to cleanup stage and invoke it inline
Previously, we were firing off an async buildkit prune to clean
up layers unused in 14 days. This changes that to clean up layers
unused in 7 days and fires it off inline on cleanup. It just seems
easier to reason about that way.
2025-04-22 16:31:23 -04:00
Claude 11ec21ffed src: use port from env 2025-04-15 18:23:28 -07:00
Claude ab514e31b5 *: introduce a setup-only mode to the build-push-action
This setup-only mode will set up a Docker builder with the stickydisk
mounted but will not run a Docker build. The use case here is to allow
customers to then run their custom Tilt files or Docker commands against
our builder. The other subtle change is that we only clean up in the post
step of this builder action. It remains to be seen whether you can start several
of these builders at the same time in a workflow, but we can address that as a
follow-up.
2025-04-14 16:36:36 -07:00
Aayush Shah f8d1c2e2ae *: normalize file paths in all cases (#104) 2025-03-06 17:24:56 -05:00
Aditya Maru 6fd13769ac src: disable native multi-arch builds 2025-03-04 15:53:15 -05:00
Aditya Maru feb3751245 src: only log fatal errors in tailscale teardown 2025-03-03 22:55:54 -05:00
Aditya Maru 4a3e86e9c9 src: add scaffolding to support multi-platform builds 2025-02-17 05:25:52 +05:30
Aayush 1390f95565 *: bind to localhost over TCP instead of using a unix socket 2025-02-10 23:06:21 -05:00
Aditya Maru 2331ad873b src: add sync before umount 2025-01-21 19:34:23 -05:00
Aditya Maru f440133b20 wip 2025-01-10 15:52:55 -05:00
Aayush 8554acbf59 src: prevent path duplication when dockerfile is within context 2025-01-09 10:03:58 -05:00
Aditya Maru 5ac445ae84 src: fix error message 2025-01-08 07:14:25 -05:00
Aayush 0e4788906e src: bump buildkit startup timeout to 30sec 2025-01-07 21:18:32 -05:00
Aayush Shah d8a061af73 src: update timeout on `setupStickyDisk` (#91) 2025-01-01 15:09:21 -05:00
Aditya Maru 34ea2f79e5 src: change warning to debug 2025-01-01 13:16:46 +04:00
Aayush Shah 4ed3ba5c73 src: ignore unset sentinel value for tailscale token (#89) 2025-01-01 02:05:30 -05:00
Aditya Maru 42b59d67c9 src: bump timeout from 30s to 45s 2025-01-01 09:25:31 +04:00
Aayush Shah c03b613806 use local dockerfile path over git context (#86) 2024-12-31 13:08:49 -05:00
Aditya Maru aa6b213b0b src: join and leave tailnet on start and cleanup of builder 2024-12-31 15:52:49 +04:00
Aditya Maru 9fdeb57c53 src: disable automatic buildkit GC
We have reason to believe that automatic GC is affecting
daemon startup times. In this patch we disable automatic GC
and instead rely on manual pruning of the buildkit cache.
Once the daemon is ready we spawn an async task to run prune
on any objects older than 14 days. We are already managing the
Ceph volume approaching its size limit ourselves in the VM
Agent.

Patch also adds some alerting when inode usage is high on a mountpoint.
2024-12-23 09:15:34 -05:00
Aditya Maru 61713d1849 src: print api url in debug info 2024-12-21 23:42:52 -05:00
Aditya Maru 6fe2467492 src: silence metric warnings for now 2024-12-21 23:12:08 -05:00
Aditya Maru 4759d93c12 src: use the plumbed BLACKSMITH_BACKEND_URL if present 2024-12-21 12:08:11 -05:00
Aditya Maru def1585067 *: report metrics to the VM agent 2024-12-20 17:43:40 -05:00
Aditya Maru 4723a2a346 src: stop spurious warnings on buildkit shutdown 2024-12-19 19:04:07 -05:00
Aditya Maru 1672d6fbad src: fix shutdown retry behavior 2024-12-19 13:04:09 -05:00
Aditya Maru 9302d2aea9 src: stop running process as nohup to avoid missing logs 2024-12-19 12:44:35 -05:00
Aditya Maru ac42783fa9 src: cleanup flakiness in different parts of the action 2024-12-18 09:58:15 -05:00
Aditya Maru 54bc4e0788 src: refactor cleanup logic to expose buildkitd.log
Previously, we only killed the buildkitd process and unmounted
if builderInfo was non-null. This was wrong because we could have set up
buildkitd but failed after that step. This would then rely on the last-ditch
effort by the post action to clean up. We now change the process kill
and unmount to happen on any build error.
2024-12-16 19:25:47 -05:00
Aditya Maru d43ee61bb7 *: move to grpc backed communication for the agent 2024-12-16 15:29:30 -05:00
Aditya Maru 53000f0f59 ignore error when nothing is mounted 2024-12-15 17:16:24 -05:00
Aditya Maru 1df1b3c361 src: ignore error when there's nothing mounted 2024-12-13 12:32:05 -05:00
Aditya Maru de0451e517 src: make post unmount even if buildkitd is no longer present
Also increase retries when trying to unmount the buildkit directory.
Retry up to 3 seconds now, previously we were only retrying 3 times
with a 100ms backoff.
2024-12-10 21:26:18 -05:00
Aditya Maru 0f99a0b1c7 src: start sending get request with query params
We are incorrectly using formData in a GET request. To move
away from this, we send both query params and formData until
the server is fully upgraded, after which we can stop sending
formData.
2024-12-09 13:01:35 -05:00
Aditya Maru 0186286e06 *: use axios-retry instead of handrolled retry methods 2024-12-09 13:01:20 -05:00
Aayush Shah 7b8642822f src: make `getDockerfilePath` return the full path to the dockerfile (#64)
Previously we were just returning the path to the dir containing the dockerfile
in most cases.
2024-12-09 12:20:46 -05:00
Aditya Maru f06a558c36 src: alert if an exception is thrown on cleanup 2024-12-08 19:21:46 -05:00
Aditya Maru b76cd7bf3b src: fix bug in conditional that zero'd out expose ID 2024-12-08 18:44:36 -05:00
Aayush f9d1e150a9 *: unify error handling and add more unit tests 2024-12-08 16:41:30 -05:00
Aditya Maru c71ad2dbef *: refactor methods to support mocking
Additionally, write some tests to ensure the driver method
`startBlacksmithBuilder` handles all exceptions correctly in
both nofallback=true and nofallback=false configurations.
2024-12-08 14:35:30 -05:00
Aditya Maru 5ab78173d3 backend: default to /dev/vdb when ExposeVolume response is empty 2024-12-06 22:39:20 -05:00