A while back, I implemented a business object with a document attribute and chose the Amazon S3 storage option. All worked well. Shortly after deployment, I started getting support tickets from users who were unable to upload documents: after selecting the file, the widget simply reports "Failed to Upload", and repeated attempts fail the same way. Note that uploads are limited to PDF files under 5 MB, and those exceptions are caught before the upload is attempted, so the files themselves are fine. Further, when I have affected users send me the file, I am always able to upload it successfully.
Each uploaded document replaces a blank form that was previously attached through another process, which copies the document from one BO to this one, so the files are not always identical. In some cases those copied documents cannot be downloaded; upon examination, the file exists on the S3 server but has zero bytes.
I have asked the users who can't upload documents to describe their platform and OS, and the answers are all over the place: I can't identify any hardware/software combination specific to the issue. I have NEVER been able to duplicate the issue in my test environment (my PC) or the server environment (AWS).
In summary:
- The problem is intermittent, but I can't identify any situation to duplicate it.
- Sometimes the server copies a document attribute from one BO to another, and the target file is created but has zero bytes. Attempts to open the document from the UI result in a system error.
- While some users are unable to upload the documents, it has never been an issue for me even when uploading the exact same file.
- The error rate is probably around 30% for users uploading documents. Note that in this system, new users are always being processed, so this is a wide sample of disparate users.
I submitted this to Grok AI; the TL;DR is that I will start asking users about their security software. Here is the full response:
_"Based on the symptoms you’ve described—intermittent upload failures, zero-byte files appearing in Amazon S3, and inconsistent behavior across users with no clear pattern tied to hardware or OS—there are several potential factors related to S3 and the Aware IM platform that could be contributing. Since you can’t replicate the issue in your test environment and the files themselves aren’t the problem (as you can upload them successfully), the root cause likely lies in the interaction between Aware IM, S3, and the users’ environments or network conditions. Here’s a breakdown of possible causes and steps to investigate further:
Potential Causes
Network Instability or Timeouts During Uploads
- S3 upload failures often stem from network issues, especially if the connection drops mid-transfer. If Aware IM or the S3 client library it uses doesn’t handle these interruptions robustly, the upload might appear to fail for the user, yet S3 could still register a zero-byte object if the request was partially processed (e.g., the HTTP PUT started but no data was sent).
- This could explain why you don’t see the issue: your network (and test environment’s) might be more stable than some users’. The 30% error rate suggests it’s not universal but common enough to point to variable network conditions.
S3 Presigned URL or Authentication Issues
- Aware IM likely uses presigned URLs or direct S3 API calls to upload files. If there’s a glitch in generating or handling these URLs—say, they expire too quickly or get corrupted—some users might fail to authenticate properly during the upload. This could result in a zero-byte file if the request reaches S3 but the payload doesn’t.
- The intermittent nature and your success with the same files suggest the issue isn’t the files or S3 itself but how the upload process is initiated or executed for certain users.
Aware IM Client-Side Processing
- The “Failed to Upload” message in the widget implies the failure is detected client-side. Aware IM might be preprocessing or streaming the file to S3, and if this process is interrupted (e.g., by browser limitations, memory constraints, or local resource issues), it could abort mid-upload. A zero-byte file might then get committed to S3 if the initial request succeeded but the data stream didn’t follow through.
- Since user platforms vary widely, differences in browser behavior, extensions, or local security settings could disrupt this process in ways you don’t encounter.
Copying Between Business Objects
- The zero-byte files from document attribute copying suggest a separate but related issue. If Aware IM copies files by referencing S3 objects (e.g., via the `CopyObject` API) rather than re-uploading them, a failure in this operation—like a misconfigured IAM policy, versioning conflict, or S3 glitch—could leave an empty placeholder. This might tie back to S3’s eventual consistency model, where a copied object isn’t immediately available or fully propagated, leading to a botched transfer.
S3 Eventual Consistency
- S3 operates with eventual consistency for overwrites and deletes in most regions. If Aware IM replaces an existing file (e.g., the blank form) with a new upload, a race condition or timing issue could result in a zero-byte file if the system checks or commits the object before the upload fully propagates. This might not affect you if your environment has lower latency or better timing with AWS.
Concurrency or Rate Limiting
- With new users constantly joining, S3 or Aware IM might hit request rate limits or concurrency bottlenecks. If uploads or copy operations overlap, some could fail silently, leaving zero-byte files. This would be more pronounced for users with slower connections or higher latency to AWS.
Why It Affects Some Users and Not Others
- Network Variability: Users with unstable or high-latency connections (e.g., rural areas, VPNs, or corporate firewalls) might experience more frequent timeouts or dropped requests.
- Browser Differences: Aware IM’s upload widget might rely on HTML5 File APIs or JavaScript that behaves inconsistently across browsers or versions, especially under load or with certain extensions.
- Regional AWS Latency: If your S3 bucket is in one region (e.g., us-east-1) and users are globally distributed, those farther away might face higher latency, increasing the chance of failures.
- User-Specific Config: Local antivirus, proxies, or security software could interfere with the upload stream in ways your test setup doesn’t replicate.
Investigation and Mitigation Steps
Enable Detailed Logging
- Turn on S3 server access logging for your bucket to capture details about each upload and copy operation. Look for patterns in failed requests (e.g., 500 errors, incomplete PUTs) and correlate them with user reports.
- Check Aware IM’s logs (server-side and client-side, if available) for exceptions or timeouts during uploads or BO copying.
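If the platform allows it, access logging can be switched on programmatically as well. Here is a minimal boto3 sketch; the bucket names are placeholders, the client is passed in so nothing touches AWS until you call it, and the helper that flags failed PUTs is a deliberately crude substring filter over the access-log line format:

```python
def enable_access_logging(s3_client, bucket, log_bucket, prefix="s3-access-logs/"):
    """Turn on S3 server access logging for `bucket`, writing to `log_bucket`.

    `s3_client` is a boto3 S3 client supplied by the caller, so the filter
    below stays testable without AWS credentials.
    """
    s3_client.put_bucket_logging(
        Bucket=bucket,
        BucketLoggingStatus={
            "LoggingEnabled": {"TargetBucket": log_bucket, "TargetPrefix": prefix}
        },
    )


def looks_like_failed_put(log_line):
    """Crude filter for S3 access-log lines: a PUT-object request whose
    HTTP status (the field after the quoted request-URI) is not 200.
    Zero-byte uploads often surface here as non-200 or incomplete PUTs."""
    return "REST.PUT.OBJECT" in log_line and '" 200 ' not in log_line
```

Running the filter over the daily log files and correlating hits with user ticket timestamps should show whether the failures ever reach S3 at all.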
Test Network Conditions
- Simulate poor network conditions in your test environment using tools like `tc` (Linux) or browser throttling (e.g., Chrome DevTools). Try uploading with artificial latency or packet loss to see if you can replicate the issue.
Inspect Zero-Byte Files
- For zero-byte files from uploads or copies, check their metadata in S3 (e.g., via AWS Console or CLI). Look at timestamps, requester IPs, and headers like `Content-Length` to see if the request started but didn’t finish.
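To sweep the whole bucket for empty objects rather than checking one key at a time, a boto3 sketch like this works (bucket name and prefix are placeholders; the client is passed in so the size filter stays testable offline):

```python
def zero_byte_keys(s3_client, bucket, prefix=""):
    """Return the keys of all zero-byte objects under `prefix`, paginated.

    `s3_client` is a boto3 S3 client supplied by the caller.
    """
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(find_empty(page.get("Contents", [])))
    return keys


def find_empty(objects):
    """Filter a list of S3 object dicts down to the zero-byte entries."""
    return [o["Key"] for o in objects if o.get("Size", 0) == 0]
```

Feeding the resulting keys into a `head_object` loop then gives you the timestamps and requester details to correlate with the failure reports.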
Validate Upload Process
- If possible, inspect Aware IM’s S3 integration code or consult its documentation/support to confirm how it handles uploads (e.g., multipart vs. single PUT, retry logic). Add a read-after-write check to verify file integrity post-upload.
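If Aware IM exposes a hook after the upload (or you can wrap the upload yourself), the read-after-write check can be as simple as this boto3 sketch; the function and bucket names are placeholders, and the client is passed in so the comparison helper runs without AWS access:

```python
import os


def upload_with_check(s3_client, bucket, key, local_path):
    """Upload a file, then HEAD it back and compare sizes.

    A zero-byte or truncated object raises immediately instead of being
    discovered later by a user. `s3_client` is a boto3 S3 client.
    """
    s3_client.upload_file(local_path, bucket, key)
    head = s3_client.head_object(Bucket=bucket, Key=key)
    if not upload_verified(head, os.path.getsize(local_path)):
        raise RuntimeError(f"Zero-byte or truncated upload detected for {key}")


def upload_verified(head_response, local_size):
    """True only if S3 reports the same, non-zero size as the local file."""
    return local_size > 0 and head_response.get("ContentLength") == local_size
```

Failing fast here turns a silent zero-byte object into a loggable, retryable error.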
Handle Copy Failures
- For the BO-to-BO copy issue, test if it’s an S3 `CopyObject` failure by manually copying objects in S3 and checking results. Ensure Aware IM retries failed copies and logs errors.
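A manual copy test with a size comparison afterwards might look like this boto3 sketch (bucket and key names are placeholders; the client is passed in so the integrity check is testable offline):

```python
def copy_with_check(s3_client, bucket, src_key, dst_key):
    """Copy an object within a bucket, then compare source and target sizes.

    `s3_client` is a boto3 S3 client supplied by the caller.
    """
    s3_client.copy_object(
        Bucket=bucket,
        Key=dst_key,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    src = s3_client.head_object(Bucket=bucket, Key=src_key)
    dst = s3_client.head_object(Bucket=bucket, Key=dst_key)
    if not copy_intact(src, dst):
        raise RuntimeError(f"Copy produced a zero-byte or truncated object: {dst_key}")


def copy_intact(src_head, dst_head):
    """True only if both objects are non-empty and the sizes match."""
    size = src_head.get("ContentLength", 0)
    return size > 0 and dst_head.get("ContentLength") == size
```

Running this in a loop against the same source object also helps establish whether the copy failure is intermittent on your own connection.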
User Feedback Loop
- Ask affected users to provide browser console logs (F12) or network traces (e.g., HAR files) when the upload fails. This could reveal client-side errors or HTTP failures not visible server-side.
S3 Configuration
- Confirm your bucket’s region aligns with most users’ locations to minimize latency. Enable Transfer Acceleration if latency is a factor.
- Check IAM policies for Aware IM’s S3 access—ensure they allow `PutObject` and `CopyObject` without restrictive conditions that might fail intermittently.
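As a concrete reference point (the bucket ARN is a placeholder), a minimal policy covering these operations follows. Note there is no separate `s3:CopyObject` IAM action: `CopyObject` is authorized through `s3:GetObject` on the source plus `s3:PutObject` on the destination, so both actions are needed. Any narrower grant, especially `Condition` blocks keyed on IP or VPC, is worth auditing for intermittent denials.

```python
import json

# Minimal IAM policy for the upload and copy operations discussed above.
# The bucket name is a placeholder. CopyObject is authorized via GetObject
# on the source object plus PutObject on the target, hence both actions.
UPLOAD_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::my-upload-bucket/*",
        }
    ],
}

print(json.dumps(UPLOAD_POLICY, indent=2))
```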
Conclusion
Nothing inherent to S3 itself screams “user-specific failure” here—it’s more likely a combination of network variability, Aware IM’s upload/copy logic, and S3’s behavior under certain conditions (e.g., eventual consistency or partial failures). The zero-byte files suggest S3 is receiving incomplete requests, either from failed uploads or botched copies, while the 30% error rate points to environmental differences you haven’t replicated. Start with logging and network simulation to narrow it down, and consider reaching out to Aware IM support for platform-specific quirks with S3 integration."_