Another thing: I keep getting the following errors, which are causing data loss while updating or inserting on my Pro plan. Tell me if you need other info. It’s urgent.
Can you help with the following stack traces?
Unexpected Error from the database error code “XX000”
I understand you are encountering some errors. Thank you for providing those details. Can you share the name of the affected database (you can send me a private message if you’d prefer not to share in this thread) and any details about when these issues started and how frequently they occur? Also, how are you connecting to bit.io (which client/language)? Are you running queries on a strict schedule (e.g. every X minutes)?
After some further investigation, we have identified an issue we believe was responsible for the recent connection issues you have been observing. We implemented a fix this morning that we hope will ensure your connections now work as expected. That said, we are continuing to monitor the situation. If you encounter the issue again, please let us know and we will investigate further.
Thanks for letting us know. We will continue investigating. I have reached out to you directly to get the database name. In the meantime, can you provide any details about when these issues started and how frequently they occur? Also, how are you connecting to bit.io (which client/language)? Are you running queries on a strict schedule (e.g. every X minutes)?
We’ve deployed several performance tweaks to reduce the disconnect errors and have noted a marked decrease in the type of error you’re receiving starting early this afternoon. Unfortunately, it’s not just a simple bug fix – there is network latency, retries, and locking all in the mix. So, we’re going to continue monitoring and tweaking over the next week. Are you still receiving errors?
Those are not XX000 errors. We’ve seen a drastic drop in XX000 errors over the last 24 hours after deploying some changes, and I have not seen any on your database. Please let us know if you still see them.
“Database shutting down” messages are related to The Connection Lifecycle. You’re likely connecting at the exact moment of a cooldown. If you have questions about that behavior, please start another thread.
We will continue to monitor, applying fixes where possible, and update in this thread. Our current overall error rate of XX000 is 1 per 10,000 connections. Of course, we’d like to get it to 0. I see 7 XX000 in the last 3 days for you. Importantly, these errors should not result in any data loss – Postgres is ACID compliant. Are you losing data on your client because you’re not reconnecting and retrying?
Upon first connecting, a client may need to check the database to determine whether a transaction committed.
With any Postgres database – bit.io or otherwise – we strongly recommend using connection pools and the ability to retry queries, especially in production. The Connection Lifecycle explains how disconnects can happen. In this case, after you receive a disconnect and when you re-establish the connection, your client needs to determine what to retry.
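To make the "determine what to retry" step concrete, here is a minimal sketch of a retry wrapper that checks whether a write already committed before running it again. Everything named here is an assumption for illustration: `Disconnected` stands in for whatever connection-lost error your driver raises, and `write` and `already_committed` are hypothetical callables you would supply (the latter would typically open a fresh connection and look for the row by a unique key).

```python
import time
from typing import Callable


class Disconnected(Exception):
    """Hypothetical stand-in for the driver's connection-lost error."""


def run_with_retry(
    write: Callable[[], None],
    already_committed: Callable[[], bool],
    max_attempts: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `write`, retrying after a disconnect only if it did not commit.

    Returns the number of times `write` was executed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            write()
            return attempt
        except Disconnected:
            # The connection dropped mid-flight, so the outcome is unknown.
            # Ask the database (over a fresh connection) before retrying.
            if already_committed():
                return attempt
            if attempt == max_attempts:
                raise
            sleep(delay_s)
    raise ValueError("max_attempts must be at least 1")
```

The check before each retry is what keeps a retried insert from being applied twice when the original commit actually went through. If all attempts fail, the error is re-raised so the caller can queue the write instead of silently dropping it.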
We have implemented retries (at most 3 attempts, one every 30 s), but it seems that the “database stopped code 08004” errors, which lasted 3.5 hours, caused data loss, as I confirmed when I checked the database.
We do have connection pools. How many pools do you recommend?
The key feature of pools, and the reason we recommend them, is that the pooling mechanism reconnects automatically. The pool size is up to you. I would recommend trying to reconnect within 10 seconds. Networks can be flaky, but if you wait too long the database will cool down.
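If your pool does not reconnect on its own, the 10-second suggestion could be sketched roughly like this. This is only an illustration under assumptions: `reconnect_within` and its parameters are hypothetical names, `connect` is whatever callable opens a connection with your driver, and the built-in `ConnectionError` stands in for the driver's own connection error class.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def reconnect_within(
    connect: Callable[[], T],
    deadline_s: float = 10.0,
    base_delay_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `deadline_s` seconds have passed.

    Waits between attempts with exponential backoff, never sleeping past
    the deadline. Re-raises the last error once the deadline is reached.
    """
    start = clock()
    delay = base_delay_s
    while True:
        try:
            return connect()
        except ConnectionError:
            remaining = deadline_s - (clock() - start)
            if remaining <= 0:
                raise
            # Short first wait, doubling each time, capped by the deadline.
            sleep(min(delay, remaining))
            delay *= 2
```

Compared with a fixed 30-second interval between retries, short waits like these get the first few reconnect attempts in well inside the 10-second window.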