Unexpected Error from the database error code “XX000”

Another thing: I keep getting the following errors, which are causing data loss while updating or inserting on my Pro plan. Tell me if you need other info. It’s urgent.
Can you help with the following stacks?

  1. “Unexpected Error from the database”, error code “XX000”

  2. “the database system is shutting down”, error code “57P03”, routine “ProcessStartupPacket”

  3. “terminating connection due to administrator command”, error code “57P01”, routine “ProcessInterrupts”

I understand you are encountering some errors. Thank you for providing those details. Can you share the name of the affected database (you can send me a private message if you’d prefer not to share in this thread) and any details about when these issues started and how frequently they occur? Also, how are you connecting to bit.io (which client/language)? Are you running queries on a strict schedule (e.g. every X minutes)?

After some further investigation, we have identified an issue we believe was responsible for the recent connection issues you have been observing. We implemented a fix this morning that we hope will ensure your connections now work as expected. That said, we are continuing to monitor the situation. If you encounter the issue again, please let us know and we will investigate further.

I think I still get the Unexpected Error above

Thanks for letting us know. We will continue investigating. I have reached out to you directly to get the database name. In the meantime, can you provide any details about when these issues started and how frequently they occur? Also, how are you connecting to bit.io (which client/language)? Are you running queries on a strict schedule (e.g. every X minutes)?

  1. It has been happening ever since I started using it in production. It was OK in the testing environment because there are fewer operations there.
  2. I’m connecting with Massive in Node.js, which uses pg-promise 10.10.2.
  3. I’m not running on any schedule. I just do the usual read/write operations.

When did you start using it in production?

Beginning of March. You can check the log I sent.

Have you found a solution?

We’ve deployed several performance tweaks to reduce the disconnect errors and have noted a marked decrease in the type of error you’re receiving starting early this afternoon. Unfortunately, it’s not just a simple bug fix – there is network latency, retries, and locking all in the mix. So, we’re going to continue monitoring and tweaking over the next week. Are you still receiving errors?

About 20 minutes ago, I got a “the database system is shutting down” error. Nothing other than this today.

I got another “the database system is shutting down” error 10 minutes ago.

Those are not XX000. We’ve seen a drastic drop in XX000 over the last 24 hours after deploying some changes. I have not seen any on your database. Please let us know if you still see those.

Database shutting down messages are related to The Connection Lifecycle. You’re likely connecting at the exact moment of a cooldown. If you have questions about that behavior, please start another thread.

  1. I still see the “Unexpected Error”, but less often than before, so there is still data loss.
  2. I still see “terminating connection due to administrator command”, also less often than before.
  3. On the 11th, there were a lot of a new error, “database stopped code 08004”, for 3.5 hours.
  4. Is that cooldown a must? If so, what is the solution to prevent data loss?

We will continue to monitor, applying fixes where possible, and update in this thread. Our current overall error rate of XX000 is 1 per 10,000 connections. Of course, we’d like to get it to 0. I see 7 XX000 in the last 3 days for you. Importantly, these errors should not result in any data loss – Postgres is ACID compliant. Are you losing data on your client because you’re not reconnecting and retrying?

Upon first connecting, a client may need to check the database to determine whether a transaction committed.
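
One common way to make that check cheap is to tag each write with a client-generated request ID, so a retry after a dropped connection cannot insert the same row twice. The sketch below is only an illustration of that idea, not something from bit.io: the `orders` table, its `request_id` and `payload` columns, and both function names are hypothetical, and it assumes a UNIQUE constraint on `request_id`. The functions return plain `{ text, values }` objects in the style node-postgres accepts.

```typescript
// Hypothetical table and column names; adjust to your own schema.
// Assumes a UNIQUE constraint on orders.request_id.
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Build an insert that is safe to send again after a dropped connection:
// if the first attempt actually committed, the retry inserts nothing.
function idempotentInsert(requestId: string, payload: string): QueryConfig {
  return {
    text:
      "INSERT INTO orders (request_id, payload) VALUES ($1, $2) " +
      "ON CONFLICT (request_id) DO NOTHING",
    values: [requestId, payload],
  };
}

// Build the follow-up query a client can run after reconnecting to see
// whether the earlier write made it into the database.
function committedCheck(requestId: string): QueryConfig {
  return {
    text: "SELECT 1 FROM orders WHERE request_id = $1",
    values: [requestId],
  };
}
```

With this shape, the client can either re-send the insert as-is or run the check first and only re-send when no row comes back.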

With any Postgres database – bit.io or otherwise – we strongly recommend using connection pools and the ability to retry queries, especially in production. The Connection Lifecycle explains how disconnects can happen. In this case, after you receive a disconnect and when you re-establish the connection, your client needs to determine what to retry.
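
As a rough sketch of what "retry queries" could look like on the client side, the helper below retries an operation when the error carries one of the SQLSTATE codes quoted in this thread. It assumes the driver exposes the SQLSTATE as a string `code` property on the error object (node-postgres does this). The list of codes, the function names, and the default attempt count and delay are illustrative assumptions, not an official bit.io recommendation.

```typescript
// SQLSTATE codes quoted in this thread, treated here as "reconnect and retry".
// This list is an assumption for illustration, not an official one.
const TRANSIENT_SQLSTATES = ["57P01", "57P03", "08004", "XX000"];

// True when the error object carries one of the codes above.
function isTransient(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && TRANSIENT_SQLSTATES.indexOf(code) !== -1;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `operation`, retrying on transient errors with exponential backoff.
// Only safe when `operation` is idempotent (running it twice does no harm).
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on non-transient errors or when attempts are exhausted.
      if (attempt >= maxAttempts || !isTransient(err)) throw err;
      await sleep(baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
}
```

The operation passed in should acquire its connection from the pool on each call, so that a retry runs on a fresh connection instead of the one that was just terminated.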

  1. We have implemented retries (at most 3 times, every 30s), but it seems the “database stopped code 08004” error that lasted 3.5 hours caused data loss, as I confirmed by checking the database.
  2. We do have connection pools. How many pools do you recommend?

A post was split to a new topic: Database stopped code 08004

The key feature and reason we recommend pools is that the pooling mechanism autoreconnects. The pool size is up to you. I would recommend trying to reconnect within 10 seconds. Networks can be flaky, but if you wait too long the database will cool down.
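
To make the 10-second suggestion concrete, here is a small sketch that computes a doubling retry schedule whose total wait stays inside a time budget. The function name and the 500 ms starting delay are assumptions chosen for illustration; only the 10-second figure comes from the post above.

```typescript
// Return the wait (in ms) before each retry, doubling from baseMs and
// stopping before the cumulative wait would exceed budgetMs.
function retryDelays(baseMs: number, budgetMs: number): number[] {
  if (baseMs <= 0) throw new Error("baseMs must be positive");
  const delays: number[] = [];
  let total = 0;
  for (let delay = baseMs; total + delay <= budgetMs; delay *= 2) {
    delays.push(delay);
    total += delay;
  }
  return delays;
}

// Schedule for a 10-second budget starting at 500 ms.
const schedule = retryDelays(500, 10000);
console.log(schedule); // [ 500, 1000, 2000, 4000 ]
```

Compared with a fixed 30-second interval, a schedule like this makes its first reconnect attempts quickly and finishes all of them within the budget.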

A post was split to a new topic: Strategies to handle bit.io cooldown

A post was split to a new topic: Terminating connection due to administrator command