Discussion: Postgres DB design Question (High Level)
Introduction: I have a question about the design of a project as it relates to databases, and the scalability of the design. The project is volunteer work, so there is no commercial interest. But first, a bit of background.

Background: I have programmed a Raspberry Pi to record radio beeps from wildlife trackers, where the beep rate per minute (bpm) can be either 80, 40, or 30. The rate can only change once every 24 hours. The beeps are transmitted on up to 100 channels, and the animals go in and out of range on a given day. This data is written to a SQLite3 database on the Pi. Since the beep rate will not change in a given 24-hour period, and since the Pi runs on a solar/battery setup, it wakes up for 2 hours every day to record the radio signals and then shuts down. So for a given 24-hour period I only get 2 hours of data (anywhere between about 5-15,000 beeps, depending on the beep rate and assuming the animal stays within range). The Pi's SQLite3 database is synced over cellular to a PostgreSQL database on my server at the end of each day's 2-hour recording period.

Since I am processing radio signals, there is always the chance of random interference being decoded as a valid beep. To avoid a small amount of interference being detected as a valid signal, I check the quantity of valid beeps within a given 1-hour window. For example, if the beep rate is 80, I check that at least 50% of the maximum number of beeps was detected (i.e. 80 * 60 * 0.5 = 2,400); if there are only a handful of beeps, the hour is discarded.

Database design: The bpm table is very simple:

    Id
    Bpm_rate  Integer
    dt        DateTime

I want to create a web-based dashboard for all the currently detected signals, where the dashboard contains a graph of the daily beep rate for each channel (max 100 channels) over user-selectable periods from 1 week to 1 year. That query does not scale well if I query the bpm table directly. To avoid this, I have created a bpm summary table which is regenerated periodically (hourly) from the bpm table.
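The hourly summarization described above can be sketched with the stdlib sqlite3 module, since that is the edge database. The table name, the channel column, and the hourly_summary helper are assumptions for illustration; the post only names the Bpm_rate and dt columns.

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bpm (id INTEGER PRIMARY KEY, channel INTEGER, "
            "bpm_rate INTEGER, dt TEXT)")

# One hour of channel-7 beeps at 80 bpm (well over the 50% validity
# threshold of 2,400), plus a few interference rows decoded as 40 bpm.
rows = ([(7, 80, "2025-06-19 06:00:00")] * 2500 +
        [(7, 40, "2025-06-19 06:30:00")] * 3)
con.executemany("INSERT INTO bpm (channel, bpm_rate, dt) VALUES (?, ?, ?)",
                rows)

def hourly_summary(con):
    """Return (channel, hour, dominant_bpm, beep_count) per channel-hour,
    discarding hours that fall below 50% of the dominant rate's maximum."""
    cur = con.execute(
        "SELECT channel, strftime('%Y-%m-%d %H:00', dt) AS hour, "
        "bpm_rate, COUNT(*) "
        "FROM bpm GROUP BY channel, hour, bpm_rate")
    counts = {}
    for channel, hour, rate, n in cur:
        counts.setdefault((channel, hour), Counter())[rate] = n
    out = []
    for (channel, hour), c in counts.items():
        dominant, _ = c.most_common(1)[0]
        total = sum(c.values())
        if total >= 0.5 * dominant * 60:  # interference check from the post
            out.append((channel, hour, dominant, total))
    return out

print(hourly_summary(con))  # [(7, '2025-06-19 06:00', 80, 2503)]
```

The same GROUP BY shape works against the Postgres copy with date_trunc('hour', dt) in place of strftime.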
The bpm summary table contains the dominant beep rate for a given hour (so 2 records per day per channel, assuming a signal is detected).

Does this summary table approach make sense?

I have noted that I am periodically syncing from MySQL to the server, and then periodically updating the summary table. It's multi-stage syncing, and I wonder if that makes this approach fragile (although I don't see any alternative).

Thanks in advance for any input.

Al
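On the fragility of multi-stage syncing, one common mitigation is to make each stage idempotent, so an interrupted or repeated upload cannot create duplicates. A minimal sketch, again with sqlite3 for illustration; the unique key on (channel, dt) is an assumption, and in Postgres the equivalent of INSERT OR IGNORE would be INSERT ... ON CONFLICT DO NOTHING.

```python
import sqlite3

server = sqlite3.connect(":memory:")
server.execute("""CREATE TABLE bpm (
    channel INTEGER, bpm_rate INTEGER, dt TEXT,
    UNIQUE (channel, dt))""")

batch = [(7, 80, "2025-06-19 06:00:01"),
         (7, 80, "2025-06-19 06:00:02")]

# Simulate the same batch being uploaded twice, e.g. after a cellular
# drop-out forces the Pi to resend data it already sent.
for _ in range(2):
    server.executemany(
        "INSERT OR IGNORE INTO bpm (channel, bpm_rate, dt) VALUES (?, ?, ?)",
        batch)

count = server.execute("SELECT COUNT(*) FROM bpm").fetchone()[0]
print(count)  # 2, not 4: the replayed batch is a no-op
```

With idempotent stages, each sync step can simply be retried until it succeeds, which removes most of the fragility of chaining them.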
On Thu, 2025-06-19 at 16:50 +1200, Al Grant wrote:
> I want to create a web based dashboard for all the currently detected
> signals, where the dashboard contains a graph of the daily beep rate
> for each channel (max 100 channels) over user selectable periods from
> 1 week to 1 year - that query does not scale well if I query the bpm
> table.
>
> To avoid this I have created a bpm summary table which is generated
> periodically (hourly) off the bpm table. The bpm summary table
> contains the dominant beep rate for a given hour (so 2 records per day
> per channel assuming a signal is detected).
>
> Does this summary table approach make sense?

Yes. Pre-aggregation to reduce the data volume is common in a
data warehouse. Ideally, you don't have to scan the entire base
table to create the summary. You can consider a trigger for that,
but more elegant might be a solution based on partitioning, where
you aggregate a partition once it is complete.

> I have noted that I am periodically syncing from MySQL to the server,
> and then periodically updating the summary table - its multi stage
> syncing and I wonder if that makes this approach fragile (although I
> don't see any alternative).

"MySQL" and "fragile" in the same sentence. Sounds about right.

Yours,
Laurenz Albe
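One way to apply the advice about not scanning the entire base table, short of full partitioning, is to keep a watermark of the last summarized hour and aggregate only complete hours past it. A sketch under assumed names (only bpm_rate and dt come from the thread), using sqlite3 for illustration:

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bpm (channel INTEGER, bpm_rate INTEGER, dt TEXT);
CREATE TABLE bpm_summary (channel INTEGER, hour TEXT, dominant_bpm INTEGER);
CREATE TABLE watermark (last_hour TEXT);
INSERT INTO watermark VALUES ('1970-01-01 00:00');
""")
con.executemany("INSERT INTO bpm VALUES (?, ?, ?)", [
    (7, 80, "2025-06-19 06:10:00"),
    (7, 80, "2025-06-19 06:20:00"),
    (7, 40, "2025-06-19 07:05:00"),  # falls in the still-open hour
])

def refresh_summary(con, now):
    """Summarize only hours that are complete and newer than the watermark."""
    (last,) = con.execute("SELECT last_hour FROM watermark").fetchone()
    current_hour = now[:13] + ":00"  # e.g. '2025-06-19 07:00', still open
    cur = con.execute(
        "SELECT channel, strftime('%Y-%m-%d %H:00', dt) AS hour, "
        "bpm_rate, COUNT(*) FROM bpm "
        "WHERE strftime('%Y-%m-%d %H:00', dt) > ? "
        "  AND strftime('%Y-%m-%d %H:00', dt) < ? "
        "GROUP BY channel, hour, bpm_rate", (last, current_hour))
    groups = {}
    for channel, hour, rate, n in cur:
        groups.setdefault((channel, hour), Counter())[rate] = n
    for (channel, hour), c in groups.items():
        con.execute("INSERT INTO bpm_summary VALUES (?, ?, ?)",
                    (channel, hour, c.most_common(1)[0][0]))
        con.execute("UPDATE watermark SET last_hour = max(last_hour, ?)",
                    (hour,))

refresh_summary(con, "2025-06-19 07:30:00")
print(con.execute("SELECT * FROM bpm_summary").fetchall())
# [(7, '2025-06-19 06:00', 80)] -- the open 07:00 hour is left alone
```

The still-open hour is skipped and picked up by a later run, which is the same idea as aggregating a partition once it is complete.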
Not sure if it makes a big difference to the answer, but MySQL was a typo - the edge db on the Rpi is Sqlite3.

On Thu, Jun 19, 2025 at 5:00 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Thu, 2025-06-19 at 16:50 +1200, Al Grant wrote:
> > I want to create a web based dashboard for all the currently detected
> > signals, where the dashboard contains a graph of the daily beep rate
> > for each channel (max 100 channels) over user selectable periods from
> > 1 week to 1 year - that query does not scale well if I query the bpm
> > table.
> >
> > To avoid this I have created a bpm summary table which is generated
> > periodically (hourly) off the bpm table. The bpm summary table
> > contains the dominant beep rate for a given hour (so 2 records per day
> > per channel assuming a signal is detected).
> >
> > Does this summary table approach make sense?
>
> Yes. Pre-aggregation to reduce the data volume is common in a
> data warehouse. Ideally, you don't have to scan the entire base
> table to create the summary. You can consider a trigger for that,
> but more elegant might be a solution based on partitioning, where
> you aggregate a partition once it is complete.
>
> > I have noted that I am periodically syncing from MySQL to the server,
> > and then periodically updating the summary table - its multi stage
> > syncing and I wonder if that makes this approach fragile (although I
> > don't see any alternative).
>
> "MySQL" and "fragile" in the same sentence. Sounds about right.
>
> Yours,
> Laurenz Albe

--
"Beat it punk!" - Clint Eastwood