PostgreSQL generate_series() function examples for random data generation

Generating massive amounts of data can be useful to test queries, indexes and tuning complex SQL queries on more realistic volumes, to get useful approximations of production like environment.

This article demonstrates some generate_series() usages allowing you to feed any type of table. Let us deep dive to some uncomplicated yet interesting features with generate_series() in PostgreSQL.

Use generate_series() to generate sequence of data

pg=# select id tid, 'Name-'||id tname, now() tdate
  from generate_series(101, 110) id;
 tid |  tname   |              tdate
-----+----------+---------------------------------
 101 | Name-101 | 2020-07-10 10:44:47.73706+05:30
 102 | Name-102 | 2020-07-10 10:44:47.73706+05:30
 103 | Name-103 | 2020-07-10 10:44:47.73706+05:30
 104 | Name-104 | 2020-07-10 10:44:47.73706+05:30
 105 | Name-105 | 2020-07-10 10:44:47.73706+05:30
 106 | Name-106 | 2020-07-10 10:44:47.73706+05:30
 107 | Name-107 | 2020-07-10 10:44:47.73706+05:30
 108 | Name-108 | 2020-07-10 10:44:47.73706+05:30
 109 | Name-109 | 2020-07-10 10:44:47.73706+05:30
 110 | Name-110 | 2020-07-10 10:44:47.73706+05:30
(10 rows)

You can also use generate_series() function in the SELECT list.

pg=# select 
     generate_series(101, 110) tid, 
     'Name-'||generate_series(101, 110) tname, now() tdate;
 tid |  tname   |              tdate               
-----+----------+----------------------------------
 101 | Name-101 | 2020-07-11 12:24:45.184393+05:30
 102 | Name-102 | 2020-07-11 12:24:45.184393+05:30
 103 | Name-103 | 2020-07-11 12:24:45.184393+05:30
 104 | Name-104 | 2020-07-11 12:24:45.184393+05:30
 105 | Name-105 | 2020-07-11 12:24:45.184393+05:30
 106 | Name-106 | 2020-07-11 12:24:45.184393+05:30
 107 | Name-107 | 2020-07-11 12:24:45.184393+05:30
 108 | Name-108 | 2020-07-11 12:24:45.184393+05:30
 109 | Name-109 | 2020-07-11 12:24:45.184393+05:30
 110 | Name-110 | 2020-07-11 12:24:45.184393+05:30
(10 rows)

Generate random numbers using generate_series()

PostgreSQL provides the random() function that returns an arbitrary number between 0 (inclusive) and 1 (exclusive). The random() function takes no parameters or arguments for the random function. The following statement returns a random number between 0 and 1.

pg=# select random();
      random
-------------------
 0.893609274178743
(1 row)

You can combine random() with generate_series() to generate any number of random numbers greater than 1 or as you wish using some mathematical operations with random().

The following statement returns a series of random numbers greater 1.

pg=# select r*random() as random_vals from generate_series(11, 15) as r;
   random_vals
------------------
 1.8320738977579794
  4.157597562397726
  5.863071411978261
 11.352248463767118
 12.641513386736136
(5 rows)

You can leverage other functions such as row_number(), with generate_series() to generate sequence numbers for every row.

pg=# select 
     row_number() over() as id, 
     r*random() as random_vals 
  from generate_series(11, 15) as r;
 id |    random_vals     
----+--------------------
  1 |  5.148687264187007
  2 | 1.2621610815891415
  3 |  8.717584832600327
  4 |  4.645600803570602
  5 | 11.540057391379523
(5 rows)

You can add date columns with custom logic with generate_series().

pg=# select 
     r as tid, 
     10*r*random() as tvals, 
     current_date-r-1 as tdate, 
     'test-'||r tname 
  from generate_series(1, 5) as r;
 tid |       tvals        |   tdate    | tname  
-----+--------------------+------------+--------
   1 |  6.494571684777171 | 2020-07-09 | test-1
   2 |  7.450337328232095 | 2020-07-08 | test-2
   3 | 0.9161521129685468 | 2020-07-07 | test-3
   4 | 29.144583504084807 | 2020-07-06 | test-4
   5 |  45.60197216716766 | 2020-07-05 | test-5
(5 rows)

Using generate_series() in FROM and SELECT clause at the same time

generate_series() in PostgreSQL is a very powerful function and technically using it can help reduce many lines of code. Using generate_series() in FROM and SELECT clause at the same time eliminates writing pl/pgsql function in many situations.

pg=# select generate_series(1, 3) rpt, gs as tid 
  from generate_series(10, 12) gs;
 rpt | tid 
-----+-----
   1 |  10
   2 |  10
   3 |  10
   1 |  11
   2 |  11
   3 |  11
   1 |  12
   2 |  12
   3 |  12
(9 rows)

Q : Write a query that generates number 21 to 25 and each value repeats for values 0 and 1. Your output should look like below-

iflag | id
——-+—-
0 | 21
1 | 21
0 | 22
1 | 22

and so on..

Ans:

pg=# select generate_series(0, 1) iflag, id 
  from generate_series(21, 25)id;
 iflag | id 
-------+----
     0 | 21
     1 | 21
     0 | 22
     1 | 22
     0 | 23
     1 | 23
     0 | 24
     1 | 24
     0 | 25
     1 | 25
(10 rows)

Q : Write a query that generates previous 3 dates in ‘YYYYMMDD’ format and each date must repeat for number format (001, 002, 003, 004). Your output should look like below-

iflag | id
——-+—-
0 | 21
1 | 21
0 | 22
1 | 22

and so on..

Ans:

pg=# select 
     to_char(generate_series(1, 4), '000') s1, 
     to_char(dates, 'yyyymmdd') dates 
  from generate_series(current_date-3, current_date-1, '1 day'::interval)dates; 
  s1  |  dates   
------+----------
  001 | 20200708
  002 | 20200708
  003 | 20200708
  004 | 20200708
  001 | 20200709
  002 | 20200709
  003 | 20200709
  004 | 20200709
  001 | 20200710
  002 | 20200710
  003 | 20200710
  004 | 20200710
(12 rows)

You can off course have lot many other combinations with generate_series(). In this short article we tried to demonstrate you some use cases which can be used to generate sample data for testing and other purposes.

Please feel free to comment your use case and which variant of generate_series() saved your day.

Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments