background

I recently discovered that the redis instance used in our project is consuming a surprising amount of memory (even though I did not think the keys should take up that much). Running the info command shows the overall memory usage, the number of keys and so on, as in the screenshots below (only the memory and keyspace sections are shown):
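
For reference, the memory and keyspace sections of redis-cli info look roughly like the snippet below; the numbers here are made up, not the real ones from our instance.

# Memory
used_memory:8589934592
used_memory_human:8.00G
used_memory_rss:9126805504
used_memory_peak_human:8.50G

# Keyspace
db0:keys=31000000,expires=1200000,avg_ttl=0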

As you can see, the info command only reports the total memory usage and the number of keys. That is nowhere near enough to figure out which keys, or which kinds of keys, are eating the memory!

Tools Research

To do a good job, one must first sharpen one's tools!

After a round of googling, I found a tool that looked like it could do the job: redis-rdb-tools.

So the work was split in two:

  1. Ask the ops team to scp the production rdb snapshot file to a test machine (operating directly on the production machine is not safe, after all).
  2. Install the rdb tool in the fastest and cleanest way possible; I chose to install it inside a python docker container.

Analysis Road

According to the tool's [documentation](https://github.com/sripathikrishnan/redis-rdb-tools#generate-memory-report), an rdb snapshot file can be converted into a csv memory report:
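
The output is a plain csv file; judging from the table definition used later in this post, its columns are:

database,type,key,size_in_bytes,encoding,num_elements,len_largest_element,expiry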

With the csv file in hand, there are two ways to analyze it:

  1. Read the csv file directly with the python pandas library, which lets you do statistics, filtering and similar operations (its api is roughly equivalent to sql; for details see this article: [[Analysis]: Pandas performs data analysis with sql-like operations in python](https://www.jianshu.com/p/7d1d91ac7df3)). A minimal sketch of this approach follows this list.
  2. Import the csv into a relational database and use sql, which is more flexible. On the choice of database: after trying both mysql and postgres, the difference was striking. With the 300-million-plus rows from the csv above loaded into a single mysql table, queries simply ground to a halt, while postgres still held up after the import (a like query averaged around a dozen seconds, which is acceptable!).
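
For option 1, here is a minimal pandas sketch, assuming the column names listed above; the csv path is a placeholder, as elsewhere in this post:

import pandas as pd

# load the memory report produced by rdb -c memory
df = pd.read_csv('/xxxx/memory.csv')

# make sure the size column is numeric (malformed rows become NaN)
df['size_in_bytes'] = pd.to_numeric(df['size_in_bytes'], errors='coerce')

# total bytes per redis type, largest first (the sql equivalent of group by + order by)
print(df.groupby('type')['size_in_bytes'].sum().sort_values(ascending=False))

# top 10 biggest keys (the sql equivalent of order by ... limit 10)
print(df.nlargest(10, 'size_in_bytes')[['type', 'key', 'size_in_bytes']])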

Just try it!

Converting the rdb file to csv

(Since this involves internal business data, some details cannot be disclosed; only the important commands and the lessons learned from the pitfalls are included.)

# 1. Run a python docker container first (note that the rdb file is mounted)
docker run -it -v YOUR_PATH/xxx.rdb:/data/xxx.rdb python bash

# 2. Install rdb tools
pip install rdbtools python-lzf

# 3. Run the rdb-to-csv conversion (how long this takes depends on the size of the rdb file)
rdb -c memory /data/xxx.rdb -f memory.csv

Replace the paths and names in the commands above with your own real values.

Simple csv cleaning

There is another pitfall here. When importing the csv into postgres, I got an error that boils down to "the actual number of columns does not match the expected number of columns". Presumably rdb tools mangled the values on a few lines during conversion, or hit some other bug. Since only a small number of rows were affected, there was no need to dig deeper; pandas can simply clean them out.

The relevant python code is as follows:

import pandas as pd

# skip malformed lines while reading the csv in chunks
reader = pd.read_csv('/xxxx/memory.csv', iterator=True, error_bad_lines=False)

loop = True
chunkSize = 10000000
chunks = []

while loop:
    try:
        chunk = reader.get_chunk(chunkSize)
        chunks.append(chunk)
    except StopIteration:
        loop = False
        print("Iteration is stopped.")

# merge all chunks and write the cleaned data to a new file
df = pd.concat(chunks, ignore_index=True)

df.to_csv('/xxx/memory2.csv', sep=',', encoding='utf-8')

A quick explanation: the csv is read with the option error_bad_lines=False, so pandas automatically skips the problematic lines. The file is read in chunks, the chunks are concatenated, and the result is written out to a new file.
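
One note: in newer pandas releases (1.3 and later) the error_bad_lines flag is deprecated and has since been removed; the equivalent option there is on_bad_lines:

# pandas >= 1.3: skip malformed lines via on_bad_lines instead of error_bad_lines
reader = pd.read_csv('/xxxx/memory.csv', iterator=True, on_bad_lines='skip')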

Importing the csv into postgres

Strictly speaking this step is unnecessary: as mentioned above, pandas alone can achieve almost the same analysis as sql. But writing sql directly is more convenient, and it also brings the other benefits of a real database.

# 1. Run a postgres docker container (remember to mount the csv file produced in the previous step)
docker run --name postgres -v /xxx/memory2.csv:/memory.csv -d postgres:9.6

# 2. Enter the psql shell inside the postgres container
docker exec -it postgres psql -U postgres

# 3. Create a temporary table (all fields except the pandas-generated index are declared as text, otherwise the import may hit conversion errors; that first index column comes from pandas and can be dropped when exporting the csv)
postgres=# create table keys_dump(
index integer,
database text,
type text,
key text,
size_in_bytes text,
encoding text,
num_elements text,
len_largest_element text,
expiry text
);

# 4. Import the csv file (header skips the column-name row written by pandas)
postgres=# COPY keys_dump FROM '/memory.csv' WITH csv header;
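
As a quick sanity check after the import, the row count should roughly match the key count reported by info:

postgres=# select count(*) from keys_dump;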

sql analysis

Now the problem becomes much simpler. Because the keys involve real business values, I will only show a few simple examples, starting with the total memory occupied by string-type keys:

select sum(size_in_bytes::int) from keys_dump where type='string';

Adapt the sql to your own scenario, for example running a like query on a keyword in the key:

select sum(size_in_bytes::int) from keys_dump where type='string' and key like 'xxxx%';

Or get the top 10 keys by individual size:

select * from keys_dump order by size_in_bytes::int desc limit 10;

I ran the sql statements above myself against the single table of 300-million-plus rows; each took roughly 10 to 20 seconds in total, which is perfectly acceptable for one-off offline analysis.
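
Depending on how your keys are named, aggregating by type or by key prefix also helps pin down which kind of key dominates. A sketch, assuming colon-delimited key names (adjust the delimiter to your own naming scheme):

-- total size and key count per redis type
select type, sum(size_in_bytes::bigint) as total_bytes, count(*) as key_count
from keys_dump
group by type
order by total_bytes desc;

-- total size per key prefix, assuming keys look like "prefix:rest"
select split_part(key, ':', 1) as prefix, sum(size_in_bytes::bigint) as total_bytes
from keys_dump
group by prefix
order by total_bytes desc
limit 20;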

Summary

The above only offers a simple approach to redis memory analysis, roughly: rdb -> csv -> sql. If you have a better way, feel free to leave a comment below the article!

Last modified: March 29, 2019
