Audio FingerPrinting and Matching Using Acoustid Chromaprint on Windows With Python
I needed to generate audio fingerprints for matching and pattern recognition. I first tried a couple of other approaches, such as MFCC and spectrogram matching using computer vision techniques, but I was not successful with them. AcoustID Chromaprint gave positive results. Since I was using Windows, I had to compile the latest version using Visual Studio.
Since my target was to get the WAV format to work, I didn't need the ffmpeg dependency, so I removed it. You will need the avcodec and fftw libs. The algorithm AcoustID uses for fingerprint matching is buried in a PostgreSQL user-defined C function, which I translated to Python.
In the chromaprint Python module, I only needed to add one function that compares two fingerprints and returns the distance between them. The original function returns a score; the only change is that I return 1 - score, which I call the distance measure.
The distance measure function follows.
import numpy

# ACOUSTID_MAX_BIT_ERROR and ACOUSTID_MAX_ALIGN_OFFSET are expected to be
# defined as module-level constants; popcount_lookup8 counts the set bits.

def calculate_distance(fingerprint1, fingerprint2):
    '''
    http://oxygene.sk/lukas/2011/01/how-does-chromaprint-work/
    This algorithm is extracted from the postgres chromaprint C function.
    Instead of returning the score I return 1-score as the distance measure.
    '''
    # Check for missing input first, using "is None" so numpy arrays
    # are not compared element by element.
    if fingerprint1 is None or fingerprint2 is None:
        return 0
    # Raise if either input is not one-dimensional.
    if len(numpy.shape(fingerprint1)) != 1 or len(numpy.shape(fingerprint2)) != 1:
        raise Exception('Only one dimension arrays allowed')
    # One counter per possible alignment offset between the two fingerprints.
    numcounts = len(fingerprint1) + len(fingerprint2) + 1
    counts = numpy.zeros(numcounts)
    for i in range(len(fingerprint1)):
        jbegin = max(0, i - ACOUSTID_MAX_ALIGN_OFFSET)
        jend = min(len(fingerprint2), i + ACOUSTID_MAX_ALIGN_OFFSET)
        for j in range(jbegin, jend):
            biterror = popcount_lookup8(fingerprint1[i] ^ fingerprint2[j])
            # ereport(DEBUG5, (errmsg("comparing %d and %d with error %d", i, j, biterror)));
            if biterror <= ACOUSTID_MAX_BIT_ERROR:
                offset = i - j + len(fingerprint2)
                counts[offset] += 1
    # The best-populated offset, relative to the shorter fingerprint, is the score.
    return 1.0 - (numpy.max(counts) / (1.0 * min(len(fingerprint1), len(fingerprint2))))
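To see what the offset histogram does, here is a minimal pure-Python sketch of the same idea on made-up data. The name toy_distance, the sample fingerprint values, and the default thresholds (2 and 120) are my assumptions for illustration and are not taken from the chromaprint module.

```python
def popcount(x):
    # Number of set bits in a non-negative integer.
    return bin(x).count("1")

def toy_distance(fp1, fp2, max_bit_error=2, max_align_offset=120):
    # Histogram of alignment offsets: each pair of items that differ by at
    # most max_bit_error bits votes for the offset (i - j) it implies.
    counts = [0] * (len(fp1) + len(fp2) + 1)
    for i in range(len(fp1)):
        jbegin = max(0, i - max_align_offset)
        jend = min(len(fp2), i + max_align_offset)
        for j in range(jbegin, jend):
            if popcount(fp1[i] ^ fp2[j]) <= max_bit_error:
                counts[i - j + len(fp2)] += 1
    # Best offset relative to the shorter fingerprint, turned into a distance.
    return 1.0 - max(counts) / float(min(len(fp1), len(fp2)))

# Made-up 32-bit fingerprint items, for illustration only.
a = [0x0F0F0F0F, 0x12345678, 0xDEADBEEF, 0x00FF00FF, 0xCAFEBABE, 0x13579BDF]
b = a[2:] + [0x2468ACE0, 0x0BADF00D]   # same items, shifted by two positions
c = [x ^ 0xFFFFFFFF for x in a]        # every bit flipped, so aligned items never match

d_same = toy_distance(a, a)    # identical fingerprints
d_shift = toy_distance(a, b)   # four of six items line up at one offset
d_diff = toy_distance(a, c)    # unrelated fingerprints
```

In this toy, the identical pair gets the smallest distance, the shifted pair lands in between because only the overlapping items vote for the same offset, and the bit-flipped pair gets the largest distance.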
Depending on your goal in using the Chromaprint matching algorithm, you can adjust ACOUSTID_MAX_BIT_ERROR and ACOUSTID_MAX_ALIGN_OFFSET to get lower or higher distance values, which you can later use as the basis for classification with K-Nearest Neighbour. You can download the Windows Chromaprint version 0.4 here. The build I compiled fingerprints only WAV data, since I removed the ffmpeg dependencies. The download also includes the dependency DLLs.
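As a rough sketch of the K-Nearest Neighbour step, the snippet below classifies a query fingerprint by majority vote among its k closest labelled examples. The names knn_classify and bit_distance, the labels, and the sample data are all hypothetical. bit_distance is only a simple stand-in so the example runs on its own; in practice you would pass the distance measure function instead.

```python
from collections import Counter

def knn_classify(query, labelled, distance, k=3):
    # labelled: list of (fingerprint, label) pairs.
    # distance: any callable returning a number for two fingerprints.
    ranked = sorted(labelled, key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def bit_distance(fp1, fp2):
    # Stand-in distance for this demo only: fraction of differing bits
    # between two equal-length lists of 32-bit items.
    diff = sum(bin(a ^ b).count("1") for a, b in zip(fp1, fp2))
    return diff / (32.0 * len(fp1))

# Made-up training data with hypothetical labels.
training = [
    ([0x00000000, 0x0000000F], "silence"),
    ([0x00000001, 0x0000000F], "silence"),
    ([0xFFFFFFFF, 0xFFFFFFF0], "noise"),
    ([0xFFFFFFFE, 0xFFFFFFF0], "noise"),
]

label = knn_classify([0x00000003, 0x0000000F], training, bit_distance, k=3)
```

Here the two nearest neighbours are both labelled "silence", so the vote with k=3 goes to "silence". Tightening or loosening the bit error and alignment thresholds changes the distances, which in turn changes which neighbours end up closest.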
Categories: Programming, Windows
audio matching, chromaprint, Python
What function is popcount_lookup8?
popcount_table_8bit = [\
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,\
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,\
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,\
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,\
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,\
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,\
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,\
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8,\
]
def popcount_lookup8(x):
    # Sum the set bits of each of the four bytes of a 32-bit value.
    return popcount_table_8bit[x & 255] + \
        popcount_table_8bit[(x >> 8) & 255] + \
        popcount_table_8bit[(x >> 16) & 255] + \
        popcount_table_8bit[(x >> 24) & 255]
Which is basically the total number of 1’s in the binary representation of x, right?
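One quick way to check that reading is to compare the lookup against Python's own bit counting. This is only a sketch: it builds the 256-entry table with a comprehension instead of typing it out, and the sample values are arbitrary.

```python
# Build the same 256-entry table programmatically.
popcount_table_8bit = [bin(i).count("1") for i in range(256)]

def popcount_lookup8(x):
    # Sum the set bits of each of the four bytes of a 32-bit value.
    return (popcount_table_8bit[x & 255] +
            popcount_table_8bit[(x >> 8) & 255] +
            popcount_table_8bit[(x >> 16) & 255] +
            popcount_table_8bit[(x >> 24) & 255])

# Arbitrary 32-bit sample values.
samples = [0, 1, 255, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF]
results = [popcount_lookup8(x) for x in samples]
expected = [bin(x).count("1") for x in samples]
```

For values that fit in 32 bits the two lists agree, so the function does return the number of 1 bits in x.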