Extracting Each Unique Topic from an Array
2/25/2025 12:00:00 AM
joe-jngigi
I really needed to extract unique data from an array, and it turned out to be a skill worth having. Let us see how we can go about that.
I needed to extract the count of each of the topics, and also return the topics. In JavaScript, the simplest and most efficient way to extract the unique values from an array is to use Set. Set is a built-in object that lets one store unique values of any type.
This was the data I needed to sort out.
const data = [
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Linux & Servers',
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Continous Development: Frontend',
  'Continous Development: TypeScipt'
];
To remove the duplicates, we convert the data into a Set by passing the array to the Set constructor. Internally, this creates a collection with only unique values.
If we need to convert the set back into an array, we can just use the spread operator:
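const frontmatter_data = await this.ArticlesFrontMatter(docType);
// Pull the topic field out of each front matter entry.
const data = frontmatter_data.map((topic) => {
  return topic.topic;
});
// new Set(data) keeps a single copy of each topic; spreading it turns it back into an array.
const uniqueTopics = [...new Set(data)];
// Or, as an alternative to the two steps above, in one expression:
const uniqueTopics = [
  ...new Set(frontmatter_data.map((topic) => topic.topic)),
];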
The Set handles duplicate removal automatically, so there is no need to write extra logic. Adding an element to a Set and checking whether it is already present each take O(1) time on average, so deduplicating the whole array takes O(n) time, which makes this approach more efficient for large arrays.
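As a minimal, self-contained sketch of the Set approach, the snippet below runs it against the sample data from this post. The post also mentions needing the count of each topic but does not show that step, so the Map-based tally here is an assumption about one way to do it, and the names topics, uniqueTopicList, and topicCounts are hypothetical.

```javascript
// Sample data copied from the post (spelling kept exactly as in the source).
const topics = [
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Linux & Servers',
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Continous Development: Frontend',
  'Continous Development: TypeScipt',
];

// A Set keeps one copy of each value, in first-insertion order.
const uniqueTopicList = [...new Set(topics)];

// Assumption, not from the original post: tally occurrences in a Map
// keyed by topic, starting each topic at 0 the first time it is seen.
const topicCounts = new Map();
for (const topic of topics) {
  topicCounts.set(topic, (topicCounts.get(topic) ?? 0) + 1);
}

console.log(uniqueTopicList);
// [ 'Continous Development: TypeScipt', 'Linux & Servers', 'Continous Development: Frontend' ]
console.log(Object.fromEntries(topicCounts));
// TypeScipt appears 6 times; the other two topics appear once each.
```

The Map is iterated in insertion order, so its keys come out in the same order as uniqueTopicList.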
Alternatives
1. Using filter with indexOf:
const uniqueTopics = data.filter((topic, index) => data.indexOf(topic) === index);
The code generates a new array that contains only the first occurrence of each element from the original data. The .filter() method iterates over each element of the original data array. The callback function receives two parameters:
- topic: the current element being processed.
- index: the index of the element being processed.
data.indexOf(topic) returns the index of the topic's first appearance in the data. For the walkthrough that follows, take the data to be shaped like this:
const data = [
  'Continous Development: TypeScipt', // index 0
  'Linux & Servers', // index 1
  'Continous Development: Frontend' // index 2
];
Why do we need the comparison data.indexOf(topic) === index in the function?
If we just return data.indexOf(topic), the following happens:
- For index 0, the topic is 'Continous Development: TypeScipt'. data.indexOf(topic) returns 0. filter then does a truthy check on the returned value. In JavaScript, 0 is falsy, so this element is not included in the result.
- For 'Linux & Servers', it returns 1. The truthy check passes, so the element is included in the result.
- The same goes for 'Continous Development: Frontend': the value returned is 2, which is truthy, so the element is included.
Hence, if we returned just data.indexOf(topic), we would get data in the form of
[ 'Linux & Servers', 'Continous Development: Frontend' ]
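The difference can be checked directly. The sketch below uses the three-element array from the walkthrough; the names shapedData, withoutComparison, and withComparison are hypothetical and only serve the illustration.

```javascript
const shapedData = [
  'Continous Development: TypeScipt', // index 0
  'Linux & Servers', // index 1
  'Continous Development: Frontend', // index 2
];

// Returning the index itself: filter keeps an element only when the
// returned value is truthy, so the element found at index 0 is dropped.
const withoutComparison = shapedData.filter((topic) => shapedData.indexOf(topic));

// Comparing with the current index returns a real boolean instead.
const withComparison = shapedData.filter(
  (topic, index) => shapedData.indexOf(topic) === index
);

console.log(withoutComparison);
// [ 'Linux & Servers', 'Continous Development: Frontend' ]
console.log(withComparison.length); // 3, nothing is lost
```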
For justification: indexOf() returns the index of the first occurrence of a given topic in the array. When it is used within a .filter(), for each element (topic) in the array, data.indexOf(topic) returns the first index where that topic appears. The callback then compares that value to the current index of the iteration. If they match, this is the first time the topic is encountered, and it is kept ("displayed"). If they don't match, the topic has appeared before, and it is skipped.
So, data.indexOf(topic) is used to determine the position of the topic's first occurrence in the array, ensuring only the first occurrence of each topic is kept in the filtered result.
The main reason you would not use this in our case is that it is slower (O(n²) time complexity), because indexOf scans the array for each element.
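To make the first-occurrence check concrete, here is a small trace over an array that does contain duplicates. It is only a sketch: sampleTopics is a shortened, hypothetical subset of the post's data, and the trace objects exist purely for illustration.

```javascript
const sampleTopics = [
  'Continous Development: TypeScipt', // index 0
  'Continous Development: TypeScipt', // index 1
  'Linux & Servers', // index 2
  'Continous Development: TypeScipt', // index 3
  'Continous Development: Frontend', // index 4
];

// For every element, record where indexOf first finds it and whether
// the filter callback would keep it. Each indexOf call is its own scan
// of the array, which is where the O(n²) cost comes from.
const trace = sampleTopics.map((topic, index) => {
  const firstIndex = sampleTopics.indexOf(topic);
  return { topic, index, firstIndex, kept: firstIndex === index };
});

console.table(trace);
// kept is true for indexes 0, 2 and 4, and false for indexes 1 and 3.
```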
2. Using reduce:
const uniqueTopics = data.reduce((acc, topic) => {
  if (!acc.includes(topic)) {
    acc.push(topic);
  }
  return acc;
}, []);
This reduce builds a new array by adding a topic only if the topic has not been seen before. The downside is that it is more verbose and still slower than Set, because includes scans the accumulator array on every check.
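As a quick sanity check, the sketch below wraps the three approaches from this post in small functions and confirms they return the same topics in the same order for the sample data. The function names uniqueWithSet, uniqueWithFilter, and uniqueWithReduce are hypothetical, chosen only for this comparison.

```javascript
// Hypothetical helper names; each one wraps an approach from the post.
const uniqueWithSet = (items) => [...new Set(items)];

const uniqueWithFilter = (items) =>
  items.filter((item, index) => items.indexOf(item) === index);

const uniqueWithReduce = (items) =>
  items.reduce((acc, item) => {
    if (!acc.includes(item)) {
      acc.push(item);
    }
    return acc;
  }, []);

// Sample data copied from the post (spelling kept exactly as in the source).
const input = [
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Linux & Servers',
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Continous Development: TypeScipt',
  'Continous Development: Frontend',
  'Continous Development: TypeScipt',
];

const results = [
  uniqueWithSet(input),
  uniqueWithFilter(input),
  uniqueWithReduce(input),
];

// Compare every result with the first one, element by element.
const allAgree = results.every(
  (result) => JSON.stringify(result) === JSON.stringify(results[0])
);

console.log(results[0]);
console.log(allAgree); // true
```

All three keep the first occurrence of each topic, so the choice between them comes down to readability and speed rather than output.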