fix(mirror): reuse existing same-source mirrors instead of creating suffixed duplicates (#315) (#317)

Starred (and other) repos were duplicated on every re-mirror
(starred/Repo, Repo-owner, Repo-owner-1, ...) because the existence check
only asked "does a repo with this name exist?" and never "is the existing
repo a mirror of THIS same source?". The repo's own prior mirror counted
as a collision, so generateUniqueRepoName bumped to the next suffix on
each run and repointed mirroredLocation at the newest copy. Within a
single re-mirror call, 3 concurrent/retried jobs each computed a
DIFFERENT suffixed name, so the location-based in-flight guard never
matched and the race produced extra copies.

Fix (source-identity aware):
- New shared helper src/lib/utils/mirror-source-match.ts:
  - normalizeCloneUrl / cloneUrlsMatch: credential-, .git-, slash- and
    host-case-insensitive clone URL comparison.
  - isMirrorOfSource: a Gitea repo is "ours" only if it is a mirror AND
    its original_url matches this repo's source.
  - findExistingMirror: resolves an existing same-source mirror via the
    recorded mirroredLocation first (survives strategy changes — #309),
    then the base candidate name.
  - classifyCandidateName: pure available/reusable/taken decision.
- gitea-enhanced: export GiteaRepoInfo and add original_url (Gitea's
  recorded migration source) for source matching.
- Both create paths (mirrorGithubRepoToGitea, mirrorGitHubRepoToGiteaOrg):
  run findExistingMirror BEFORE name generation; on a hit, reuse that
  location and route into the existing "already mirrored" handling rather
  than calling generateUniqueRepoName. Names now converge under
  concurrency so the in-flight guard becomes effective.
- generateUniqueRepoName is now source-aware: an occupied name held by a
  mirror of the SAME source is reused (no suffix); suffixing only happens
  on a genuine different-source collision, preserving #95/#236 behavior.
  The per-user DB claim check is retained so two users mirroring the same
  source into a shared org stay separated.
- Phantom-fork guard (#309): the existingRepoInfo.mirror branches now
  verify same-source before marking "mirrored"; on mismatch they fall
  through to unique-name generation and create a separate mirror.
- Scheduler: a `failed` repo whose mirroredLocation still resolves to a
  live same-source mirror is routed to syncGiteaRepo instead of re-create,
  breaking the failed-metadata re-create loop cheaply.
- Remove dead src/lib/starred-repos-handler.ts (zero importers across all
  git history); its correct base-name/.mirror reuse logic now lives in the
  shared helper.
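
A minimal sketch of the source-identity check described in the list above,
written only from the behavior the test cases in this commit pin down. It is
not the code in src/lib/utils/mirror-source-match.ts; the function bodies,
the RepoInfoLike type, and the choice to lowercase the whole URL (rather
than only the host) are assumptions made for illustration.

```typescript
// Illustrative re-implementation, NOT the shipped helper. Bodies are assumed;
// only the function names and expected results come from the commit.

// Hypothetical minimal shape: just the two fields the check reads.
type RepoInfoLike = { mirror: boolean; original_url?: string };

// Reduce a clone URL to a comparable form: no credentials, no trailing
// slash, no ".git" suffix, lowercased. Blank input normalizes to "".
function normalizeCloneUrl(url: string | null | undefined): string {
  const raw = (url ?? "").trim();
  if (raw === "") return "";
  // Strip "user:token@" that follows "scheme://". scp-style URLs such as
  // "git@github.com:a/b.git" have no "://" and pass through untouched.
  const withoutCreds = raw.replace(/^([a-z][a-z0-9+.-]*:\/\/)[^\/@]+@/i, "$1");
  return withoutCreds
    .toLowerCase()
    .replace(/\/+$/, "") // trailing slash(es)
    .replace(/\.git$/, ""); // trailing .git
}

// Two URLs match only if both normalize to the same non-empty string, so an
// unknown or empty URL never matches anything.
function cloneUrlsMatch(
  a: string | null | undefined,
  b: string | null | undefined
): boolean {
  const left = normalizeCloneUrl(a);
  const right = normalizeCloneUrl(b);
  return left !== "" && left === right;
}

// A Gitea repo is "ours" only if it is a mirror AND its recorded
// original_url points at this repo's source.
function isMirrorOfSource(
  repoInfo: RepoInfoLike | null | undefined,
  sourceCloneUrl: string | null | undefined
): boolean {
  if (!repoInfo || repoInfo.mirror !== true) return false;
  return cloneUrlsMatch(repoInfo.original_url, sourceCloneUrl);
}
```

Under these assumptions, a mirror whose original_url is missing is treated as
"cannot confirm" and therefore not reused, which is the conservative side of
the trade-off: the worst case is a suffixed name, never an overwritten mirror
of a different source.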

Tests: src/lib/utils/mirror-source-match.test.ts (30 cases) covers URL
normalization, reuse at base name, reuse via mirroredLocation across a
strategy change, genuine different-source collision (suffix), phantom
fork, stale mirroredLocation fallback, per-user DB-claim separation, and
the suffix-vs-reuse classification. Full suite: 319 pass, 0 fail.
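
The suffix-vs-reuse classification can be pictured as a small pure function.
The sketch below is an assumption-laden illustration, not the tested helper:
the input field names follow the call site in generateUniqueRepoName, while
the body, the NameClass and RepoInfoLike types, and the simplified canon()
URL comparison are hypothetical stand-ins.

```typescript
// Illustrative sketch only; the real classifyCandidateName may differ.

type NameClass = "available" | "reusable" | "taken";

// Hypothetical minimal shape of the Gitea repo info consulted here.
type RepoInfoLike = { mirror: boolean; original_url?: string } | null;

// Simplified stand-in for the shared clone-URL comparison (assumption):
// lowercases and drops a trailing slash and ".git" suffix.
const canon = (url?: string): string =>
  (url ?? "").trim().toLowerCase().replace(/\/+$/, "").replace(/\.git$/, "");

function classifyCandidateName(input: {
  existsInGitea: boolean;
  claimedByOther: boolean;
  repoInfo: RepoInfoLike;
  sourceCloneUrl?: string;
}): NameClass {
  const { existsInGitea, claimedByOther, repoInfo, sourceCloneUrl } = input;

  // A DB claim by a different repo always blocks the name, even if nothing
  // has been created in Gitea yet (concurrent batch case).
  if (claimedByOther) return "taken";

  // Unclaimed and absent from Gitea: free to use.
  if (!existsInGitea) return "available";

  // Occupied: reusable only when the occupant is a mirror of this same
  // source. Without a source URL nothing can be confirmed, so the legacy
  // "any occupant collides" behavior applies.
  const source = canon(sourceCloneUrl);
  const sameSource =
    !!repoInfo &&
    repoInfo.mirror &&
    source !== "" &&
    canon(repoInfo.original_url) === source;

  return sameSource ? "reusable" : "taken";
}
```

Keeping the decision free of I/O is what makes it cheap to unit-test: the
caller resolves Gitea existence, the DB claim, and repo info, and the function
only maps those facts to one of the three outcomes.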
ARUNAVO RAY
2026-06-13 08:00:41 +05:30
committed by GitHub
parent 699a5771f5
commit 40ee3cbc44
6 changed files with 899 additions and 398 deletions
+6 -1
@@ -43,13 +43,18 @@ type SyncDependencies = {
/**
* Enhanced repository information including mirror status
*/
interface GiteaRepoInfo {
export interface GiteaRepoInfo {
id: number;
name: string;
owner: { login: string } | string;
mirror: boolean;
mirror_interval?: string;
clone_url?: string;
// Original migration source URL. Gitea/Forgejo populate this with the
// upstream clone address for migrated/mirrored repos, so it is the
// authoritative way to tell whether an existing mirror points at THIS
// GitHub source (vs. a same-named mirror of a different source).
original_url?: string;
private: boolean;
}
+239 -94
@@ -579,7 +579,27 @@ export const mirrorGithubRepoToGitea = async ({
// Determine the actual repository name to use (handle duplicates for starred repos)
let targetRepoName = repository.name;
if (
// REUSE-FIRST (issues #315 / #309): before generating any (suffixed) name,
// check whether this exact source is already mirrored — either at the
// recorded mirroredLocation or at the base name. If so, reuse that location
// and route into the "already mirrored" handling below instead of creating
// a duplicate. This must run before generateUniqueRepoName so the names
// converge under concurrency (the in-flight guard then becomes effective).
const { findExistingMirror } = await import("./utils/mirror-source-match");
const existingMirror = await findExistingMirror({
repository,
config,
candidateOwner: repoOwner,
candidateName: repository.name,
});
if (existingMirror) {
repoOwner = existingMirror.owner;
targetRepoName = existingMirror.repoName;
console.log(
`Reusing existing same-source mirror for ${repository.fullName} at ${repoOwner}/${targetRepoName}`
);
} else if (
repository.isStarred &&
config.githubConfig &&
(config.githubConfig.starredReposMode || "dedicated-org") === "dedicated-org"
@@ -594,6 +614,7 @@ export const mirrorGithubRepoToGitea = async ({
githubOwner,
fullName: repository.fullName,
strategy: config.githubConfig.starredDuplicateStrategy,
sourceCloneUrl: repository.cloneUrl,
});
if (targetRepoName !== repository.name) {
@@ -643,45 +664,72 @@ export const mirrorGithubRepoToGitea = async ({
strategy: "delete", // Can be configured: "skip", "delete", or "rename"
});
} else if (existingRepoInfo?.mirror) {
console.log(
`Repository ${targetRepoName} already exists in Gitea under ${repoOwner}. Updating database status.`
);
// PHANTOM-FORK GUARD (#309): a mirror at this name is only "ours" if it
// mirrors THIS source. existingMirror short-circuits the check
// because findExistingMirror already confirmed the source match.
const { isMirrorOfSource } = await import("./utils/mirror-source-match");
const sameSource =
!!existingMirror ||
isMirrorOfSource(existingRepoInfo, repository.cloneUrl);
await syncRepositoryMetadataToGitea({
config,
octokit,
repository,
giteaOwner: repoOwner,
giteaRepoName: targetRepoName,
giteaToken: decryptedConfig.giteaConfig.token,
});
if (!sameSource) {
// A different source occupies this name. Treat as a genuine collision:
// generate a unique name and fall through to create a separate mirror.
console.warn(
`[Mirror] ${repoOwner}/${targetRepoName} is a mirror of a different source. ` +
`Generating a unique name for ${repository.fullName} to avoid overwriting it.`
);
targetRepoName = await generateUniqueRepoName({
config,
orgName: repoOwner,
baseName: repository.name,
githubOwner: repository.fullName.split("/")[0],
fullName: repository.fullName,
strategy: config.githubConfig?.starredDuplicateStrategy,
sourceCloneUrl: repository.cloneUrl,
});
// expectedLocation is recomputed below before the "mirroring" write.
} else {
console.log(
`Repository ${targetRepoName} already exists in Gitea under ${repoOwner}. Updating database status.`
);
// Update database to reflect that the repository is already mirrored
await db
.update(repositories)
.set({
status: repoStatusEnum.parse("mirrored"),
updatedAt: new Date(),
lastMirrored: new Date(),
errorMessage: null,
mirroredLocation: `${repoOwner}/${targetRepoName}`,
})
.where(eq(repositories.id, repository.id!));
await syncRepositoryMetadataToGitea({
config,
octokit,
repository,
giteaOwner: repoOwner,
giteaRepoName: targetRepoName,
giteaToken: decryptedConfig.giteaConfig.token,
});
// Append log for "mirrored" status
await createMirrorJob({
userId: config.userId,
repositoryId: repository.id,
repositoryName: repository.name,
message: `Repository ${repository.name} already exists in Gitea`,
details: `Repository ${repository.name} was found to already exist in Gitea under ${repoOwner} and database status was updated.`,
status: "mirrored",
});
// Update database to reflect that the repository is already mirrored
await db
.update(repositories)
.set({
status: repoStatusEnum.parse("mirrored"),
updatedAt: new Date(),
lastMirrored: new Date(),
errorMessage: null,
mirroredLocation: `${repoOwner}/${targetRepoName}`,
})
.where(eq(repositories.id, repository.id!));
console.log(
`Repository ${repository.name} database status updated to mirrored`
);
return;
// Append log for "mirrored" status
await createMirrorJob({
userId: config.userId,
repositoryId: repository.id,
repositoryName: repository.name,
message: `Repository ${repository.name} already exists in Gitea`,
details: `Repository ${repository.name} was found to already exist in Gitea under ${repoOwner} and database status was updated.`,
status: "mirrored",
});
console.log(
`Repository ${repository.name} database status updated to mirrored`
);
return;
}
} else {
console.warn(
`[Mirror] Repository ${repoOwner}/${targetRepoName} exists but mirror status could not be verified. Continuing with mirror creation flow.`
@@ -689,6 +737,10 @@ export const mirrorGithubRepoToGitea = async ({
}
}
// Recompute the target location in case a phantom-fork collision above
// forced a renamed target after the initial expectedLocation was derived.
const targetLocation = `${repoOwner}/${targetRepoName}`;
console.log(`Mirroring repository ${repository.name}`);
// DOUBLE-CHECK: Final idempotency check right before updating status
@@ -696,7 +748,7 @@ export const mirrorGithubRepoToGitea = async ({
const finalCheck = await isRepoCurrentlyMirroring({
config,
repoName: targetRepoName,
expectedLocation,
expectedLocation: targetLocation,
});
if (finalCheck) {
@@ -714,7 +766,7 @@ export const mirrorGithubRepoToGitea = async ({
.update(repositories)
.set({
status: repoStatusEnum.parse("mirroring"),
mirroredLocation: expectedLocation,
mirroredLocation: targetLocation,
updatedAt: new Date(),
})
.where(eq(repositories.id, repository.id!));
@@ -1177,6 +1229,14 @@ async function isMirroredLocationClaimedInDb({
* Checks both the Gitea instance (HTTP) and the local DB (mirroredLocation)
* to reduce collisions during concurrent batch mirroring.
*
* Source-aware (issues #315 / #309): when a candidate name is already occupied
* by a mirror of THIS SAME GitHub source, the name is REUSED rather than
* suffixed — this is what previously caused starred repos to spawn `-owner`,
* `-owner-1`, … duplicates on every re-mirror. Suffixing only happens on a
* genuine different-source collision (preserving the #95/#236 cross-owner
* behavior). The per-user DB claim check is retained so two users mirroring the
* same source into a shared org stay separated.
*
* NOTE: This function only checks availability — it does NOT claim the name.
* The actual claim happens later when mirroredLocation is written at the
* status="mirroring" DB update, which is protected by a unique partial index
@@ -1189,6 +1249,7 @@ async function generateUniqueRepoName({
githubOwner,
fullName,
strategy,
sourceCloneUrl,
}: {
config: Partial<Config>;
orgName: string;
@@ -1196,6 +1257,10 @@ async function generateUniqueRepoName({
githubOwner: string;
fullName: string;
strategy?: string;
// Source GitHub clone URL, used to decide whether an occupied name belongs to
// THIS repo's mirror (reuse) or a different source (suffix). When omitted,
// behavior degrades to the legacy "any occupant collides" semantics.
sourceCloneUrl?: string;
}): Promise<string> {
if (!fullName?.includes("/")) {
throw new Error(
@@ -1206,33 +1271,55 @@ async function generateUniqueRepoName({
const duplicateStrategy = strategy || "suffix";
const userId = config.userId || "";
// Helper: check both Gitea and local DB for a candidate name
const isNameTaken = async (candidateName: string): Promise<boolean> => {
const { getGiteaRepoInfo } = await import("./gitea-enhanced");
const { classifyCandidateName } = await import("./utils/mirror-source-match");
// Resolve the I/O for a candidate name (Gitea existence, DB claim, repo info)
// and defer the available/reusable/taken decision to the pure, unit-tested
// classifyCandidateName helper.
const classifyName = async (candidateName: string) => {
const existsInGitea = await isRepoPresentInGitea({
config,
owner: orgName,
repoName: candidateName,
});
if (existsInGitea) return true;
// Also check local DB to catch concurrent batch operations
// where another repo claimed this location but hasn't created it in Gitea yet
// A DB claim by a DIFFERENT repo (concurrent batch) always blocks reuse.
let claimedByOther = false;
if (userId) {
const claimedInDb = await isMirroredLocationClaimedInDb({
claimedByOther = await isMirroredLocationClaimedInDb({
userId,
candidateLocation: `${orgName}/${candidateName}`,
excludeFullName: fullName,
});
if (claimedInDb) return true;
}
return false;
// Only fetch repo info when it can actually change the decision (existing,
// same-source candidate that is not DB-claimed by another repo).
const repoInfo =
existsInGitea && sourceCloneUrl && !claimedByOther
? await getGiteaRepoInfo({
config,
owner: orgName,
repoName: candidateName,
})
: null;
return classifyCandidateName({
existsInGitea,
claimedByOther,
repoInfo,
sourceCloneUrl,
});
};
// First check if base name is available
const baseExists = await isNameTaken(baseName);
if (!baseExists) {
// First check the base name — reuse it if it already holds our own mirror.
const baseClass = await classifyName(baseName);
if (baseClass === "available") {
return baseName;
}
if (baseClass === "reusable") {
console.log(`Reusing existing same-source mirror name: ${orgName}/${baseName}`);
return baseName;
}
@@ -1262,9 +1349,14 @@ async function generateUniqueRepoName({
break;
}
const exists = await isNameTaken(candidateName);
const candidateClass = await classifyName(candidateName);
if (!exists) {
if (candidateClass === "reusable") {
console.log(`Reusing existing same-source mirror name: ${orgName}/${candidateName}`);
return candidateName;
}
if (candidateClass === "available") {
console.log(`Found unique name for duplicate starred repo: ${candidateName}`);
return candidateName;
}
@@ -1314,8 +1406,29 @@ export async function mirrorGitHubRepoToGiteaOrg({
// Determine the actual repository name to use (handle duplicates for starred repos)
let targetRepoName = repository.name;
// The org we will record/reuse for. Stays === orgName on the create path
// (migration uses orgName + giteaOrgId); a reuse hit may repoint it to the
// recorded mirroredLocation's owner for the early-return DB update.
let targetOwner = orgName;
if (
// REUSE-FIRST (issues #315 / #309): reuse an existing same-source mirror
// before generating any suffixed name. See mirrorGithubRepoToGitea for the
// rationale. Routes a hit into the "already mirrored" handling below.
const { findExistingMirror } = await import("./utils/mirror-source-match");
const existingMirror = await findExistingMirror({
repository,
config,
candidateOwner: orgName,
candidateName: repository.name,
});
if (existingMirror) {
targetOwner = existingMirror.owner;
targetRepoName = existingMirror.repoName;
console.log(
`Reusing existing same-source mirror for ${repository.fullName} at ${targetOwner}/${targetRepoName}`
);
} else if (
repository.isStarred &&
config.githubConfig &&
(config.githubConfig.starredReposMode || "dedicated-org") === "dedicated-org"
@@ -1330,6 +1443,7 @@ export async function mirrorGitHubRepoToGiteaOrg({
githubOwner,
fullName: repository.fullName,
strategy: config.githubConfig.starredDuplicateStrategy,
sourceCloneUrl: repository.cloneUrl,
});
if (targetRepoName !== repository.name) {
@@ -1340,7 +1454,7 @@ export async function mirrorGitHubRepoToGiteaOrg({
}
// IDEMPOTENCY CHECK: Check if this repo is already being mirrored
const expectedLocation = `${orgName}/${targetRepoName}`;
const expectedLocation = `${targetOwner}/${targetRepoName}`;
const isCurrentlyMirroring = await isRepoCurrentlyMirroring({
config,
repoName: targetRepoName,
@@ -1358,7 +1472,7 @@ export async function mirrorGitHubRepoToGiteaOrg({
const isExisting = await isRepoPresentInGitea({
config,
owner: orgName,
owner: targetOwner,
repoName: targetRepoName,
});
@@ -1366,7 +1480,7 @@ export async function mirrorGitHubRepoToGiteaOrg({
const { getGiteaRepoInfo, handleExistingNonMirrorRepo } = await import("./gitea-enhanced");
const existingRepoInfo = await getGiteaRepoInfo({
config,
owner: orgName,
owner: targetOwner,
repoName: targetRepoName,
});
@@ -1379,52 +1493,83 @@ export async function mirrorGitHubRepoToGiteaOrg({
strategy: "delete", // Can be configured: "skip", "delete", or "rename"
});
} else if (existingRepoInfo?.mirror) {
console.log(
`Repository ${targetRepoName} already exists in Gitea organization ${orgName}. Updating database status.`
);
// PHANTOM-FORK GUARD (#309): only treat this as "ours" if it mirrors
// THIS source. existingMirror short-circuits because findExistingMirror already
// confirmed the source match.
const { isMirrorOfSource } = await import("./utils/mirror-source-match");
const sameSource =
!!existingMirror ||
isMirrorOfSource(existingRepoInfo, repository.cloneUrl);
await syncRepositoryMetadataToGitea({
config,
octokit,
repository,
giteaOwner: orgName,
giteaRepoName: targetRepoName,
giteaToken: decryptedConfig.giteaConfig.token,
});
if (!sameSource) {
// Different source occupies this name: generate a unique name and
// fall through to create a separate mirror under orgName/giteaOrgId.
console.warn(
`[Mirror] ${targetOwner}/${targetRepoName} is a mirror of a different source. ` +
`Generating a unique name for ${repository.fullName} to avoid overwriting it.`
);
targetOwner = orgName;
targetRepoName = await generateUniqueRepoName({
config,
orgName,
baseName: repository.name,
githubOwner: repository.fullName.split("/")[0],
fullName: repository.fullName,
strategy: config.githubConfig?.starredDuplicateStrategy,
sourceCloneUrl: repository.cloneUrl,
});
} else {
console.log(
`Repository ${targetRepoName} already exists in Gitea organization ${targetOwner}. Updating database status.`
);
// Update database to reflect that the repository is already mirrored
await db
.update(repositories)
.set({
status: repoStatusEnum.parse("mirrored"),
updatedAt: new Date(),
lastMirrored: new Date(),
errorMessage: null,
mirroredLocation: `${orgName}/${targetRepoName}`,
})
.where(eq(repositories.id, repository.id!));
await syncRepositoryMetadataToGitea({
config,
octokit,
repository,
giteaOwner: targetOwner,
giteaRepoName: targetRepoName,
giteaToken: decryptedConfig.giteaConfig.token,
});
// Create a mirror job log entry
await createMirrorJob({
userId: config.userId,
repositoryId: repository.id,
repositoryName: repository.name,
message: `Repository ${targetRepoName} already exists in Gitea organization ${orgName}`,
details: `Repository ${targetRepoName} was found to already exist in Gitea organization ${orgName} and database status was updated.`,
status: "mirrored",
});
// Update database to reflect that the repository is already mirrored
await db
.update(repositories)
.set({
status: repoStatusEnum.parse("mirrored"),
updatedAt: new Date(),
lastMirrored: new Date(),
errorMessage: null,
mirroredLocation: `${targetOwner}/${targetRepoName}`,
})
.where(eq(repositories.id, repository.id!));
console.log(
`Repository ${targetRepoName} database status updated to mirrored in organization ${orgName}`
);
return;
// Create a mirror job log entry
await createMirrorJob({
userId: config.userId,
repositoryId: repository.id,
repositoryName: repository.name,
message: `Repository ${targetRepoName} already exists in Gitea organization ${targetOwner}`,
details: `Repository ${targetRepoName} was found to already exist in Gitea organization ${targetOwner} and database status was updated.`,
status: "mirrored",
});
console.log(
`Repository ${targetRepoName} database status updated to mirrored in organization ${targetOwner}`
);
return;
}
} else {
console.warn(
`[Mirror] Repository ${orgName}/${targetRepoName} exists but mirror status could not be verified. Continuing with mirror creation flow.`
`[Mirror] Repository ${targetOwner}/${targetRepoName} exists but mirror status could not be verified. Continuing with mirror creation flow.`
);
}
}
// Recompute the target location in case a phantom-fork collision above
// forced a renamed target after the initial expectedLocation was derived.
const targetLocation = `${orgName}/${targetRepoName}`;
console.log(
`Mirroring repository ${repository.fullName} to organization ${orgName} as ${targetRepoName}`
);
@@ -1437,7 +1582,7 @@ export async function mirrorGitHubRepoToGiteaOrg({
const finalCheck = await isRepoCurrentlyMirroring({
config,
repoName: targetRepoName,
expectedLocation,
expectedLocation: targetLocation,
});
if (finalCheck) {
@@ -1455,7 +1600,7 @@ export async function mirrorGitHubRepoToGiteaOrg({
.update(repositories)
.set({
status: repoStatusEnum.parse("mirroring"),
mirroredLocation: expectedLocation,
mirroredLocation: targetLocation,
updatedAt: new Date(),
})
.where(eq(repositories.id, repository.id!));
+21
@@ -266,6 +266,27 @@ async function runScheduledSync(config: any): Promise<void> {
visibility: repositoryVisibilityEnum.parse(repo.visibility),
};
// A `failed` repo whose recorded location still resolves to a
// live same-source mirror (e.g. migrate succeeded but metadata
// failed) must be SYNCED, not re-created — otherwise the
// re-create loop spawns suffixed duplicates (#315). The create
// path also reuses now, but routing to sync here avoids a
// wasted migrate attempt and keeps recovery cheap.
if (repo.status === 'failed' && repository.mirroredLocation) {
const { findExistingMirror } = await import('@/lib/utils/mirror-source-match');
const existing = await findExistingMirror({
repository,
config,
candidateOwner: repository.mirroredLocation.split('/')[0] || '',
candidateName: repository.name,
});
if (existing) {
await syncGiteaRepo({ config, repository });
console.log(`[Scheduler] Re-synced failed repository with live mirror: ${repo.fullName}`);
return;
}
}
await mirrorGithubRepoToGitea({ octokit, repository, config });
console.log(`[Scheduler] Auto-mirrored repository: ${repo.fullName}`);
} catch (error) {
-303
@@ -1,303 +0,0 @@
/**
* Enhanced handler for starred repositories with improved error handling
*/
import type { Config, Repository } from "./db/schema";
import { Octokit } from "@octokit/rest";
import { processWithRetry } from "./utils/concurrency";
import {
getOrCreateGiteaOrgEnhanced,
getGiteaRepoInfo,
handleExistingNonMirrorRepo,
createOrganizationsSequentially
} from "./gitea-enhanced";
import { mirrorGithubRepoToGitea } from "./gitea";
import { getMirrorStrategyConfig } from "./utils/mirror-strategies";
import { createMirrorJob } from "./helpers";
/**
* Process starred repositories with enhanced error handling
*/
export async function processStarredRepositories({
config,
repositories,
octokit,
}: {
config: Config;
repositories: Repository[];
octokit: Octokit;
}): Promise<void> {
if (!config.userId) {
throw new Error("User ID is required");
}
const strategyConfig = getMirrorStrategyConfig();
console.log(`Processing ${repositories.length} starred repositories`);
console.log(`Using strategy config:`, strategyConfig);
// Step 1: Pre-create organizations to avoid race conditions
if (strategyConfig.sequentialOrgCreation) {
await preCreateOrganizations({ config, repositories });
}
// Step 2: Process repositories with enhanced error handling
await processWithRetry(
repositories,
async (repository) => {
try {
await processStarredRepository({
config,
repository,
octokit,
strategyConfig,
});
return repository;
} catch (error) {
console.error(`Failed to process starred repository ${repository.name}:`, error);
throw error;
}
},
{
concurrencyLimit: strategyConfig.repoBatchSize,
maxRetries: 2,
retryDelay: 2000,
onProgress: (completed, total, result) => {
const percentComplete = Math.round((completed / total) * 100);
if (result) {
console.log(
`Processed starred repository "${result.name}" (${completed}/${total}, ${percentComplete}%)`
);
}
},
onRetry: (repo, error, attempt) => {
console.log(
`Retrying starred repository ${repo.name} (attempt ${attempt}): ${error.message}`
);
},
}
);
}
/**
* Pre-create all required organizations sequentially
*/
async function preCreateOrganizations({
config,
repositories,
}: {
config: Config;
repositories: Repository[];
}): Promise<void> {
// Get unique organization names
const orgNames = new Set<string>();
const starredReposMode = config.githubConfig?.starredReposMode || "dedicated-org";
if (starredReposMode === "preserve-owner") {
for (const repo of repositories) {
orgNames.add(repo.organization || repo.owner);
}
} else if (config.githubConfig?.starredReposOrg) {
orgNames.add(config.githubConfig.starredReposOrg);
} else {
orgNames.add("starred");
}
// Add any other organizations based on mirror strategy
for (const repo of repositories) {
if (repo.destinationOrg) {
orgNames.add(repo.destinationOrg);
}
}
console.log(`Pre-creating ${orgNames.size} organizations sequentially`);
// Create organizations sequentially
await createOrganizationsSequentially({
config,
orgNames: Array.from(orgNames),
});
}
/**
* Process a single starred repository with enhanced error handling
*/
async function processStarredRepository({
config,
repository,
octokit,
strategyConfig,
}: {
config: Config;
repository: Repository;
octokit: Octokit;
strategyConfig: ReturnType<typeof getMirrorStrategyConfig>;
}): Promise<void> {
const starredReposMode = config.githubConfig?.starredReposMode || "dedicated-org";
const starredOrg =
starredReposMode === "preserve-owner"
? repository.organization || repository.owner
: config.githubConfig?.starredReposOrg || "starred";
// Check if repository exists in Gitea
const existingRepo = await getGiteaRepoInfo({
config,
owner: starredOrg,
repoName: repository.name,
});
if (existingRepo) {
if (existingRepo.mirror) {
console.log(`Starred repository ${repository.name} already exists as a mirror`);
// Update database status
const { db, repositories: reposTable } = await import("./db");
const { eq } = await import("drizzle-orm");
const { repoStatusEnum } = await import("@/types/Repository");
await db
.update(reposTable)
.set({
status: repoStatusEnum.parse("mirrored"),
updatedAt: new Date(),
lastMirrored: new Date(),
errorMessage: null,
mirroredLocation: `${starredOrg}/${repository.name}`,
})
.where(eq(reposTable.id, repository.id!));
return;
} else {
// Repository exists but is not a mirror
console.warn(`Starred repository ${repository.name} exists but is not a mirror`);
await handleExistingNonMirrorRepo({
config,
repository,
repoInfo: existingRepo,
strategy: strategyConfig.nonMirrorStrategy,
});
// If we deleted it, continue to create the mirror
if (strategyConfig.nonMirrorStrategy !== "delete") {
return; // Skip if we're not deleting
}
}
}
// Create the mirror
try {
await mirrorGithubRepoToGitea({
octokit,
repository,
config,
});
} catch (error) {
// Enhanced error handling for specific scenarios
if (error instanceof Error) {
const errorMessage = error.message.toLowerCase();
if (errorMessage.includes("already exists")) {
// Handle race condition where repo was created by another process
console.log(`Repository ${repository.name} was created by another process`);
// Check if it's a mirror now
const recheck = await getGiteaRepoInfo({
config,
owner: starredOrg,
repoName: repository.name,
});
if (recheck && recheck.mirror) {
// It's now a mirror, update database
const { db, repositories: reposTable } = await import("./db");
const { eq } = await import("drizzle-orm");
const { repoStatusEnum } = await import("@/types/Repository");
await db
.update(reposTable)
.set({
status: repoStatusEnum.parse("mirrored"),
updatedAt: new Date(),
lastMirrored: new Date(),
errorMessage: null,
mirroredLocation: `${starredOrg}/${repository.name}`,
})
.where(eq(reposTable.id, repository.id!));
return;
}
}
}
throw error;
}
}
/**
* Sync all starred repositories
*/
export async function syncStarredRepositories({
config,
repositories,
}: {
config: Config;
repositories: Repository[];
}): Promise<void> {
const strategyConfig = getMirrorStrategyConfig();
console.log(`Syncing ${repositories.length} starred repositories`);
await processWithRetry(
repositories,
async (repository) => {
try {
// Import syncGiteaRepo
const { syncGiteaRepo } = await import("./gitea");
await syncGiteaRepo({
config,
repository,
});
return repository;
} catch (error) {
if (error instanceof Error && error.message.includes("not a mirror")) {
console.warn(`Repository ${repository.name} is not a mirror, handling...`);
const starredReposMode = config.githubConfig?.starredReposMode || "dedicated-org";
const starredOrg =
starredReposMode === "preserve-owner"
? repository.organization || repository.owner
: config.githubConfig?.starredReposOrg || "starred";
const repoInfo = await getGiteaRepoInfo({
config,
owner: starredOrg,
repoName: repository.name,
});
if (repoInfo) {
await handleExistingNonMirrorRepo({
config,
repository,
repoInfo,
strategy: strategyConfig.nonMirrorStrategy,
});
}
}
throw error;
}
},
{
concurrencyLimit: strategyConfig.repoBatchSize,
maxRetries: 1,
retryDelay: 1000,
onProgress: (completed, total) => {
const percentComplete = Math.round((completed / total) * 100);
console.log(`Sync progress: ${completed}/${total} (${percentComplete}%)`);
},
}
);
}
+420
@@ -0,0 +1,420 @@
import { describe, test, expect } from "bun:test";
import {
normalizeCloneUrl,
cloneUrlsMatch,
isMirrorOfSource,
classifyCandidateName,
findExistingMirror,
} from "./mirror-source-match";
import type { Repository } from "@/lib/db/schema";
import type { Config } from "@/types/config";
// Minimal Repository factory for tests. Only the fields read by the helper
// matter (cloneUrl, mirroredLocation, fullName, name).
function makeRepo(overrides: Partial<Repository> = {}): Repository {
return {
id: "repo-1",
userId: "user-1",
configId: "config-1",
name: "Update",
fullName: "NostalgiaForInfinity/Update",
url: "https://github.com/NostalgiaForInfinity/Update",
cloneUrl: "https://github.com/NostalgiaForInfinity/Update.git",
owner: "NostalgiaForInfinity",
organization: undefined,
mirroredLocation: "",
isPrivate: false,
isForked: false,
forkedFrom: undefined,
hasIssues: false,
isStarred: true,
isArchived: false,
size: 0,
hasLFS: false,
hasSubmodules: false,
language: undefined,
description: undefined,
defaultBranch: "main",
visibility: "public",
status: "imported",
lastMirrored: undefined,
errorMessage: undefined,
createdAt: new Date(),
updatedAt: new Date(),
...overrides,
} as unknown as Repository;
}
const config: Partial<Config> = {
userId: "user-1",
giteaConfig: { url: "https://gitea.example.com", token: "t" } as any,
};
describe("normalizeCloneUrl", () => {
test("strips trailing .git", () => {
expect(normalizeCloneUrl("https://github.com/a/b.git")).toBe(
"https://github.com/a/b"
);
});
test("strips embedded credentials", () => {
expect(normalizeCloneUrl("https://x-access-token:ghp_secret@github.com/a/b.git")).toBe(
"https://github.com/a/b"
);
});
test("strips trailing slash", () => {
expect(normalizeCloneUrl("https://github.com/a/b/")).toBe(
"https://github.com/a/b"
);
});
test("lowercases host (and value)", () => {
expect(normalizeCloneUrl("https://GitHub.com/a/b")).toBe(
"https://github.com/a/b"
);
});
test("returns empty string for blank/invalid input", () => {
expect(normalizeCloneUrl("")).toBe("");
expect(normalizeCloneUrl(null)).toBe("");
expect(normalizeCloneUrl(undefined)).toBe("");
});
test("handles scp-style git URLs via fallback", () => {
expect(normalizeCloneUrl("git@github.com:a/b.git")).toBe("git@github.com:a/b");
});
});
describe("cloneUrlsMatch", () => {
test("https vs token-embedded URL match", () => {
expect(
cloneUrlsMatch(
"https://github.com/a/b.git",
"https://x-access-token:tok@github.com/a/b.git"
)
).toBe(true);
});
test(".git suffix and trailing slash differences match", () => {
expect(
cloneUrlsMatch("https://github.com/a/b", "https://github.com/a/b.git/")
).toBe(true);
});
test("host case-insensitive match", () => {
expect(
cloneUrlsMatch("https://GITHUB.com/a/b", "https://github.com/a/b")
).toBe(true);
});
test("different repos do not match", () => {
expect(
cloneUrlsMatch("https://github.com/a/b", "https://github.com/c/d")
).toBe(false);
});
test("empty/unknown URL never matches", () => {
expect(cloneUrlsMatch("", "https://github.com/a/b")).toBe(false);
expect(cloneUrlsMatch("https://github.com/a/b", undefined)).toBe(false);
});
});
describe("isMirrorOfSource", () => {
test("true when mirror with matching original_url", () => {
expect(
isMirrorOfSource(
{ mirror: true, original_url: "https://github.com/a/b" } as any,
"https://github.com/a/b.git"
)
).toBe(true);
});
test("false when not a mirror", () => {
expect(
isMirrorOfSource(
{ mirror: false, original_url: "https://github.com/a/b" } as any,
"https://github.com/a/b"
)
).toBe(false);
});
test("false when original_url is for a different source (phantom fork)", () => {
expect(
isMirrorOfSource(
{ mirror: true, original_url: "https://github.com/other/repo" } as any,
"https://github.com/a/b"
)
).toBe(false);
});
test("false when original_url missing (cannot confirm)", () => {
expect(
isMirrorOfSource({ mirror: true } as any, "https://github.com/a/b")
).toBe(false);
});
test("false for null repoInfo", () => {
expect(isMirrorOfSource(null, "https://github.com/a/b")).toBe(false);
});
});
describe("findExistingMirror", () => {
test("reuses existing same-source mirror at base candidate name (#315)", async () => {
const repo = makeRepo();
const getRepoInfo = async ({ owner, repoName }: any) => {
if (owner === "starred" && repoName === "Update") {
return {
mirror: true,
original_url: "https://github.com/NostalgiaForInfinity/Update",
} as any;
}
return null;
};
const match = await findExistingMirror({
repository: repo,
config,
candidateOwner: "starred",
candidateName: "Update",
getRepoInfo,
});
expect(match).not.toBeNull();
expect(match!.owner).toBe("starred");
expect(match!.repoName).toBe("Update");
});
test("reuses via mirroredLocation even when base name differs (strategy change, #309)", async () => {
// Strategy changed; current candidate name would be "Update" under "starred",
// but the historical mirror lives at "myorg/Update-NostalgiaForInfinity".
const repo = makeRepo({
mirroredLocation: "myorg/Update-NostalgiaForInfinity",
});
const getRepoInfo = async ({ owner, repoName }: any) => {
if (owner === "myorg" && repoName === "Update-NostalgiaForInfinity") {
return {
mirror: true,
original_url: "https://github.com/NostalgiaForInfinity/Update",
} as any;
}
return null;
};
const match = await findExistingMirror({
repository: repo,
config,
candidateOwner: "starred",
candidateName: "Update",
getRepoInfo,
});
expect(match).not.toBeNull();
expect(match!.owner).toBe("myorg");
expect(match!.repoName).toBe("Update-NostalgiaForInfinity");
});
test("returns null on genuine different-source collision (regression guard #95/#236)", async () => {
const repo = makeRepo();
const getRepoInfo = async ({ owner, repoName }: any) => {
if (owner === "starred" && repoName === "Update") {
// Same name, but it mirrors a DIFFERENT source.
return {
mirror: true,
original_url: "https://github.com/someoneelse/Update",
} as any;
}
return null;
};
const match = await findExistingMirror({
repository: repo,
config,
candidateOwner: "starred",
candidateName: "Update",
getRepoInfo,
});
expect(match).toBeNull();
});
test("returns null for phantom fork (non-mirror at the name)", async () => {
const repo = makeRepo();
const getRepoInfo = async ({ owner, repoName }: any) => {
if (owner === "starred" && repoName === "Update") {
return { mirror: false, original_url: "" } as any;
}
return null;
};
const match = await findExistingMirror({
repository: repo,
config,
candidateOwner: "starred",
candidateName: "Update",
getRepoInfo,
});
expect(match).toBeNull();
});
test("falls back to fresh creation when mirroredLocation is stale (Gitea repo deleted)", async () => {
const repo = makeRepo({ mirroredLocation: "starred/Update" });
// Both the recorded location and the base candidate are gone.
const getRepoInfo = async () => null;
const match = await findExistingMirror({
repository: repo,
config,
candidateOwner: "starred",
candidateName: "Update",
getRepoInfo,
});
expect(match).toBeNull();
});
test("matches mirror even when original_url is token-embedded / .git-suffixed", async () => {
const repo = makeRepo();
const getRepoInfo = async ({ owner, repoName }: any) => {
if (owner === "starred" && repoName === "Update") {
return {
mirror: true,
original_url:
"https://x-access-token:tok@github.com/NostalgiaForInfinity/Update.git",
} as any;
}
return null;
};
const match = await findExistingMirror({
repository: repo,
config,
candidateOwner: "starred",
candidateName: "Update",
getRepoInfo,
});
expect(match).not.toBeNull();
});
test("skips a candidate whose lookup throws and still resolves a later candidate", async () => {
const repo = makeRepo({ mirroredLocation: "myorg/Update" });
const getRepoInfo = async ({ owner }: any) => {
if (owner === "myorg") {
throw new Error("network blip");
}
if (owner === "starred") {
return {
mirror: true,
original_url: "https://github.com/NostalgiaForInfinity/Update",
} as any;
}
return null;
};
const match = await findExistingMirror({
repository: repo,
config,
candidateOwner: "starred",
candidateName: "Update",
getRepoInfo,
});
expect(match).not.toBeNull();
expect(match!.owner).toBe("starred");
});
});
describe("classifyCandidateName — suffix vs reuse decision (#315/#309)", () => {
const SOURCE = "https://github.com/NostalgiaForInfinity/Update.git";
test("free name → available", () => {
expect(
classifyCandidateName({
existsInGitea: false,
claimedByOther: false,
repoInfo: null,
sourceCloneUrl: SOURCE,
})
).toBe("available");
});
test("name occupied by OUR same-source mirror → reusable (no suffix, #315)", () => {
expect(
classifyCandidateName({
existsInGitea: true,
claimedByOther: false,
repoInfo: {
mirror: true,
original_url: "https://github.com/NostalgiaForInfinity/Update",
} as any,
sourceCloneUrl: SOURCE,
})
).toBe("reusable");
});
test("name occupied by a DIFFERENT source → taken (suffix, regression #95/#236)", () => {
expect(
classifyCandidateName({
existsInGitea: true,
claimedByOther: false,
repoInfo: {
mirror: true,
original_url: "https://github.com/someoneelse/Update",
} as any,
sourceCloneUrl: SOURCE,
})
).toBe("taken");
});
test("name occupied by a NON-mirror → taken (phantom-fork guard, #309)", () => {
expect(
classifyCandidateName({
existsInGitea: true,
claimedByOther: false,
repoInfo: { mirror: false, original_url: "" } as any,
sourceCloneUrl: SOURCE,
})
).toBe("taken");
});
test("our same-source mirror but DB-claimed by ANOTHER repo → taken (per-user separation)", () => {
expect(
classifyCandidateName({
existsInGitea: true,
claimedByOther: true,
repoInfo: {
mirror: true,
original_url: "https://github.com/NostalgiaForInfinity/Update",
} as any,
sourceCloneUrl: SOURCE,
})
).toBe("taken");
});
test("free in Gitea but DB-claimed by another concurrent op → taken", () => {
expect(
classifyCandidateName({
existsInGitea: false,
claimedByOther: true,
repoInfo: null,
sourceCloneUrl: SOURCE,
})
).toBe("taken");
});
test("existing mirror but unknown source (no sourceCloneUrl) → taken", () => {
expect(
classifyCandidateName({
existsInGitea: true,
claimedByOther: false,
repoInfo: {
mirror: true,
original_url: "https://github.com/NostalgiaForInfinity/Update",
} as any,
sourceCloneUrl: undefined,
})
).toBe("taken");
});
});
@@ -0,0 +1,213 @@
import type { Config } from "@/types/config";
import type { Repository } from "@/lib/db/schema";
import type { GiteaRepoInfo } from "@/lib/gitea-enhanced";
/**
* Source-identity matching for mirror reuse.
*
* Starred (and other) repos were duplicating on every re-mirror because the
* existence check only asked "does a repo with this name exist?" never
* "is the existing repo a mirror of THIS same GitHub source?". This module
* answers the second question so callers can reuse an existing same-source
* mirror instead of generating a suffixed duplicate. See issues #315 / #309.
*/
/**
* Normalize a git clone URL for source-identity comparison.
* Strips embedded credentials, trailing slashes, and a trailing ".git", then
* lowercases the entire result (host and path), so URLs that differ only in
* letter case compare equal. Returns an empty string for blank or non-string
* input so callers can treat it as "unknown".
*/
export function normalizeCloneUrl(rawUrl?: string | null): string {
if (typeof rawUrl !== "string") return "";
let url = rawUrl.trim();
if (!url) return "";
try {
const parsed = new URL(url);
// Drop any embedded credentials (e.g. https://user:token@host/...).
parsed.username = "";
parsed.password = "";
const host = parsed.host.toLowerCase();
// Strip trailing slash(es) first so a ".git/" suffix still normalizes.
const path = parsed.pathname.replace(/\/+$/, "").replace(/\.git$/i, "");
return `${parsed.protocol}//${host}${path}`.toLowerCase();
} catch {
// Fall back to best-effort string normalization for non-standard URLs
// (e.g. scp-style git@host:owner/repo). Strip credentials before "@",
// drop ".git"/trailing slash, and lowercase the whole thing.
url = url.replace(/^([a-z]+:\/\/)[^@/]+@/i, "$1");
url = url.replace(/\/+$/, "").replace(/\.git$/i, "");
return url.toLowerCase();
}
}
/**
* Whether two clone URLs point at the same source repository, ignoring
* credentials, ".git" suffix, trailing slash, and letter case.
*/
export function cloneUrlsMatch(a?: string | null, b?: string | null): boolean {
const normA = normalizeCloneUrl(a);
const normB = normalizeCloneUrl(b);
if (!normA || !normB) return false;
return normA === normB;
}
/**
* Whether an existing Gitea repo is a mirror of the given GitHub source.
* Uses Gitea's original_url (the recorded migration source) when present;
* if Gitea didn't expose original_url, we cannot positively confirm the
* source and return false (callers then treat the name as a genuine
* collision rather than risk mapping onto an unrelated repo; see #309).
*/
export function isMirrorOfSource(
repoInfo: GiteaRepoInfo | null,
sourceCloneUrl?: string | null
): boolean {
if (!repoInfo || !repoInfo.mirror) return false;
return cloneUrlsMatch(repoInfo.original_url, sourceCloneUrl);
}
export type CandidateNameClassification = "available" | "reusable" | "taken";
/**
* Classify a candidate mirror name for the suffix-vs-reuse decision in
* generateUniqueRepoName. Pure (all I/O is pre-resolved by the caller):
* - "available": free in Gitea and not DB-claimed by another repo; use it
* - "reusable": occupied in Gitea by a mirror of THIS source, not DB-claimed
* by another repo; reuse it (no suffix)
* - "taken": occupied by a different source / non-mirror, or DB-claimed by
* another repo; must suffix
*
* A DB claim by a DIFFERENT repo always blocks reuse so two users mirroring the
* same source into a shared org stay separated.
*/
export function classifyCandidateName({
existsInGitea,
claimedByOther,
repoInfo,
sourceCloneUrl,
}: {
existsInGitea: boolean;
claimedByOther: boolean;
repoInfo: GiteaRepoInfo | null;
sourceCloneUrl?: string | null;
}): CandidateNameClassification {
if (existsInGitea) {
if (!claimedByOther && isMirrorOfSource(repoInfo, sourceCloneUrl)) {
return "reusable";
}
return "taken";
}
// Not in Gitea, but possibly claimed in the DB by a concurrent operation.
if (claimedByOther) return "taken";
return "available";
}
export interface ExistingMirrorMatch {
owner: string;
repoName: string;
repoInfo: GiteaRepoInfo;
}
/**
* Resolve an existing same-source mirror for a repository, if one exists.
*
* Resolution order (backward compatible):
* 1. The recorded repository.mirroredLocation: if it still resolves to a
* live mirror of THIS source, reuse it even when the base candidate name
* differs from the current naming strategy (handles strategy changes, #309).
* 2. The provided candidate owner/name: if that resolves to a live mirror of
* THIS source, reuse it (handles the self-collision that drove suffixing, #315).
*
* Returns null when no live same-source mirror is found (caller should create
* a fresh mirror, generating a unique name if the candidate name is taken by a
* DIFFERENT source).
*/
export async function findExistingMirror({
repository,
config,
candidateOwner,
candidateName,
getRepoInfo,
}: {
repository: Repository;
config: Partial<Config>;
candidateOwner: string;
candidateName: string;
// Injectable for testing; defaults to the real Gitea lookup.
getRepoInfo?: (args: {
config: Partial<Config>;
owner: string;
repoName: string;
}) => Promise<GiteaRepoInfo | null>;
}): Promise<ExistingMirrorMatch | null> {
const lookup =
getRepoInfo ??
(async (args: {
config: Partial<Config>;
owner: string;
repoName: string;
}) => {
const { getGiteaRepoInfo } = await import("@/lib/gitea-enhanced");
return getGiteaRepoInfo(args);
});
const sourceCloneUrl = repository.cloneUrl;
// Candidate locations to probe, in priority order. Dedupe so we don't issue
// the same HTTP lookup twice when mirroredLocation equals the candidate.
const candidates: Array<{ owner: string; repoName: string }> = [];
const seen = new Set<string>();
const pushCandidate = (owner?: string | null, repoName?: string | null) => {
const o = (owner || "").trim();
const r = (repoName || "").trim();
if (!o || !r) return;
const key = `${o}/${r}`.toLowerCase();
if (seen.has(key)) return;
seen.add(key);
candidates.push({ owner: o, repoName: r });
};
if (repository.mirroredLocation && repository.mirroredLocation.trim()) {
const slashIndex = repository.mirroredLocation.indexOf("/");
if (slashIndex > 0 && slashIndex < repository.mirroredLocation.length - 1) {
pushCandidate(
repository.mirroredLocation.slice(0, slashIndex),
repository.mirroredLocation.slice(slashIndex + 1)
);
}
}
pushCandidate(candidateOwner, candidateName);
for (const candidate of candidates) {
let repoInfo: GiteaRepoInfo | null;
try {
repoInfo = await lookup({
config,
owner: candidate.owner,
repoName: candidate.repoName,
});
} catch (error) {
// A failed lookup (network/auth) should not be mistaken for "no mirror";
// skip this candidate and let the caller fall back to its normal flow.
console.warn(
`[Mirror] Could not look up ${candidate.owner}/${candidate.repoName} while resolving existing mirror for ${repository.fullName}: ${
error instanceof Error ? error.message : String(error)
}`
);
continue;
}
if (isMirrorOfSource(repoInfo, sourceCloneUrl)) {
return {
owner: candidate.owner,
repoName: candidate.repoName,
repoInfo: repoInfo as GiteaRepoInfo,
};
}
}
return null;
}
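The three classifications above are meant to drive the suffix-vs-reuse decision in generateUniqueRepoName, which is not part of this file. The following is a minimal sketch of how such a loop might consume them, not the project's actual implementation. The helper name `pickMirrorName`, the `classify` callback, the `maxAttempts` limit, and the `-N` suffix scheme are all assumptions made for illustration.

```typescript
type Classification = "available" | "reusable" | "taken";

// Hypothetical helper (not in the PR): walk candidate names until one is
// either free ("available") or already our own mirror ("reusable").
// `classify` stands in for a call that resolves Gitea/DB state and then
// delegates to classifyCandidateName.
function pickMirrorName(
  baseName: string,
  classify: (name: string) => Classification,
  maxAttempts = 10
): { name: string; reused: boolean } {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Assumed suffix scheme: base, base-1, base-2, ...
    const name = attempt === 0 ? baseName : `${baseName}-${attempt}`;
    const verdict = classify(name);
    if (verdict === "taken") continue; // genuine collision: try next suffix
    return { name, reused: verdict === "reusable" };
  }
  throw new Error(
    `No usable name for ${baseName} after ${maxAttempts} attempts`
  );
}
```

In this sketch a "reusable" verdict on the base name ends the search immediately, which is the behavior that keeps repeated or concurrent runs converging on one name instead of each computing a different suffix.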