C1orf52
Chromosome 1 open reading frame 52 (C1orf52) is a protein encoded by the C1orf52 gene. The protein is localized to the nucleus in most tissues.[1]
Gene
C1orf52 is located on the minus strand at 1p22.3.[2] Including introns and exons, the gene spans 9,720 base pairs and contains 3 exons.[3]
Gene Neighborhood
The gene neighborhood of C1orf52 consists of B-cell lymphoma 10 (BCL10), B-cell lymphoma antisense 1 (BCL-AS1), dimethylarginine dimethylaminohydrolase 1 (DDAH1), and synapse defective Rho GTPase homolog 2 (SYDE2).[2] The BCL10 gene encodes the BCL10 scaffolding protein, which controls immune and pro-inflammatory pathways by connecting antigen receptor signaling to NF-κB activation in B cells and T cells. DDAH1 regulates intracellular ROS levels and apoptosis sensitivity via a SOD2-dependent pathway. SYDE2 converts Rho-type GTPases into an inactive guanosine diphosphate-bound state.
Transcript
Including untranslated regions, the mRNA is 3,254 base pairs long.[4]
Transcript Variants
A second transcript variant includes an additional exon.[2] This alternate exon lies in the coding region of variant 2 and causes a frameshift and a premature stop codon. Variant 2 does not produce the C1orf52 protein: its predicted product is significantly truncated, and the transcript is a candidate for nonsense-mediated decay.[2]
Transcript | Exon 1 (bp) | Exon 2 (bp) | Exon 3 (bp) | Exon 4 (bp) | Protein Length (amino acids) |
---|---|---|---|---|---|
Transcript Variant 1 | 306 | - | 199 | 2750 | 182 |
Transcript Variant 2 | 306 | 127 | 199 | 2750 | none |
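The frameshift in variant 2 can be checked with simple arithmetic: an exon inserted into a coding region preserves the reading frame only if its length is a multiple of three. The sketch below assumes that the values in the table are exon lengths in base pairs and that the 127 bp exon lies entirely within the coding region, as the text states. The variable and function names are illustrative only.

```python
# Exon lengths in base pairs, copied from the transcript variant table.
# Assumption: the table values are exon lengths, not genomic coordinates.
EXONS_BP = {
    "variant_1": [306, 199, 2750],
    "variant_2": [306, 127, 199, 2750],
}
EXTRA_EXON_BP = 127  # exon present only in variant 2


def shifts_reading_frame(inserted_bp: int) -> bool:
    """Return True when an insertion of this length disrupts the codon frame.

    Codons are 3 nt long, so only insertions whose length is a
    multiple of 3 leave the downstream reading frame unchanged.
    """
    return inserted_bp % 3 != 0


# Number of nucleotides by which downstream codons are offset.
frame_offset = EXTRA_EXON_BP % 3

# Sanity check: the two variants differ only by the extra exon.
length_difference = sum(EXONS_BP["variant_2"]) - sum(EXONS_BP["variant_1"])

print("frame offset (nt):", frame_offset)
print("frameshift:", shifts_reading_frame(EXTRA_EXON_BP))
print("length difference (bp):", length_difference)
```

Because 127 is not divisible by three, every codon downstream of the extra exon is read out of frame, which is consistent with the premature stop codon described for variant 2.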
Protein
The C1orf52 protein consists of 182 amino acids, with a molecular weight of 20 kDa and an isoelectric point (pI) of 5.[3] The protein contains a domain of unknown function (DUF4660), also known as pfam15559, that is 98 amino acids long.[3] This domain is flanked by two disordered regions, which make up most of the rest of the protein. C1orf52 enables RNA-binding activity. Compared to other proteins, C1orf52 is deficient in lysine and histidine and rich in glutamine and proline.
No protein isoforms of C1orf52 have been reported.[5]
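As a rough plausibility check on the figures above, the molecular weight of a protein can be approximated from its length. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value taken from the cited sources, and it also derives the coding sequence length implied by 182 residues plus a stop codon. The names used are illustrative.

```python
PROTEIN_LENGTH_AA = 182       # protein length stated in the text
AVG_RESIDUE_MASS_DA = 110.0   # assumption: rule-of-thumb average residue mass


def estimate_mass_kda(n_residues: int,
                      avg_residue_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Approximate protein mass in kDa from the residue count."""
    return n_residues * avg_residue_da / 1000.0


# One codon (3 nt) per residue, plus one stop codon.
cds_length_nt = (PROTEIN_LENGTH_AA + 1) * 3

print("estimated mass (kDa):", round(estimate_mass_kda(PROTEIN_LENGTH_AA), 1))
print("coding sequence length (nt):", cds_length_nt)
```

The estimate agrees with the reported 20 kDa. An exact value would depend on the actual amino acid composition and any post-translational modifications.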
Structure
The protein is predicted to be largely disordered in both its secondary and tertiary structure, with very few predicted alpha helices or beta sheets.
Gene Level Regulation
Expression Patterns
C1orf52 mRNA is ubiquitously expressed at high levels in human tissues, with higher abundance in bone marrow, brain regions, thymus, and thyroid, and lower expression in digestive organs.
Protein Level Regulation
The C1orf52 protein is more abundant than the average human protein.[8] It is predicted to be found in the nucleus of cells in most tissue types. The protein is predicted to have 6 phosphorylation sites, 2 SUMOylation sites, 4 O-glycosylation sites, 2 acetylation sites, and 1 ubiquitination site.
Homology
Paralogs
No paralogs of C1orf52 have been identified in the human genome.[5]
Orthologs
C1orf52 orthologs are found in all common classes of vertebrates: fish, birds, amphibians, reptiles, and mammals. Orthologs are also found in invertebrates, including sponges, tunicates, and lancelets. No orthologs were found in insects, fungi, plants, or protists. The most distant ortholog identified is in sea sponges, which diverged from the human lineage approximately 758 million years ago.[9]
Genus and Species | Common Name | Taxonomic Order | Date of Divergence from Humans (MYA) | Accession Number | Sequence Length (amino acids) | Sequence Identity to Humans | Sequence Similarity to Humans |
---|---|---|---|---|---|---|---|
Homo sapiens | Human | Primates | 0 | NP_932343.1 | 182 | 100% | 100% |
Mus musculus | House Mouse | Rodentia | 87 | NP_079831.1 | 180 | 85.2% | 89.0% |
Ornithorhynchus anatinus | Platypus | Monotremata | 180 | XP_028917768.1 | 191 | 61.7% | 71.0% |
Harpia harpyja | Harpy eagle | Accipitriformes | 319 | XP_052658103.1 | 183 | 64.6% | 75.1% |
Gallus gallus | Chicken | Galliformes | 319 | NP_001264489.2 | 183 | 63.0% | 71.4% |
Taeniopygia guttata | Zebra finch | Passeriformes | 319 | XP_030134956.3 | 183 | 62.1% | 73.2% |
Gopherus evgoodei | Goode’s thornscrub tortoise | Testudines | 319 | XP_038601107.1 | 187 | 64.7% | 73.3% |
Alligator mississippiensis | Alligator | Crocodilia | 319 | XP_014450079.3 | 187 | 62.6% | 70.5% |
Protobothrops mucrosquamatus | Pit viper | Squamata | 319 | XP_015668904.1 | 187 | 61.5% | 69.7% |
Microcaecilia unicolor | Tiny Cayenne Caecilian | Gymnophiona | 352 | XP_030062820.1 | 184 | 62.2% | 72.0% |
Xenopus laevis | African clawed frog | Anura | 352 | NP_001089243.1 | 171 | 60.9% | 70.8% |
Pleurodeles waltl | Iberian ribbed newt | Urodela | 352 | KAJ1114225.1 | 182 | 57.1% | 67.9% |
Protopterus annectens | West African lungfish | Ceratodontiformes | 408 | XP_043941971.1 | 181 | 53.5% | 70.1% |
Polypterus senegalus | Gray bichir | Polypteriformes | 429 | XP_039591352 | 188 | 54.3% | 64.5% |
Danio rerio | Zebrafish | Cypriniformes | 429 | NP_956836.1 | 214 | 45.9% | 58.3% |
Pristis pectinata | Smalltooth Sawfish | Rhinopristiformes | 462 | XP_051869055.1 | 205 | 44.9% | 58.9% |
Lampetra fluviatilis | European river lamprey | Petromyzontiformes | 563 | CAL5931002.1 | 242 | 26.7% | 36.0% |
Branchiostoma floridae | Florida lancelet | Amphioxiformes | 581 | XP_035684389.1 | 234 | 24.7% | 37.7% |
Styela clava | Sea squirt | Stolidobranchia | 596 | XP_039271545.1 | 236 | 25.4% | 39.9% |
Geodia barretti | Deep Sea Sponge | Tetractinellida | 758 | CAI8039110.1 | 221 | 27.1% | 38.1% |
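The table suggests that sequence identity to the human protein falls as divergence time increases. The following is a minimal sketch for quantifying that trend, using only the divergence dates and identity percentages listed in the table. The Pearson correlation is one possible summary chosen here for illustration and is not an analysis taken from the cited sources.

```python
import math

# (species, divergence from humans in MYA, % identity to human), from the table.
ORTHOLOGS = [
    ("Homo sapiens", 0, 100.0),
    ("Mus musculus", 87, 85.2),
    ("Ornithorhynchus anatinus", 180, 61.7),
    ("Harpia harpyja", 319, 64.6),
    ("Gallus gallus", 319, 63.0),
    ("Taeniopygia guttata", 319, 62.1),
    ("Gopherus evgoodei", 319, 64.7),
    ("Alligator mississippiensis", 319, 62.6),
    ("Protobothrops mucrosquamatus", 319, 61.5),
    ("Microcaecilia unicolor", 352, 62.2),
    ("Xenopus laevis", 352, 60.9),
    ("Pleurodeles waltl", 352, 57.1),
    ("Protopterus annectens", 408, 53.5),
    ("Polypterus senegalus", 429, 54.3),
    ("Danio rerio", 429, 45.9),
    ("Pristis pectinata", 462, 44.9),
    ("Lampetra fluviatilis", 563, 26.7),
    ("Branchiostoma floridae", 581, 24.7),
    ("Styela clava", 596, 25.4),
    ("Geodia barretti", 758, 27.1),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


divergence_mya = [row[1] for row in ORTHOLOGS]
identity_pct = [row[2] for row in ORTHOLOGS]
r = pearson_r(divergence_mya, identity_pct)

print(f"Pearson r between divergence time and identity: {r:.2f}")
```

A strongly negative coefficient would be consistent with the pattern visible in the table, although a linear correlation ignores the saturation of sequence divergence at long evolutionary distances.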
Interacting Proteins
High-throughput affinity capture-mass spectrometry supports a physical association between C1orf52 and MAD1L1 (Mitotic Arrest Deficient 1 Like 1).
Clinical Significance
Single-nucleotide polymorphisms within the second intron of human C1orf52 have been linked to metabolic syndrome, high-density lipoprotein cholesterol levels, response to levetiracetam in genetic generalized epilepsy, multiple sclerosis, body mass index, and protein quantitative trait (liver).[10]
References
- ^ "C1orf52 protein expression summary - The Human Protein Atlas". www.proteinatlas.org. Retrieved 2024-09-21.
- ^ a b c d "NCBI (National Center for Biotechnology Information) Gene Entry on C1orf52".
- ^ a b c "C1orf52 Gene - Chromosome 1 Open Reading Frame 52".
- ^ "NCBI (National Center for Biotechnology Information) Nucleotide Entry on C1orf52".
- ^ a b "Protein BLAST: search protein databases using a protein query". blast.ncbi.nlm.nih.gov. Retrieved 2024-09-21.
- ^ "I-TASSER server for protein structure and function prediction". zhanggroup.org. Retrieved 2024-12-04.
- ^ "iCn3D: Web-based 3D Structure Viewer". www.ncbi.nlm.nih.gov. Retrieved 2024-12-04.
- ^ "PaxDb: Protein Abundance Database". pax-db.org. Retrieved 2024-12-04.
- ^ "TimeTree :: The Timescale of Life". timetree.org. Retrieved 2024-12-03.
- ^ "GWAS Catalog". www.ebi.ac.uk. Retrieved 2024-12-03.